Kernel mixture model for probability density estimation in Bayesian classifiers. (English) Zbl 1416.62369
Summary: Estimating reliable class-conditional probabilities is a prerequisite for implementing Bayesian classifiers, and how to estimate probability density functions (PDFs) is a fundamental problem for other probabilistic induction algorithms as well. The finite mixture model (FMM) can represent arbitrarily complex PDFs by using a mixture of multimodal distributions, but it assumes that the mixture components follow a given distribution, which may not hold for real-world data. This paper presents a non-parametric kernel mixture model (KMM) based probability density estimation approach, in which the data sample of a class is assumed to be drawn from several unknown, independent, hidden subclasses. Unlike traditional FMM schemes, we simply use the \(k\)-means clustering algorithm to partition the data sample into several independent components, and the regional density diversities of the components are combined using Bayes' theorem. On the basis of the proposed kernel mixture model, we present a three-step Bayesian classifier comprising partitioning, structure learning, and PDF estimation. Experimental results show that KMM improves the quality of the PDFs estimated by the conventional kernel density estimation (KDE) method, and that KMM-based Bayesian classifiers outperform existing Gaussian, GMM, and KDE-based Bayesian classifiers.
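To make the construction concrete, the sketch below illustrates the idea under our own assumptions (the class names, component count, and bandwidth defaults are ours, not the authors' settings): each class sample is partitioned with \(k\)-means, a Gaussian kernel density estimator is fitted on each partition, the component densities are mixed with their empirical weights, and Bayes' rule combines the class-conditional mixtures with the class priors. The paper's structure-learning step is omitted for brevity; features are modeled jointly by one multivariate KDE per component.

# Minimal sketch of a KMM density estimate and a KMM-based Bayesian
# classifier; illustrative only, not the authors' implementation.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

class KMM:
    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X):
        # Step 1 (partitioning): split the sample into hidden subclasses.
        labels = KMeans(n_clusters=self.n_components, n_init=10).fit_predict(X)
        self.kdes, self.weights = [], []
        for c in range(self.n_components):
            Xc = X[labels == c]
            # Step 3 (PDF estimation): one KDE per component; gaussian_kde
            # expects shape (d, n) and needs each component to hold more
            # points than dimensions.
            self.kdes.append(gaussian_kde(Xc.T))
            self.weights.append(len(Xc) / len(X))
        return self

    def pdf(self, X):
        # Mixture density: p(x) = sum_c p(x | c) P(c).
        return sum(w * kde(X.T) for w, kde in zip(self.weights, self.kdes))

class KMMBayesClassifier:
    # Plug the KMM class-conditional densities into Bayes' rule.
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.models = {c: KMM().fit(X[y == c]) for c in self.classes}
        self.priors = {c: np.mean(y == c) for c in self.classes}
        return self

    def predict(self, X):
        # Decision rule: argmax_c P(c) p(x | c).
        scores = np.array([self.priors[c] * self.models[c].pdf(X)
                           for c in self.classes])
        return self.classes[np.argmax(scores, axis=0)]

Setting n_components=1 recovers the conventional single-KDE baseline, which makes the comparison described in the summary straightforward to reproduce, e.g. KMMBayesClassifier().fit(X_train, y_train).predict(X_test).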
MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation
Software:
BNT; HandTill2001; pyMEF
References:
[1] Babich, GA; Camps, OI, Weighted Parzen windows for pattern classification, IEEE Trans Pattern Anal Mach Intell, 18, 567-570, (1996)
[2] Bielza, C.; Larrañaga, P., Discrete Bayesian network classifiers: a survey, ACM Comput Surv, 47, 1-43, (2014) · Zbl 1322.68147
[3] Bouckaert RR (2004) Naive Bayes classifiers that perform well with continuous variables. In: AI 2004: advances in artificial intelligence, Springer, Berlin, pp 1089-1094
[4] Castillo E, Gutierrez JM, Hadi AS (2012) Expert systems and probabilistic network models. Springer, Berlin · Zbl 0867.68099
[5] Chickering, DM, Learning Bayesian networks is NP-complete, Lect Notes Stat, 112, 121-130, (1996)
[6] Chow, CK; Liu, CN, Approximating discrete probability distributions with dependence trees, IEEE Trans Inf Theory, 14, 462-467, (1968) · Zbl 0165.22305
[7] Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
[8] Domingos, P.; Pazzani, M., On the optimality of the simple Bayesian classifier under zero-one loss, Mach Learn, 29, 103-130, (1997) · Zbl 0892.68076
[9] Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, New York · Zbl 0968.68140
[10] Escobar, MD; West, M., Bayesian density estimation and inference using mixtures, J Am Stat Assoc, 90, 577-588, (1995) · Zbl 0826.62021
[11] Figueiredo, MAT; Jain, AK, Unsupervised learning of finite mixture models, IEEE Trans Pattern Anal Mach Intell, 24, 381-396, (2002)
[12] Friedman, N.; Geiger, D.; Goldszmidt, M., Bayesian network classifiers, Mach Learn, 29, 131-163, (1997) · Zbl 0892.68077
[13] Girolami, M.; He, C., Probability density estimation from optimally condensed data samples, IEEE Trans Pattern Anal Mach Intell, 25, 1253-1264, (2003)
[14] Hand, DJ; Till, RJ, A simple generalisation of the area under the ROC curve for multiple class classification problems, Mach Learn, 45, 171-186, (2001) · Zbl 1007.68180
[15] Hand, DJ; Yu, K., Idiot’s Bayes – not so stupid after all?, Int Stat Rev, 69, 385-398, (2001) · Zbl 1213.62010
[16] Heckerman, D.; Geiger, D.; Chickering, DM, Learning Bayesian networks: the combination of knowledge and statistical data, Mach Learn, 20, 197-243, (1995) · Zbl 0831.68096
[17] Heidenreich NB, Schindler A, Sperlich S (2010) Bandwidth selection methods for kernel density estimation—a review of performance. Social Science Electronic Publishing, Rochester
[18] Holmström, L., The accuracy and the computational complexity of a multivariate binned kernel density estimator, J Multivar Anal, 72, 264-309, (2000) · Zbl 1065.62511
[19] Holmström L, Hämäläinen A (1993) The self-organizing reduced kernel density estimator. In: IEEE international conference on neural networks, IEEE, pp 417-421
[20] Jeon, B.; Landgrebe, DA, Fast Parzen density estimation using clustering-based branch and bound, IEEE Trans Pattern Anal Mach Intell, 16, 950-954, (1994)
[21] Jeon, J.; Taylor, JW, Using conditional kernel density estimation for wind power density forecasting, J Am Stat Assoc, 107, 66-79, (2012) · Zbl 1261.62031
[22] Jiang, L.; Cai, Z.; Wang, D.; Zhang, H., Improving tree augmented naive Bayes for class probability estimation, Knowl-Based Syst, 26, 239-245, (2012)
[23] John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on Uncertainty in artificial intelligence, pp 338-345
[24] Kayabol, K.; Zerubia, J., Unsupervised amplitude and texture classification of SAR images with multinomial latent model, IEEE Trans Image Process, 22, 561-572, (2013) · Zbl 1373.94201
[25] Leray P, François O (2004) BNT structure learning package: documentation and experiments. Technical Report FRE CNRS 2645, Laboratoire PSI, Université et INSA de Rouen
[26] Pérez, A.; Larrañaga, P.; Inza, I., Bayesian classifiers based on kernel density estimation: flexible classifiers, Int J Approx Reason, 50, 341-362, (2009) · Zbl 1191.68600
[27] Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: SIAM international conference on data mining, April 20-22, Bethesda, MD, USA
[28] Reynolds, DA; Rose, RC, Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Trans Speech Audio Process, 3, 72-83, (1995)
[29] Rish, I., An empirical study of the naive Bayes classifier, J Univ Comput Sci, 1, 127, (2001)
[30] Schwander O, Nielsen F (2012) Model centroids for the simplification of kernel density estimators. In: IEEE international conference on acoustics, speech and signal processing, pp 737-740
[31] Schwander O, Nielsen F (2013) Learning mixtures by simplifying kernel density estimators. Matrix Information Geometry. Springer, Berlin, pp 403-426 · Zbl 1269.94015
[32] Scott DW (2015) Multivariate density estimation: theory, practice, and visualization. Wiley, New York · Zbl 1311.62004
[33] Scott, DW; Sheather, SJ, Kernel density estimation with binned data, Commun Stat Theory Methods, 14, 1353-1359, (1985)
[34] Shen, W.; Tokdar, ST; Ghosal, S., Adaptive bayesian multivariate density estimation with dirichlet mixtures, Biometrika, 100, 623-640, (2013) · Zbl 1284.62183
[35] Simonoff JS (1996) Smoothing methods in statistics. Springer, New York · Zbl 0859.62035
[36] Sucar LE (2015) Bayesian classifiers. Springer, London
[37] Topchy AP, Jain AK, Punch WF (2004) A mixture model for clustering ensembles. In: SDM, SIAM, pp 379-390
[38] Wang F, Zhang C, Lu N (2005) Boosting GMM and its two applications. In: International workshop on multiple classifier systems, vol 3541. Springer, Berlin, Heidelberg, pp 12-21
[39] Wang, S.; Wang, J.; Chung, FL, Kernel density estimation, kernel methods, and fast learning in large data sets, IEEE Trans Cybern, 44, 1-20, (2013)
[40] Xiong, F.; Liu, Y.; Cheng, J., Modeling and predicting opinion formation with trust propagation in online social networks, Commun Nonlinear Sci Numer Simul, 44, 513-524, (2017)
[41] Xiong, F.; Liu, Y.; Wang, L.; Wang, X., Analysis and application of opinion model with multiple topic interactions, Chaos, 27, 083113, (2017)
[42] Xu, X.; Yan, Z.; Xu, S., Estimating wind speed probability distribution by diffusion-based kernel density method, Electr Power Syst Res, 121, 28-37, (2015)
[43] Yang, Y.; Webb, GI, Discretization for naive-Bayes learning: managing discretization bias and variance, Mach Learn, 74, 39-74, (2009)
[44] Yin, H.; Allinson, NM, Self-organizing mixture networks for probability density estimation, IEEE Trans Neural Netw, 12, 405-411, (2001)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.