zbMATH — the first resource for mathematics

Supervised learning via smoothed Polya trees. (English) Zbl 07176225
Summary: We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) [D. R. Cox, J. R. Stat. Soc., Ser. B 20, 215–242 (1958; Zbl 0088.35703)] and Linear Discriminant Analysis (LDA) [R. A. Fisher, “The use of multiple measurements in taxonomic problems”, Ann. Eugenics 7, 179–188 (1936; doi:10.1111/j.1469-1809.1936.tb02137.x); C. R. Rao, J. R. Stat. Soc., Ser. B 10, 159–203 (1948; Zbl 0034.07902)] to the Bayesian nonparametric setting, providing a competitor to MclustDA [C. Fraley and A. E. Raftery, J. Am. Stat. Assoc. 97, No. 458, 611–631 (2002; Zbl 1073.62545)]. This approach models the data distribution of each class with a multivariate Polya tree and achieves impressive results in simulations and real data analyses. The flexibility gained by further relaxing the distributional assumptions of QDA can greatly improve the ability to correctly classify new observations when the data deviate severely from parametric distributional assumptions, while still performing well when those assumptions hold. The proposed method is quite fast compared to other supervised classifiers and very simple to implement, as it requires no kernel tricks or initialization steps, making it one of the more user-friendly approaches to supervised learning. This is a significant feature of the proposed methodology, since suboptimal tuning can greatly hamper classification performance; e.g., SVMs fit with non-optimal kernels perform significantly worse.
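The generative-classification recipe the summary describes (estimate a density per class, then assign a new point to the class maximizing prior times class density) can be sketched as follows. This is a minimal illustration only: it substitutes a Gaussian kernel density estimate for the paper's multivariate Polya tree model, and the helper names `fit_generative_classifier` and `predict` are hypothetical, not the authors' implementation.

```python
# Sketch of generative classification via Bayes' rule.
# Assumption: scipy's gaussian_kde stands in for the multivariate
# Polya tree density model used in the paper.
import numpy as np
from scipy.stats import gaussian_kde

def fit_generative_classifier(X, y):
    """Fit one density estimate per class plus empirical class priors."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    # gaussian_kde expects shape (d, n): variables in rows, points in columns
    densities = {c: gaussian_kde(X[y == c].T) for c in classes}
    return classes, priors, densities

def predict(X_new, classes, priors, densities):
    """Assign each point to the class maximizing prior * class density."""
    scores = np.column_stack(
        [priors[c] * densities[c](X_new.T) for c in classes]
    )
    return classes[np.argmax(scores, axis=1)]

# Toy example: two well-separated 2-D Gaussian classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=+2.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

classes, priors, dens = fit_generative_classifier(X, y)
preds = predict(np.array([[-2.0, -2.0], [2.0, 2.0]]), classes, priors, dens)
# preds assigns the first point to class 0 and the second to class 1
```

Swapping the KDE for a Polya tree density (or any other class-conditional density estimator) leaves the Bayes-rule classification step unchanged, which is the structural point of the generative approach.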
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G99 Nonparametric inference
62C10 Bayesian problems; characterization of Bayes procedures
Full Text: DOI
[1] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker PA, Vasudevan V, Warden P, Wicke M, Yu Y, Zhang X (2016) Tensorflow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265-283
[2] Alpaydin E (2014) Introduction to machine learning (adaptive computation and machine learning). The MIT Press, Cambridge · Zbl 1298.68002
[3] Anderson JA, Rosenfeld E (eds) (1988) Neurocomputing: foundations of research. MIT Press, Cambridge
[4] Bensmail, H.; Celeux, G., Regularized Gaussian discriminant analysis through eigenvalue decomposition, J Am Stat Assoc, 91, 1743-1748 (1996) · Zbl 0885.62068
[5] Bergé, L.; Bouveyron, C.; Girard, S., HDclassif: an R package for model-based clustering and discriminant analysis of high-dimensional data, J Stat Softw, 46, 1-29 (2012)
[6] Beygelzimer A, Kakadet S, Langford J, Arya S, Mount D, Li S (2013) FNN: fast nearest neighbor search algorithms and applications. R package version 1:1
[7] Blackwell, D.; MacQueen, JB, Ferguson distributions via Polya urn schemes, Ann Stat, 1, 353-355 (1973) · Zbl 0276.62010
[8] Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory. ACM, pp 144-152
[9] Bouveyron, C.; Girard, S.; Schmid, C., High-dimensional discriminant analysis, Commun Stat Theory Methods, 36, 2607-2623 (2007) · Zbl 1128.62072
[10] Breiman, L., Bagging predictors, Mach Learn, 24, 123-140 (1996) · Zbl 0858.68080
[11] Breiman, L., Random forests, Mach Learn, 45, 5-32 (2001) · Zbl 1007.68152
[12] Burges, CJ, A tutorial on support vector machines for pattern recognition, Data Min Knowl Discov, 2, 121-167 (1998)
[13] Chandrashekar, G.; Sahin, F., A survey on feature selection methods, Comput Electr Eng, 40, 16-28 (2014)
[14] Cipolli, W.; Hanson, T., Computationally tractable approximate and smoothed Polya trees, Stat Comput, 27, 39-51 (2017) · Zbl 06696850
[15] Cipolli, W.; Hanson, T.; McLain, A., Bayesian nonparametric multiple testing, Comput Stat Data Anal, 101, 64-79 (2016) · Zbl 06918273
[16] Cortes, C.; Vapnik, V., Support-vector networks, Mach Learn, 20, 273-297 (1995) · Zbl 0831.68098
[17] Cover, T.; Hart, P., Nearest neighbor pattern classification, IEEE Trans Inf Theory, 13, 21-27 (1967) · Zbl 0154.44505
[18] Cox, DR, The regression analysis of binary sequences, J R Stat Soc Ser B (Methodol), 20, 215-242 (1958) · Zbl 0088.35703
[19] Cox DR (1966) Some procedures associated with the logistic qualitative response curve. Wiley, New York · Zbl 0158.17808
[20] Deng H (2014) Interpreting tree ensembles with intrees. arXiv preprint arXiv:1408.5456
[21] Duan, K-B; Keerthi, SS, Which is the best multiclass SVM method? An empirical study, 278-285 (2005), Berlin, Heidelberg
[22] Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York · Zbl 0968.68140
[23] Dudani, SA, The distance-weighted k-nearest-neighbor rule, IEEE Trans Syst Man Cybern, 6, 325-327 (1976)
[24] Ferguson, TS, Prior distributions on spaces of probability measures, Ann Stat, 2, 615-629 (1974) · Zbl 0286.62008
[25] Fisher, RA, The use of multiple measurements in taxonomic problems, Ann Eugen, 7, 179-188 (1936)
[26] Florida R (2011) America’s great passport divide. http://www.theatlantic.com/national/archive/2011/03/americas-great-passport-divide/72399/. Accessed 15 Mar 2011
[27] Fraley, C.; Raftery, AE, Model-based clustering, discriminant analysis, and density estimation, J Am Stat Assoc, 97, 611-631 (2002) · Zbl 1073.62545
[28] Friedman, JH, Regularized discriminant analysis, J Am Stat Assoc, 84, 165-175 (1989)
[29] Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore · Zbl 0865.65009
[30] Hannah, LA; Blei, DM; Powell, WB, Dirichlet process mixtures of generalized linear models, J Mach Learn Res, 12, 1923-1953 (2011) · Zbl 1280.62031
[31] Hanson, T., Inference for mixtures of finite Polya tree models, J Am Stat Assoc, 101, 1548-1565 (2006) · Zbl 1171.62323
[32] Hanson, T.; Branscum, A.; Gardner, I., Multivariate mixtures of Polya trees for modelling ROC data, Stat Model, 8, 81-96 (2008)
[33] Hanson, T.; Chen, Y., Bayesian nonparametric k-sample tests for censored and uncensored data, Comput Stat Data Anal, 71, 335-346 (2014) · Zbl 06975392
[34] Hanson, T.; Monteiro, J.; Jara, A., The Polya tree sampler: towards efficient and automatic independent Metropolis-Hastings proposals, J Comput Graph Stat, 20, 41-62 (2011)
[35] Hastie, T.; Tibshirani, R., Discriminant analysis by Gaussian mixtures, J R Stat Soc Ser B (Methodol), 58, 155-176 (1996) · Zbl 0850.62476
[36] Hastie, T.; Tibshirani, R., Classification by pairwise coupling, Ann Stat, 26, 451-471 (1998) · Zbl 0932.62071
[37] Hastie T, Tibshirani R, Friedman J (2001) The Elements of statistical learning: data mining, inference, and prediction. Springer, Berlin · Zbl 0973.62007
[38] Ho TK (1995) Random decision forests. In: Third international conference on document analysis and recognition, ICDAR 1995, August 14-15, 1995, Montreal, Canada. Vol I, pp 278-282
[39] Hotelling, H., Analysis of a complex of statistical variables into principal components, J Educ Psychol, 24, 417-441 (1933) · JFM 59.1182.04
[40] Izenman, AJ, Recent developments in nonparametric density estimation, J Am Stat Assoc, 86, 205-224 (1991) · Zbl 0734.62040
[41] Jara, A.; Hanson, T.; Lesaffre, E., Robustifying generalized linear mixed models using a new class of mixtures of multivariate Polya trees, J Comput Graph Stat, 18, 838-860 (2009)
[42] Jiang L, Wang D, Cai Z, Yan X (2007) Survey of improving naive Bayes for classification. In: Proceedings of the 3rd international conference on advanced data mining and applications. Springer, pp 134-145
[43] Karsoliya, S., Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture, Int J Eng Trends Technol, 12, 714-717 (2012)
[44] Kotsiantis, SB, Supervised machine learning: a review of classification, Informatica, 31, 249-268 (2007) · Zbl 1162.68552
[45] Larrañaga, P.; Calvo, B.; Santana, R.; Bielza, C.; Galdiano, J.; Inza, I.; Lozano, JA; Armañanzas, R.; Santafé, G.; Pérez, A., Machine learning in bioinformatics, Brief Bioinform, 17, 86-112 (2006)
[46] Lavine, M., Some aspects of Polya tree distributions for statistical modelling, Ann Stat, 20, 1222-1235 (1992) · Zbl 0765.62005
[47] Lavine, M., More aspects of Polya tree distributions for statistical modelling, Ann Stat, 22, 1161-1176 (1994) · Zbl 0820.62016
[48] Ledl, T., Kernel density estimation: theory and application in discriminant analysis, Austrian J Stat, 33, 267-279 (2004)
[49] Leisch F, Dimitriadou E (2015) mlbench: machine learning benchmark problems. R package version 2.1-1
[50] Liaw, A.; Wiener, M., Classification and regression by randomForest, R News, 2, 18-22 (2002)
[51] Ma, J.; Yu, MK; Fong, S.; Ono, K.; Sage, E.; Demchak, B.; Sharan, R.; Ideker, T., Using deep learning to model the hierarchical structure and function of a cell, Nat Methods, 15, 290-298 (2018)
[52] Ma Y, Guo G (2014) Support vector machines applications. Springer, Berlin
[53] Mantel, N., Models for complex contingency tables and polychotomous dosage response curves, Biometrics, 22, 83-95 (1966)
[54] Di Marzio, M.; Taylor, CC, On boosting kernel density methods for multivariate data: density estimation and classification, Stat Methods Appl, 14, 163-178 (2005) · Zbl 1084.62030
[55] Mauldin, RD; Sudderth, WD; Williams, SC, Polya trees and random distributions, Ann Stat, 20, 1203-1221 (1992) · Zbl 0765.62006
[56] Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2015) e1071: Misc functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien. R package version 1.6-7
[57] Migration Policy Institute (2014). State immigration data profiles. http://www.migrationpolicy.org/programs/data-hub/state-immigration-data-profiles. Accessed 13 Mar 2016
[58] Mohri M, Rostamizadeh A, Talwalkar A (2012) Foundations of machine learning. The MIT Press, Cambridge · Zbl 1318.68003
[59] Montavon, G.; Lapuschkin, S.; Binder, A.; Samek, W.; Müller, K-R, Explaining nonlinear classification decisions with deep Taylor decomposition, Pattern Recognit, 65, 211-222 (2017)
[60] Montavon, G.; Samek, W.; Müller, K-R, Methods for interpreting and understanding deep neural networks, Digit Sig Process, 73, 1-15 (2018)
[61] Mukhopadhyay, S.; Ghosh, A., Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis, Comput Stat Data Anal, 55, 2344-2353 (2011) · Zbl 1328.62177
[62] Müller P, Rodriguez A (2013) Chapter 4: Polya Trees, volume 9 of NSF-CBMS regional conference series in probability and statistics. Institute of Mathematical Statistics and American Statistical Association, pp 43-51
[63] National Archives and Records Administration (2012) Historical election results. http://www.archives.gov/federal-register/electoral-college/historical.html. Accessed 13 Mar 2016
[64] Ng AY, Jordan MI (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in neural information processing systems, pp 841-848
[65] Paddock, S.; Ruggeri, F.; Lavine, M.; West, M., Randomised Polya tree models for nonparametric Bayesian inference, Stat Sin, 13, 443-460 (2003) · Zbl 1015.62051
[66] Pati, D.; Bhattacharya, A.; Pillai, NS; Dunson, D., Posterior contraction in sparse Bayesian factor models for massive covariance matrices, Ann Stat, 42, 1102-1130 (2014) · Zbl 1305.62124
[67] Plastria F, De Bruyne S, Carrizosa E (2008) Dimensionality reduction for classification. In: International conference on advanced data mining and applications. Springer, pp 411-418
[68] R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
[69] Rao, CR, The utilization of multiple measurements in problems of biological classification, J R Stat Soc Ser B, 10, 159-203 (1948) · Zbl 0034.07902
[70] Ripley BD (2007) Pattern recognition and neural networks. Cambridge University Press, Cambridge · Zbl 0853.62046
[71] Rish I (2001) An empirical study of the naive Bayes classifier. Technical report, IBM
[72] Rojas R (1996) Neural networks: a systematic introduction. Springer, New York · Zbl 0861.68072
[73] Runcie, DE; Mukherjee, S., Dissecting high-dimensional phenotypes with Bayesian sparse factor analysis of genetic covariance matrices, Genetics, 194, 753-767 (2013)
[74] Schmidhuber, J., Deep learning in neural networks: an overview, Neural Netw, 61, 85-117 (2015)
[75] Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
[76] Scrucca, L.; Fop, M.; Murphy, TB; Raftery, AE, mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J, 8, 205-233 (2016)
[77] Shahbaba, B.; Neal, R., Nonlinear models using Dirichlet process mixtures, J Mach Learn Res, 10, 1829-1850 (2009) · Zbl 1235.62069
[78] Steinwart I, Christmann A (2008) Support vector machines. Springer, Berlin · Zbl 1203.68171
[79] Tax Foundation (2007). Federal taxes paid vs. federal spending received by state, 1981-2005. http://taxfoundation.org/article/federal-taxes-paid-vs-federal-spending-received-state-1981-2005. Accessed 13 Mar 2016
[80] Tsang, IW; Kwok, JT; Cheung, P-M, Core vector machines: fast SVM training on very large data sets, J Mach Learn Res, 6, 363-392 (2005) · Zbl 1222.68320
[81] United States Census Bureau (2010) American community survey, education attainment for states, percent with high school diploma and with bachelor’s degree: 2010. https://www.census.gov/newsroom/releases/xls/cb12-33table1states.xls. Accessed 13 Mar 2016
[82] United States Census Bureau (2014) State median income. https://www.census.gov/hhes/www/income/data/statemedian/. Accessed 13 Mar 2016
[83] United States Department of State Bureau of Consular Affairs (2015) U.S. passports and international travel: passport statistics. https://travel.state.gov/content/passports/en/passports/statistics.html. Accessed 13 Mar 2016
[84] Vapnik VN (1979) Estimation of dependences based on empirical data. Nauka, USSR (in Russian) · Zbl 0499.62005
[85] Vapnik, VN; Chervonenkis, A., A note on one class of perceptrons, Autom Remote Control, 25, 774-780 (1964)
[86] Vapnik, VN; Lerner, A., Pattern recognition using generalized portrait method, Autom Remote Control, 24, 709-715 (1963)
[87] Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York · Zbl 1006.62003
[88] Wong, WH; Ma, L., Optional Polya tree and Bayesian inference, Ann Stat, 38, 1433-1459 (2010) · Zbl 1189.62048
[89] Yegnanarayana B (2004) Artificial neural networks. Prentice-Hall, New Jersey
[90] Zambom, AZ; Dias, R., A review of kernel density estimation with applications to econometrics, Int Econ Rev (IER), 5, 20-42 (2013)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.