Supervised learning via smoothed Polya trees. (English) Zbl 07176225
Summary: We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) [D. R. Cox, J. R. Stat. Soc., Ser. B 20, 215–242 (1958; Zbl 0088.35703)] and Linear Discriminant Analysis (LDA) [R. A. Fisher, “The use of multiple measurements in taxonomic problems”, Ann. Eugenics 7, 179–188 (1936; doi:10.1111/j.1469-1809.1936.tb02137.x); C. R. Rao, J. R. Stat. Soc., Ser. B 10, 159–203 (1948; Zbl 0034.07902)] to the Bayesian nonparametric setting, providing a competitor to MclustDA [C. Fraley and A. E. Raftery, J. Am. Stat. Assoc. 97, No. 458, 611–631 (2002; Zbl 1073.62545)]. The approach models the data distribution of each class with a multivariate Polya tree and achieves impressive results in simulations and real data analyses. The flexibility gained by further relaxing the distributional assumptions of QDA can greatly improve the ability to correctly classify new observations when the data deviate severely from parametric distributional assumptions, while the method still performs well when those assumptions hold. The proposed method is quite fast compared to other supervised classifiers and very simple to implement: there are no kernel tricks or initialization steps, perhaps making it one of the more user-friendly approaches to supervised learning. This is a significant feature of the proposed methodology, since suboptimal tuning can greatly hamper classification performance; e.g., SVMs fit with non-optimal kernels perform significantly worse.
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G99 Nonparametric inference
62C10 Bayesian problems; characterization of Bayes procedures
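
Illustration: the summary describes a generative classifier that fits one Polya tree density per class and classifies by Bayes' rule. The following is a minimal sketch of that idea, not the authors' implementation. It is univariate (the paper's model is multivariate), it centers each tree on a Gaussian fitted to the class, and it assumes Beta parameters of the form c*j^2 at level j with a finite depth of 4. The names PolyaTreeDensity, fit and predict, and the parameters depth and c, are hypothetical and chosen only for this example.

```python
import math
from statistics import mean, pstdev


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


class PolyaTreeDensity:
    """Posterior predictive density of a finite, Gaussian-centered Polya tree
    (univariate sketch; alpha_j = c * j**2 is an assumed prior choice)."""

    def __init__(self, data, depth=4, c=1.0):
        self.mu = mean(data)
        self.sigma = pstdev(data) or 1.0  # guard against zero spread
        self.depth, self.c, self.n = depth, c, len(data)
        # Map data to (0, 1) through the centering Gaussian CDF, then count
        # points in each dyadic interval at every level 1..depth.
        us = [normal_cdf(x, self.mu, self.sigma) for x in data]
        self.counts = []
        for j in range(1, depth + 1):
            level = [0] * (2 ** j)
            for u in us:
                level[min(int(u * 2 ** j), 2 ** j - 1)] += 1
            self.counts.append(level)

    def pdf(self, x):
        u = normal_cdf(x, self.mu, self.sigma)
        ratio, parent = 1.0, self.n
        for j in range(1, self.depth + 1):
            k = min(int(u * 2 ** j), 2 ** j - 1)
            alpha = self.c * j * j
            child = self.counts[j - 1][k]
            # Posterior mean branch probability, relative to the prior 1/2.
            ratio *= 2.0 * (alpha + child) / (2.0 * alpha + parent)
            parent = child
        return normal_pdf(x, self.mu, self.sigma) * ratio


def fit(X, y, depth=4, c=1.0):
    """Fit one density per class; returns {label: (prior, density)}."""
    models = {}
    for g in sorted(set(y)):
        xs = [x for x, lab in zip(X, y) if lab == g]
        models[g] = (len(xs) / len(X), PolyaTreeDensity(xs, depth, c))
    return models


def predict(models, x):
    """Bayes rule: pick the class maximizing prior * class density."""
    return max(models, key=lambda g: models[g][0] * models[g][1].pdf(x))


# Toy data: class "a" is bimodal, class "b" is concentrated near zero.
X = [-3.1, -3.0, -2.9, -2.8, -3.2, 2.8, 2.9, 3.0, 3.1, 3.2,
     -0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4, -0.3, 0.3, 0.05]
y = ["a"] * 10 + ["b"] * 10
models = fit(X, y)
print(predict(models, 3.0), predict(models, 0.0))
```

In this sketch, a very large c makes each branch probability stay near 1/2, so the class density collapses to the fitted Gaussian and the rule behaves like a one-dimensional QDA; a small c lets the data reshape the density. This mirrors, in a simplified form, the sense in which the summary speaks of relaxing the distributional assumptions of QDA.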
