zbMATH — the first resource for mathematics

Ensemble quantile classifier. (English) Zbl 07160679
Summary: Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. However, both methods are limited because they assign equal weight to every variable and apply no regularization. The ensemble quantile classifier is a more flexible, regularized classifier that performs better on high-dimensional data, on asymmetric data, and when many inputs are irrelevant. The improved performance is demonstrated by a simulation study and by an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.
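The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation (the eqc package in [22] is the reference for that), and every function name below is hypothetical. The sketch assumes two classes coded 0 and 1, a single quantile level theta shared by all variables, and a ridge-penalized logistic regression fitted by plain gradient descent as a stand-in for whatever regularized learner the paper actually uses. With equal weights and theta = 0.5 it reduces to a component-wise median-type rule; learning the weights lets irrelevant inputs be down-weighted.

```python
import numpy as np

def check_loss(u, theta):
    """Quantile (check) loss: rho_theta(u) = u * (theta - 1{u < 0})."""
    return u * (theta - (u < 0))

def fit_quantiles(X, y, theta):
    """Sample theta-quantile of every variable within each class; shape (2, p)."""
    return np.stack([np.quantile(X[y == k], theta, axis=0) for k in (0, 1)])

def quantile_features(X, q, theta):
    """Per-variable score: loss to the class-0 quantile minus loss to the
    class-1 quantile. Positive values favour class 1."""
    return check_loss(X - q[0], theta) - check_loss(X - q[1], theta)

def fit_weights(Z, y, lam=0.1, lr=0.5, n_iter=500):
    """Ridge-penalized logistic regression by gradient descent (an assumed
    stand-in for the regularized weighting step)."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / len(y) + lam * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict(X, q, theta, w=None, b=0.0):
    """Equal weights (w=None) give the unweighted rule; otherwise use w, b."""
    Z = quantile_features(X, q, theta)
    if w is None:
        w = np.ones(Z.shape[1])
    return (Z @ w + b > 0).astype(int)

def make_data(rng, n):
    """Toy skewed data: 5 exponential inputs, only the first 2 are relevant."""
    y = np.repeat([0, 1], n)
    X = rng.exponential(1.0, size=(2 * n, 5))
    X[y == 1, :2] += 2.0  # location shift in the two relevant variables
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 200)
X_test, y_test = make_data(rng, 200)

theta = 0.5
q = fit_quantiles(X_train, y_train, theta)
w, b = fit_weights(quantile_features(X_train, q, theta), y_train)

acc_equal = np.mean(predict(X_test, q, theta) == y_test)
acc_weighted = np.mean(predict(X_test, q, theta, w, b) == y_test)
print(f"equal weights: {acc_equal:.3f}, learned weights: {acc_weighted:.3f}")
```

In this toy setup the learned weights on the three irrelevant inputs should be close to zero. Choosing theta by cross-validation, or using a separate theta per variable, would be natural extensions, but they are not shown here.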
MSC:
62 Statistics
References:
[1] Bickel, P. J.; Levina, E., Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations, Bernoulli, 10, 6, 989-1010 (2004) · Zbl 1064.62073
[2] Breiman, L., Stacked regressions, Mach. Learn., 24, 1, 49-64 (1996) · Zbl 0849.68104
[3] Breiman, L., Random forests, Mach. Learn., 45, 1, 5-32 (2001) · Zbl 1007.68152
[4] Cardoso-Cachopo, A., Improving Methods for Single-label Text Categorization (2007), Instituto Superior Tecnico, Universidade Tecnica de Lisboa, (Ph.D. thesis)
[5] Cleveland, W. S., Visualizing data (1993), Hobart Press
[6] Cortes, C.; Vapnik, V., Support-vector networks, Mach. Learn., 20, 3, 273-297 (1995) · Zbl 0831.68098
[7] Dietterich, T. G., Ensemble methods in machine learning, (International Workshop on Multiple Classifier Systems (2000), Springer), 1-15
[8] Dudoit, S.; Fridlyand, J.; Speed, T. P., Comparison of discrimination methods for the classification of tumors using gene expression data, J. Amer. Statist. Assoc., 97, 457, 77-87 (2002) · Zbl 1073.62576
[9] Fan, J.; Fan, Y., High dimensional classification using features annealed independence rules, Ann. Stat., 36, 6 (2008) · Zbl 1360.62327
[10] Feinerer, I., Hornik, K., 2017. tm: Text mining package. https://CRAN.R-project.org/package=tm. R package version 0.7-3.
[11] Freund, Y.; Schapire, R. E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 55, 1, 119-139 (1997) · Zbl 0880.68103
[12] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1, 1-22 (2010)
[13] Hall, P.; Titterington, D. M.; Xue, J.-H., Median-based classifiers for high-dimensional data, J. Amer. Statist. Assoc., 104, 488, 1597-1608 (2009) · Zbl 1205.62078
[14] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning, Springer Series in Statistics (2009), Springer-Verlag New York
[15] Hennig, C.; Viroli, C., Quantile-based classifiers, Biometrika, 103, 2, 435-446 (2016) · Zbl 07072122
[16] Hennig, C., Viroli, C., 2016b. quantileda: Quantile classifier. https://CRAN.R-project.org/package=quantileDA. R package version 1.1.
[17] James, G.; Witten, D.; Hastie, T.; Tibshirani, R., An Introduction to Statistical Learning, Springer Series in Statistics (2013), Springer-Verlag New York
[18] Joe, H., Generating random correlation matrices based on partial correlations, J. Multivariate Anal., 97, 10, 2177-2189 (2006) · Zbl 1112.62055
[19] Koenker, R., Quantile Regression, Econometric Society Monographs (2005), Cambridge University Press · Zbl 1111.62037
[20] Koenker, R.; Bassett, G., Regression quantiles, Econometrica, 46, 1, 33-50 (1978) · Zbl 0373.62038
[21] Kuhn, M.; Johnson, K., Applied predictive modeling (2013), Springer · Zbl 1306.62014
[22] Lai, Y., McLeod, A.I., 2018. eqc: Ensemble quantile classifier. https://github.com/CliffordLai/eqc. R package version 1.0-5.
[23] Lewis, D., 1997. Reuters-21578 text categorization collection distribution 1.0.
[24] Rokach, L., Ensemble Learning: Pattern Classification Using Ensemble Methods (2019), World Scientific Publishing Company
[25] Mason, D. M., Some characterizations of almost sure bounds for weighted multidimensional empirical distributions and a Glivenko-Cantelli theorem for sample quantiles, Z. Wahrscheinlichkeitstheor. Verwandte Geb., 59, 4, 505-513 (1982) · Zbl 0482.60029
[26] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., 2018. e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071), TU Wien. https://CRAN.R-project.org/package=e1071. R package version 1.7-0.
[27] Newbold, P.; Granger, C. W.T., Experience with forecasting univariate time series and the combination of forecasts, J. Roy. Statist. Soc. Ser. A, 137, 2, 131-165 (1974)
[28] Park, M. Y.; Hastie, T., Penalized logistic regression for detecting gene interactions, Biostatistics, 9, 1, 30-50 (2007) · Zbl 1274.62853
[29] Qiu, W., Joe, H., 2015. clustergeneration: Random cluster generation (with specified degree of separation). R package version 1.3.4.
[30] Schapire, R.; Freund, Y., Boosting: Foundations and Algorithms (2012), MIT Press · Zbl 1278.68021
[31] Sebastiani, F., Machine learning in automated text categorization, ACM Comput. Surv., 34, 1, 1-47 (2002)
[32] Silver, N., The Signal and the Noise (2012), Penguin Publishing Group
[33] Tibshirani, R.; Hastie, T.; Narasimhan, B.; Chu, G., Class prediction by nearest shrunken centroids, with applications to DNA microarrays, Statist. Sci., 18, 1, 104-117 (2003) · Zbl 1048.62109
[34] Ting, K. M.; Witten, I. H., Issues in stacked generalization, J. Artif. Intell. Res., 10, 271-289 (1999) · Zbl 0915.68075
[35] Venables, W. N.; Ripley, B. D., Modern Applied Statistics with S (2002), Springer New York · Zbl 1006.62003
[36] Wolpert, D. H., Stacked generalization, Neural Netw., 5, 2, 241-259 (1992)
[37] Zhou, Z.-H., Ensemble methods: foundations and algorithms (2012), Chapman and Hall/CRC
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.