Optimal variable selection in multi-group sparse discriminant analysis. (English) Zbl 1323.62060

Summary: This article considers the problem of multi-group classification in the setting where the number of variables \(p\) is larger than the number of observations \(n\). Several methods have been proposed in the literature to address this problem; however, their variable selection performance is either unknown or suboptimal relative to the results known for the two-group case. In this work we provide sharp conditions for the consistent recovery of relevant variables in the multi-group case using the discriminant analysis proposal of the first author et al. [“Simultaneous sparse estimation of canonical vectors in the \(p\gg n\) setting”, Preprint, arXiv:1403.6095]. We achieve rates of convergence that attain the optimal scaling of the sample size \(n\), the number of variables \(p\), and the sparsity level \(s\). These rates are significantly faster than the best known results in the multi-group case. Moreover, they coincide with the minimax optimal rates for the two-group case. We validate our theoretical results with numerical analysis.


MSC: 62H30 Classification and discrimination; cluster analysis (statistical aspects)


Software: rda; penalizedLDA


[1] Bickel, P. J. and Levina, E., Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli, 10(6):989-1010, 2004. · Zbl 1064.62073 · doi:10.3150/bj/1106314847
[2] Bodnar, T. and Okhrin, Y., Properties of the singular, inverse and generalized inverse partitioned Wishart distributions. J. Multivar. Anal., 99(10):2389-2405, 2008. · Zbl 1151.62046 · doi:10.1016/j.jmva.2008.02.024
[3] Cai, T. T., Liu, W., and Luo, X., A constrained \(\ell_1\) minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc., 106(494):594-607, 2011. · Zbl 1232.62087 · doi:10.1198/jasa.2011.tm10155
[4] Clemmensen, L., Hastie, T. J., Witten, D. M., and Ersbøll, B., Sparse discriminant analysis. Technometrics, 53(4):406-413, 2011. · doi:10.1198/TECH.2011.08118
[5] Fan, J. and Fan, Y., High-dimensional classification using features annealed independence rules. Annals of Statistics, 36(6):2605-2637, 2008. · Zbl 1360.62327 · doi:10.1214/07-AOS504
[6] Fan, J., Feng, Y., and Tong, X., A road to classification in high dimensional space: The regularized optimal affine discriminant. J. R. Stat. Soc. B, 74(4):745-771, 2012. · doi:10.1111/j.1467-9868.2012.01029.x
[7] Gaynanova, I., Booth, J. G., and Wells, M. T., Simultaneous sparse estimation of canonical vectors in the \(p\gg n\) setting. J. Am. Stat. Assoc., 2015.
[8] Hsu, D., Kakade, S., and Zhang, T., A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab., 17(52):1-6, 2012. · Zbl 1309.60017 · doi:10.1214/ECP.v17-2079
[9] Kolar, M. and Liu, H., Optimal feature selection in high-dimensional discriminant analysis. ArXiv e-prints, arXiv:1306.6557, June 2013. · Zbl 1359.62250 · doi:10.1109/TIT.2014.2381241
[10] Laurent, B. and Massart, P., Adaptive estimation of a quadratic functional by model selection. Ann. Stat., 28(5):1302-1338, 2000. · Zbl 1105.62328 · doi:10.1214/aos/1015957395
[11] Mai, Q. and Zou, H., A note on the connection and equivalence of three sparse linear discriminant analysis methods. Technometrics, 55(2):243-246, 2013. · doi:10.1080/00401706.2012.746208
[12] Mai, Q., Zou, H., and Yuan, M., A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika, 99(1):29-42, 2012. · Zbl 1437.62550 · doi:10.1093/biomet/asr066
[13] Mardia, K. V., Kent, J. T., and Bibby, J. M., Multivariate Analysis. Academic Press [Harcourt Brace Jovanovich Publishers], London, 1979. ISBN 0-12-471250-9. Probability and Mathematical Statistics: A Series of Monographs and Textbooks.
[14] McLachlan, G., Discriminant Analysis and Statistical Pattern Recognition, volume 544. John Wiley & Sons, 2004. · Zbl 1108.62317
[15] Muirhead, R. J., Aspects of Multivariate Statistical Theory. John Wiley & Sons Inc., New York, 1982. ISBN 0-471-09442-0. Wiley Series in Probability and Mathematical Statistics. · Zbl 0556.62028
[16] Obozinski, G., Wainwright, M. J., and Jordan, M. I., Support union recovery in high-dimensional multivariate regression. Ann. Stat., 39(1):1-47, 2011. · Zbl 1373.62372 · doi:10.1214/09-AOS776
[17] Qiao, Z., Zhou, L., and Huang, J. Z., Sparse linear discriminant analysis with applications to high dimensional low sample size data. IAENG International Journal of Applied Mathematics, 39:48-60, 2008a. · Zbl 1229.62086
[18] Qiao, Z., Zhou, L., and Huang, J. Z., Effective linear discriminant analysis for high dimensional, low sample size data. In Proceedings of the World Congress on Engineering, volume 2, pages 2-4. Citeseer, 2008b.
[19] Shao, J., Wang, Y., Deng, X., and Wang, S., Sparse linear discriminant analysis by thresholding for high dimensional data. Ann. Stat., 39(2):1241-1265, 2011. · Zbl 1215.62062 · doi:10.1214/10-AOS870
[20] Tibshirani, R. J., Hastie, T. J., Narasimhan, B., and Chu, G., Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Stat. Sci., 18(1):104-117, 2003. · Zbl 1048.62109 · doi:10.1214/ss/1056397488
[21] Vershynin, R., Introduction to the non-asymptotic analysis of random matrices. In Y. C. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012. · Zbl 1365.62208 · doi:10.1017/CBO9780511794308.006
[22] Wainwright, M. J., Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (Lasso). IEEE Trans. Inf. Theory, 55(5):2183-2202, 2009. · Zbl 1367.62220 · doi:10.1109/TIT.2009.2016018
[23] Wang, S. and Zhu, J., Improved centroids estimation for the nearest shrunken centroid classifier. Bioinformatics, 23(8):972, 2007.
[24] Witten, D. M. and Tibshirani, R. J., Penalized classification using Fisher’s linear discriminant. J. R. Stat. Soc. B, 73(5):753-772, 2011. · Zbl 1228.62079 · doi:10.1111/j.1467-9868.2011.00783.x
[25] Wu, M. C., Zhang, L., Wang, Z., Christiani, D. C., and Lin, X., Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection. Bioinformatics, 25(9):1145-1151, 2009.
[26] Yuan, M. and Lin, Y., Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B, 68:49-67, 2006. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[27] Zhao, P. and Yu, B., On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541-2563, 2006. · Zbl 1222.62008