
Regularization through variable selection and conditional MLE with application to classification in high dimensions. (English) Zbl 1149.62052

Summary: It is often the case that high-dimensional data consist of only a few informative components. Standard statistical modeling and estimation in such a setting is prone to inaccuracies due to overfitting, unless regularization methods are applied. In the context of classification, we propose a class of regularization methods based on shrinkage estimators. The shrinkage is based on variable selection coupled with conditional maximum likelihood. Using Stein’s unbiased estimator of the risk, we derive an estimator of the optimal shrinkage method within a certain class. We compare the optimal shrinkage method in the classification context with the optimal shrinkage method for estimating a mean vector under squared loss. The latter problem has been studied extensively, but the results of those studies appear not to be fully relevant for classification. We demonstrate and examine our method on simulated data and compare it to the feature annealed independence rule and to Fisher’s rule.
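The summary describes shrinkage of coordinate-wise statistics with the amount of shrinkage tuned by Stein’s unbiased risk estimate (SURE). The sketch below illustrates only that generic idea, not the authors’ conditional-MLE procedure: soft-thresholding of standardized mean differences with a SURE-chosen threshold, plugged into an independence (naive-Bayes-type) classification rule. All names, the threshold grid, and the simulated data are hypothetical and assume unit-variance Gaussian coordinates.

import numpy as np

def sure_soft_threshold(z, thresholds):
    """Return the threshold minimizing SURE for soft thresholding of z ~ N(mu, I)."""
    p = z.size
    risks = [p - 2 * np.sum(np.abs(z) <= t) + np.sum(np.minimum(z**2, t**2))
             for t in thresholds]
    return thresholds[int(np.argmin(risks))]

def soft(z, t):
    # soft-thresholding (shrink towards zero, kill small coordinates)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# toy two-class data: only a few informative components (hypothetical example)
rng = np.random.default_rng(0)
p, n = 1000, 50
mu = np.zeros(p); mu[:20] = 1.5
X0 = rng.normal(0.0, 1.0, size=(n, p))   # class 0 sample
X1 = rng.normal(mu, 1.0, size=(n, p))    # class 1 sample

# standardized mean differences, approximately N(delta_j, 1) per coordinate
z = (X1.mean(axis=0) - X0.mean(axis=0)) / np.sqrt(2.0 / n)

t_hat = sure_soft_threshold(z, np.linspace(0.0, np.sqrt(2 * np.log(p)), 50))
delta_hat = soft(z, t_hat) * np.sqrt(2.0 / n)     # shrunken mean-difference estimate

# independence rule built from the shrunken differences
midpoint = (X0.mean(axis=0) + X1.mean(axis=0)) / 2.0
def classify(x):
    return int(np.dot(delta_hat, x - midpoint) > 0)

Because most coordinates are shrunk exactly to zero, the resulting rule uses only the selected components, which is the regularization effect the summary alludes to.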

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F10 Point estimation
65C60 Computational problems in statistics (MSC2010)

Software:

EBayesThresh

References:

[1] Bickel, P.; Levina, E., Some theory for Fisher’s linear discriminant function, “naive Bayes”, and some alternatives where there are many more variables than observations, Bernoulli, 10, 6, 989-1010 (2004) · Zbl 1064.62073
[2] Brown, L. D., Fundamentals of Statistical Exponential Families, with Applications in Statistical Decision Theory (1986), IMS: IMS Hayward, CA · Zbl 0685.62002
[3] Campbell, N. A., Shrunken estimation in discriminant and canonical variable analysis, Appl. Statist., 29, 1, 5-14 (1980) · Zbl 0454.62053
[4] Donoho, D. L.; Johnstone, I. M., Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81, 425-455 (1994) · Zbl 0815.62019
[5] Donoho, D. L.; Johnstone, I. M., Adapting to unknown smoothness via wavelet shrinkage, J. Amer. Statist. Assoc., 90, 4, 1200-1224 (1995) · Zbl 0869.62024
[6] Fan, J.; Fan, Y., High dimensional classification using features annealed independence rules, Ann. Statist. (to appear) · Zbl 1360.62327
[7] Foster, D. P.; George, E. I., The risk inflation criterion for multiple regression, Ann. Statist., 22, 1947-1975 (1994) · Zbl 0829.62066
[8] Friedman, J., Regularized discriminant analysis, J. Amer. Statist. Assoc., 84, 165-175 (1989)
[9] Hastie, T.; Buja, A.; Tibshirani, R., Penalized discriminant analysis, Ann. Statist., 23, 73-102 (1995) · Zbl 0821.62031
[10] Johnstone, I. M.; Silverman, B. W., Empirical Bayes selection of wavelet thresholds, Ann. Statist., 33, 4, 1700-1752 (2005) · Zbl 1078.62005
[11] Lehmann, E. L., Testing Statistical Hypotheses (1986), Wiley: Wiley New York · Zbl 0608.62020