Likelihood-based selection and sharp parameter estimation. (English) Zbl 1261.62020

Summary: In high-dimensional data analysis, feature selection is an effective means of dimension reduction and is typically followed by parameter estimation on the selected features. Concerning the accuracy of both selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained \(L_0\) likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation under one condition that is necessary for any method to be selection consistent and to attain sharp parameter estimation. This condition permits up to exponentially many candidate features.
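In generic notation (a sketch for orientation, not necessarily the authors' exact display), the constrained \(L_0\) criterion takes the form
\[
\max_{\beta,\,\eta}\ \ell(\beta,\eta)
\quad\text{subject to}\quad
\|\beta\|_0=\sum_{j=1}^{p} I(\beta_j\neq 0)\le K,
\]
where \(\ell\) is the log-likelihood, \(\eta\) collects nuisance parameters, and \(K\) bounds the number of selected features. The computational surrogate replaces each indicator \(I(\beta_j\neq 0)\) by the truncated \(L_1\) function \(\min(|\beta_j|/\tau,1)\), which is nonconvex but admits a difference-of-convex decomposition \(|\beta_j|/\tau-\max(|\beta_j|/\tau-1,0)\).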
Computationally, we develop difference convex methods that implement the computational surrogate through primal and dual subproblems. These results establish a central role for \(L_0\) constrained and regularized likelihoods in feature selection and in parameter estimation involving selection. As applications of the general method and theory, we perform feature selection in linear and logistic regression and estimate a precision matrix in Gaussian graphical models. In these settings we gain new theoretical insight and obtain favorable numerical results. Finally, we discuss an application to predicting the metastasis status of breast cancer patients from their gene expression profiles. This article has online supplementary material.
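The difference convex (DC) scheme can be sketched for the linear-regression case: each DC step linearizes the concave part of the truncated \(L_1\) surrogate at the current iterate, leaving a weighted lasso subproblem in which only coefficients below the truncation level \(\tau\) remain penalized. A minimal NumPy sketch under these assumptions (the function names and the coordinate-descent solver are illustrative, not the authors' implementation):

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, w, beta0, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    beta = beta0.copy()
    col_ss = (X ** 2).sum(axis=0) / n          # per-coordinate curvature X_j'X_j / n
    r = y - X @ beta                            # full residual, kept up to date
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            old = beta[j]
            z = X[:, j] @ r / n + col_ss[j] * old   # partial-residual correlation
            thresh = lam * w[j]
            beta[j] = np.sign(z) * max(abs(z) - thresh, 0.0) / col_ss[j]
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)
                max_delta = max(max_delta, abs(beta[j] - old))
        if max_delta < tol:
            break
    return beta

def tlp_dc(X, y, lam, tau, n_dc=20):
    """DC iterations for least squares with the truncated L1 surrogate
    lam * sum_j min(|b_j|, tau) / tau  (a sketch of the general scheme)."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_dc):
        # Linearizing the concave part leaves weight 1/tau on coordinates
        # still below the truncation level, and zero weight (no shrinkage)
        # on coordinates already selected, i.e. with |b_j| >= tau.
        w = (np.abs(beta) < tau).astype(float) / tau
        beta = weighted_lasso_cd(X, y, lam, w, beta)
    return beta
```

Because selected coefficients become unpenalized after the first DC step that clears the threshold, the iterate approaches the unshrunk (restricted least-squares) fit on the selected support, which is the mechanism behind the "sharp estimation" property, in contrast to the uniform shrinkage of the lasso.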

MSC:

62F10 Point estimation
62J12 Generalized linear models (logistic models)
62J05 Linear regression; mixed models
65C60 Computational problems in statistics (MSC2010)
