
Minimum distance Lasso for robust high-dimensional regression. (English) Zbl 1349.62322

Summary: We propose a minimum distance estimation method for robust regression in sparse high-dimensional settings. Likelihood-based estimators lack resilience against outliers and model misspecification, a critical issue when dealing with high-dimensional noisy data. Our method, Minimum Distance Lasso (MD-Lasso), combines minimum distance functionals, customarily used in nonparametric estimation for their robustness, with \(\ell_{1}\)-regularization. MD-Lasso is governed by a scaling parameter that caps the influence of outliers: the loss is locally convex and close to quadratic for small squared residuals, and flattens for squared residuals larger than the scaling parameter. As the scaling parameter approaches infinity, the estimator becomes equivalent to least-squares Lasso. MD-Lasso thus retains the robustness of minimum distance functionals in sparse high-dimensional regression. The estimator achieves the maximum breakdown point and enjoys consistency with fast convergence rates under mild conditions on the model error distribution. These guarantees hold for any solution in a convexity region around the true parameter, and in certain cases for every solution. We provide an alternative set of results that do not require the solutions to lie within the convexity region, but instead constrain the \(\ell_{2}\)-norm of the feasible solutions within a safety radius. Thanks to this constraint, a first-order optimization method is able to produce consistent local optima. A connection with re-weighted least squares is established that intuitively explains the robustness of MD-Lasso. The merits of our method are demonstrated through simulations and an eQTL analysis.
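To make the role of the scaling parameter and the re-weighted least-squares connection concrete, the sketch below minimizes an \(\ell_{1}\)-penalized Welsch-type loss by proximal gradient descent. This is a minimal illustration, not the paper's exact minimum distance criterion: the loss \(\rho(r) = s\,(1 - e^{-r^{2}/s})\), the function name md_lasso_sketch, and all tuning choices (lam, s, the step size, the synthetic data) are hypothetical stand-ins that merely share the qualitative behaviour described above.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def md_lasso_sketch(X, y, lam, s, n_iter=500):
    """Proximal gradient (ISTA) for (1/n) * sum_i rho(y_i - x_i'beta) + lam * ||beta||_1,
    with the Welsch-type loss rho(r) = s * (1 - exp(-r^2 / s)): close to
    quadratic for small squared residuals, flat beyond the scale s, and
    equal to squared error in the limit s -> infinity.
    """
    n, p = X.shape
    # Crude step size from the quadratic upper bound on the loss curvature.
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        # Weights w_i = exp(-r_i^2 / s) expose the re-weighted least-squares
        # view: observations with large squared residuals get weights near 0.
        w = np.exp(-r ** 2 / s)
        grad = -2.0 * (X.T @ (w * r)) / n  # gradient of the smooth loss part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Hypothetical usage: sparse signal, light noise, five gross outliers.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
y[:5] += 10.0  # contaminated responses
beta_hat = md_lasso_sketch(X, y, lam=0.1, s=1.0)
```

When s is large relative to every squared residual, all weights are close to one and each update reduces to a standard least-squares Lasso (ISTA) step, matching the limiting behaviour noted in the summary.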

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62J05 Linear regression; mixed models
62G35 Nonparametric robustness

Software:

bootlib; robustbase
