
Scalable interpretable learning for multi-response error-in-variables regression. (English) Zbl 1448.62079

Summary: Corrupted data sets containing noisy or missing observations are prevalent in contemporary applications such as economics, finance, and bioinformatics. Despite recent methodological and algorithmic advances in high-dimensional multi-response regression, it remains unclear how to achieve scalable and interpretable estimation when the covariates are contaminated. In this paper, we develop a new methodology called convex conditioned sequential sparse learning (COSS) for error-in-variables multi-response regression under both additive measurement errors and random missing data. It combines the strengths of the recently developed sequential sparse factor regression and the nearest positive semi-definite matrix projection, and thus enjoys stepwise convexity and scalability in large-scale association analyses. We provide comprehensive theoretical guarantees and demonstrate the effectiveness of the proposed methodology through numerical studies.
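The "convex conditioning" idea referenced in the summary can be illustrated in miniature. Under additive measurement error one observes W = X + A rather than X, and the natural bias-corrected Gram matrix W'W/n − Σ_A may be indefinite, which destroys convexity of downstream sparse regression steps; projecting it onto the positive semi-definite cone restores convexity. The sketch below is a minimal, hypothetical illustration only, not the authors' implementation: it uses a one-shot eigenvalue truncation as the nearest-PSD projection (the literature, e.g. CoCoLasso, uses more refined alternating projections), and assumes the error covariance `Sigma_err` is known.

```python
import numpy as np

def nearest_psd(S):
    """Project a symmetric matrix onto the PSD cone in Frobenius norm
    by zeroing out its negative eigenvalues."""
    S = (S + S.T) / 2              # symmetrize against round-off
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def corrected_gram(W, Sigma_err):
    """Bias-corrected Gram matrix under additive measurement error
    W = X + A with known error covariance Sigma_err.  The naive
    surrogate W'W/n - Sigma_err can be indefinite, so project it."""
    n = W.shape[0]
    S = W.T @ W / n - Sigma_err
    return nearest_psd(S)

# Toy example: 5 clean covariates contaminated by additive noise.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
noise_sd = 0.5
Sigma_err = noise_sd**2 * np.eye(p)   # assumed known here
W = X + rng.normal(scale=noise_sd, size=(n, p))
S = corrected_gram(W, Sigma_err)      # PSD by construction
```

With `S` in hand, each step of a sequential sparse factor regression can be posed as an ordinary convex (Lasso-type) problem, which is the source of the stepwise convexity and scalability the summary highlights.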

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
62J07 Ridge regression; shrinkage estimators (Lasso)

Software:

SOFAR; camel
Full Text: DOI arXiv

References:

[1] Baselmans, B. M. L.; Jansen, R.; Ip, H. F., Multivariate genome-wide analyses of the well-being spectrum, Nature Genet., 51, 3, 445-451 (2019)
[2] Belloni, A.; Rosenbaum, M.; Tsybakov, A. B., Linear and conic programming estimators in high dimensional errors-in-variables models, J. R. Stat. Soc. Ser. B Stat. Methodol., 79, 3, 939-956 (2017) · Zbl 1411.62180
[3] Bickel, P. J.; Ritov, Y., Efficient estimation in the errors in variables model, Ann. Statist., 15, 2, 513-540 (1987) · Zbl 0643.62029
[4] Bickel, P. J.; Ritov, Y.; Tsybakov, A. B., Simultaneous analysis of Lasso and Dantzig selector, Ann. Statist., 37, 4, 1705-1732 (2009) · Zbl 1173.62022
[5] Buldygin, V.; Kozachenko, Y., Subgaussian random variables, Ukrainian Math. J., 32, 6, 483-489 (1980) · Zbl 0479.60012
[6] Bunea, F.; She, Y.; Wegkamp, M., Optimal selection of reduced rank estimators of high-dimensional matrices, Ann. Statist., 39, 2, 1282-1309 (2011) · Zbl 1216.62086
[7] Bunea, F.; She, Y.; Wegkamp, M., Joint variable and rank selection for parsimonious estimation of high-dimensional matrices, Ann. Statist., 40, 5, 2359-2388 (2012) · Zbl 1373.62246
[8] Carroll, R. J.; Ruppert, D.; Stefanski, L. A.; Crainiceanu, C. M., Measurement Error in Nonlinear Models (2006), Chapman & Hall/CRC: Chapman & Hall/CRC London · Zbl 1119.62063
[9] Datta, A.; Zou, H., CoCoLasso for high-dimensional error-in-variables regression, Ann. Statist., 45, 6, 2400-2426 (2017) · Zbl 1486.62210
[10] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[11] Izenman, A., Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning (2008), Springer: Springer New York · Zbl 1155.62040
[12] Laurent, B.; Massart, P., Adaptive estimation of a quadratic functional by model selection, Ann. Statist., 28, 5, 1302-1338 (2000) · Zbl 1105.62328
[13] Li, M.; Li, R.; Ma, Y. M., Inference in high-dimensional linear measurement error models (2020), arXiv preprint, arXiv:2001.10142
[14] Liang, H.; Li, R., Variable selection for partially linear models with measurement errors, J. Amer. Statist. Assoc., 104, 485, 234-248 (2009) · Zbl 1388.62208
[15] Liu, H.; Wang, L.; Zhao, T., Calibrated multivariate regression with application to neural semantic basis discovery, J. Mach. Learn. Res., 16, 47, 1579-1606 (2015) · Zbl 1351.62135
[16] Loh, P. L.; Wainwright, M. J., High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity, Ann. Statist., 40, 3, 1637-1664 (2012) · Zbl 1257.62063
[17] Ma, Y.; Li, R., Variable selection in measurement error models, Bernoulli, 16, 1, 274-300 (2010) · Zbl 1200.62071
[18] Mishra, A.; Dey, D. K.; Chen, K., Sequential co-sparse factor regression, J. Comput. Graph. Statist., 26, 4, 814-825 (2017)
[19] Rosenbaum, M.; Tsybakov, A., Sparse recovery under matrix uncertainty, Ann. Statist., 38, 5, 2620-2651 (2010) · Zbl 1373.62357
[20] Städler, N.; Bühlmann, P., Missing values: sparse inverse covariance estimation and an extension to sparse regression, Stat. Comput., 22, 1, 219-235 (2012) · Zbl 1322.62115
[21] Starbird, K.; Palen, L., (How) will the revolution be retweeted?: information diffusion and the 2011 Egyptian uprising, (Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (2012)), 7-16
[22] Sun, T.; Zhang, C.-H., Scaled sparse linear regression, Biometrika, 99, 4, 879-898 (2012) · Zbl 1452.62515
[23] Uematsu, Y.; Fan, Y.; Chen, K.; Lv, J.; Lin, W., SOFAR: large-scale association network learning, IEEE Trans. Inform. Theory, 65, 8, 4924-4939 (2019) · Zbl 1432.68402
[24] Yang, K.; Lee, L.-F., Identification and QML estimation of multivariate and simultaneous equations spatial autoregressive models, J. Econometrics, 196, 1, 196-214 (2017) · Zbl 1443.62310
[25] Yang, K.; Lee, L.-F., Identification and estimation of spatial dynamic panel simultaneous equations models, J. Econometrics, 76, 32-46 (2019)
[26] Zheng, Z.; Bahadori, M. T.; Liu, Y.; Lv, J., Scalable interpretable multi-response regression via SEED, J. Mach. Learn. Res., 20, 107, 1-34 (2019) · Zbl 1441.62214
[27] Zheng, Z.; Li, Y.; Yu, C.; Li, G., Balanced estimation for high-dimensional measurement error models, Comput. Statist. Data Anal., 128, 78-91 (2018) · Zbl 1469.62183
[28] Zheng, Z.; Wu, J.; Li, Y.; Wang, Y., Sequential scaled sparse factor regression (2019), Manuscript
[29] Zhu, X.; Huang, D.; Pan, R.; Wang, H., Multivariate spatial autoregressive model for large scale social networks, J. Econometrics, 591-606 (2019) · Zbl 1456.62229
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.