Wong, Kam Chung; Li, Zifan; Tewari, Ambuj
Lasso guarantees for \(\beta\)-mixing heavy-tailed time series. (English) Zbl 1450.62117
Ann. Stat. 48, No. 2, 1124-1142 (2020).

Summary: Many theoretical results for the lasso require the samples to be i.i.d. Recent work has provided guarantees for the lasso assuming that the time series is generated by a sparse vector autoregressive (VAR) model with Gaussian innovations. Proofs of these results rely critically on the fact that the true data generating mechanism (DGM) is a finite-order Gaussian VAR. This assumption is quite brittle: linear transformations, including selecting a subset of variables, can lead to its violation. In order to break free from such assumptions, we derive nonasymptotic inequalities for the estimation error and prediction error of the lasso estimate of the best linear predictor without assuming any special parametric form of the DGM. Instead, we rely only on (strict) stationarity and geometrically decaying \(\beta\)-mixing coefficients to establish error bounds for the lasso for sub-Weibull random vectors. The class of sub-Weibull random variables that we introduce includes sub-Gaussian and subexponential random variables, but also random variables with tails heavier than an exponential. We also show that, for Gaussian processes, the \(\beta\)-mixing condition can be relaxed to summability of the \(\alpha\)-mixing coefficients. Our work provides an alternative proof of the consistency of the lasso for sparse Gaussian VAR models, but the applicability of our results extends to non-Gaussian and nonlinear time series models, as the examples we provide demonstrate.

Cited in 30 Documents

MSC:
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62J07 Ridge regression; shrinkage estimators (Lasso)
62G32 Statistics of extreme values; tail inference
62N05 Reliability and life testing

Keywords: time series; mixing; high-dimensional estimation; Lasso
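The sub-Weibull class mentioned in the summary is commonly characterized by a stretched-exponential tail bound; a standard formulation (the constant 2 and the scale parameter \(K\) here are illustrative conventions, not values taken from the paper) is:

```latex
% A random variable X is sub-Weibull with index \gamma > 0 if there exists
% a constant K > 0 such that
\[
  \Pr\bigl(|X| > t\bigr) \;\le\; 2\exp\!\bigl(-(t/K)^{\gamma}\bigr)
  \qquad \text{for all } t \ge 0.
\]
% Special cases: \gamma = 2 recovers sub-Gaussian tails, \gamma = 1 recovers
% subexponential tails, and 0 < \gamma < 1 gives tails heavier than any
% exponential, which is the heavy-tailed regime the paper covers.
```

Taking \(\gamma = 2\) and \(\gamma = 1\) recovers the sub-Gaussian and subexponential classes, while \(0 < \gamma < 1\) admits the heavier-than-exponential tails that motivate the paper's title.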
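As a concrete illustration of the setting the summary describes (the lasso applied to a stationary sparse VAR with heavier-than-exponential innovation tails), here is a minimal simulation sketch. It is not the authors' code; the dimension, sample size, penalty level, and Weibull shape parameter are arbitrary illustrative choices.

```python
# Illustrative sketch: lasso estimation of one row of a sparse VAR(1)
# transition matrix from a single realization with heavy-tailed innovations.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, T, s = 20, 2000, 3  # dimension, sample size, nonzeros per row (illustrative)

# Sparse transition matrix A; row sums of 0.45 keep the spectral radius
# below 1, so the process is stable (hence strictly stationary).
A = np.zeros((p, p))
for i in range(p):
    idx = rng.choice(p, size=s, replace=False)
    A[i, idx] = 0.15
assert np.max(np.abs(np.linalg.eigvals(A))) < 1

# Symmetrized Weibull innovations with shape 0.5: tails heavier than
# exponential, i.e. the sub-Weibull regime with gamma < 1.
eps = rng.weibull(0.5, size=(T + 1, p)) * rng.choice([-1.0, 1.0], size=(T + 1, p))

# Simulate X_{t+1} = A X_t + eps_{t+1}.
X = np.zeros((T + 1, p))
for t in range(T):
    X[t + 1] = A @ X[t] + eps[t + 1]

# Lasso regression of coordinate 0 at time t+1 on the full vector at time t
# estimates row 0 of A without assuming the DGM is a Gaussian VAR.
lasso = Lasso(alpha=0.5, fit_intercept=False).fit(X[:-1], X[1:, 0])
support_hat = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
print("true support:     ", np.flatnonzero(A[0]))
print("estimated support:", support_hat)
```

The paper's results give nonasymptotic error bounds for exactly this kind of estimate, relying only on stationarity and geometric \(\beta\)-mixing rather than on the VAR being the true DGM.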