zbMATH — the first resource for mathematics

Variable screening for high dimensional time series. (English) Zbl 06864473
Summary: Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of the theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) [J. Fan and J. Lv, “Sure independence screening for ultrahigh dimensional feature space”, J. R. Stat. Soc., Ser. B, Stat. Methodol. 70, No. 5, 849–911 (2008; doi:10.1111/j.1467-9868.2008.00674.x)] for high dimensional linear models with dependent and/or heavy-tailed covariates and errors. We also introduce a generalized least squares screening (GLSS) procedure which utilizes the serial correlation present in the data. By utilizing this serial correlation when estimating our marginal effects, GLSS is shown to outperform SIS in many cases. For both procedures we prove sure screening properties, which depend on the moment conditions and the strength of dependence in the error and covariate processes, amongst other factors. We also analyze the combination of these screening procedures with the adaptive Lasso. Dependence is quantified by functional dependence measures [W. B. Wu, Proc. Natl. Acad. Sci. USA 102, No. 40, 14150–14154 (2005; Zbl 1135.62075)], and the results rely on the use of Nagaev-type and exponential inequalities for dependent random variables. We also conduct simulations to demonstrate the finite sample performance of these procedures, and include a real data application of forecasting the US inflation rate.
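
Illustrative note: the marginal screening idea behind SIS, as mentioned in the summary, can be sketched roughly as follows. This is a minimal sketch and not the authors' code. The function name `sis_screen`, the AR(1) toy design, the coefficient values and the choice `d = 20` are all assumptions made for illustration. It ranks covariates by absolute marginal correlation with the response and keeps the top `d`; it does not implement GLSS or the adaptive Lasso step.

```python
import numpy as np

def sis_screen(X, y, d):
    """Return indices of the d columns of X with the largest
    absolute marginal correlation with y (hypothetical helper)."""
    Xc = X - X.mean(axis=0)          # center each covariate
    yc = y - y.mean()                # center the response
    # Sample correlation of every column with the response.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = Xc.T @ yc / denom
    # Sort by decreasing |correlation| and keep the first d indices.
    return np.argsort(-np.abs(corr))[:d]

def ar1(innovations, phi):
    """Build AR(1) paths along axis 0 from given innovations."""
    out = np.empty_like(innovations)
    out[0] = innovations[0]
    for t in range(1, innovations.shape[0]):
        out[t] = phi * out[t - 1] + innovations[t]
    return out

# Toy data (assumed design): n = 200 observations, p = 1000 covariates,
# serially dependent covariates and errors, three active covariates.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = ar1(rng.standard_normal((n, p)), phi=0.5)
errors = ar1(rng.standard_normal(n), phi=0.5)
beta = np.zeros(p)
beta[:3] = 3.0
y = X @ beta + errors

selected = sis_screen(X, y, d=20)
print(sorted(selected.tolist()))
```

In this toy setting the screened set would typically be passed on to a second-stage estimator; how large `d` should be is a tuning choice that the sketch leaves open.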

MSC:
62F07 Statistical ranking and selection procedures
62J07 Ridge regression; shrinkage estimators (Lasso)
References:
[1] Amemiya, T. (1973). Generalized Least Squares with an Estimated Autocovariance Matrix. Econometrica 41 723-732. · Zbl 0305.62046
[2] Andrews, D. W. K. (1984). Non-Strong Mixing Autoregressive Processes. Journal of Applied Probability 21 930-934. · Zbl 0552.60049
[3] Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Statist. 43 1535-1567. · Zbl 1317.62067
[4] Bickel, P. J. (2008). Discussion of Sure independence screening for ultrahigh dimensional feature space. J. Roy. Statist. Soc. B 70 883-884. · Zbl 1411.62187
[5] Bickel, P. J., Brown, B. B., Huang, H. and Li, Q. (2009). An overview of recent developments in genomics and associated statistical methods. Phil. Transactions of the Roy. Soc. A 367 4313-4337. · Zbl 1185.62184
[6] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
[7] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer. · Zbl 0709.62080
[8] Bühlmann, P. (1995). Moving-average representation of autoregressive approximations. Stochastic Processes and their Applications 60 331-342. · Zbl 0847.60027
[9] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer. · Zbl 1273.62015
[10] Chang, J., Tang, C. Y. and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. Ann. Statist. 41 2123-2148. · Zbl 1277.62109
[11] Chen, X., Xu, M. and Wu, W. B. (2013). Covariance and precision matrix estimation for high-dimensional time series. Ann. Statist. 41 2994-3021. · Zbl 1294.62123
[12] Chen, J., Li, D., Linton, O. and Lu, Z. (2017). Semiparametric Ultra-High Dimensional Model Averaging of Nonlinear Dynamic Time Series. Journal of the American Statistical Association. In press.
[13] Cheng, M.-Y., Honda, T., Li, J. and Peng, H. (2014). Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal data. Ann. Statist. 42 1819-1849. · Zbl 1305.62169
[14] Davidson, J. (1994). Stochastic Limit Theory, An Introduction for Econometricians. Oxford University Press. · Zbl 0904.60002
[15] Davidson, R. and MacKinnon, J. G. (2004). Econometric Theory and Methods. Oxford University Press.
[16] Davis, R. A., Holan, S. H., Lund, R. and Ravishanker, N., eds. (2016). Handbook of Discrete-Valued Time Series. CRC Press.
[17] Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer-Verlag New York. · Zbl 0801.60027
[18] Fan, J., Feng, Y. and Wu, Y. (2010). High-dimensional variable selection for Cox’s proportional hazards model. IMS Collections, Borrowing Strength: Theory Powering Applications - A Festschrift for Lawrence D. Brown 6 70-86.
[19] Fan, J., Feng, Y. and Song, R. (2011). Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models. Journal of the American Statistical Association 106 544-557. · Zbl 1232.62064
[20] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). J. Roy. Statist. Soc. B 70 849-911. · Zbl 1411.62187
[21] Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20 101-148. · Zbl 1180.62080
[22] Fan, J., Lv, J. and Qi, L. (2011). Sparse High dimensional models in economics. Annual Review of Economics 3 291-317.
[23] Fan, J., Ma, J. and Dai, W. (2014). Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Varying Coefficient Models. Journal of the American Statistical Association 109 1270-1284. · Zbl 1368.62095
[24] Fan, J. and Ren, Y. (2006). Statistical Analysis of DNA Microarray Data in Cancer Research. Clinical Cancer Research 12 4469-4473.
[25] Fan, J. and Song, R. (2010). Sure Independence Screening in generalized linear models with NP-dimensionality. Annals of Statistics 38 3567-3604. · Zbl 1206.68157
[26] Fan, Y. and Tang, C. Y. (2013). Tuning Parameter Selection in High-Dimensional Penalized Likelihood. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75 531-552. · Zbl 1411.62216
[27] Gorst-Rasmussen, A. and Scheike, T. (2013). Independent screening for single-index hazard rate models with ultrahigh dimensional features. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75 217-245.
[28] Hayashi, F. (2000). Econometrics. Princeton Univ Press. · Zbl 0994.62107
[29] Huang, J., Ma, S. and Zhang, C. H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica 18 1603-1618. · Zbl 1255.62198
[30] Huang, Q. and Zhu, Y. (2016). Model-free sure screening via maximum correlation. Journal of Multivariate Analysis 148 89-106. · Zbl 1383.62112
[31] Johnstone, I. M. and Titterington, M. (2009). Statistical challenges of high dimensional data. Phil. Transactions of the Roy. Soc. A 367 4237-4253. · Zbl 1185.62007
[32] Jurado, K., Ludvigson, S. C. and Ng, S. (2015). Measuring Uncertainty. American Economic Review 105 1177-1216.
[33] Kock, A. and Callot, A. (2015). Oracle inequalities for high dimensional vector autoregressions. Journal of Econometrics 186 325-344. · Zbl 1331.62348
[34] Koreisha, S. G. and Fang, Y. (2001). Generalized least squares with misspecified serial correlation structures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63. · Zbl 0989.62048
[35] Li, R., Zhu, L. P. and Zhong, W. (2012). Feature Screening via distance correlation. J. Amer. Statist. Assoc. 107 1129-1139. · Zbl 1443.62184
[36] Li, G., Peng, H., Jun, Z. and Zhu, L. (2012). Robust rank correlation based screening. Annals of Statistics 40 1846-1877. · Zbl 1257.62067
[37] Liu, J., Zhong, W. and Li, R. (2015). A selective overview of feature screening for ultrahigh-dimensional data. Sci. China Math. 58 1-22.
[38] Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer. · Zbl 1072.62075
[39] Medeiros, M. and Mendes, E. (2016). L1-regularization of high-dimensional time-series models with non-Gaussian and heteroskedastic errors. Journal of Econometrics 191 255-271. · Zbl 1390.62179
[40] Samorodnitsky, G. (2006). Long Range Dependence. Foundations and Trends in Stochastic systems 1 163-257. · Zbl 1242.60033
[41] Shao, X. and Wu, W. B. (2007). Asymptotic spectral theory for nonlinear time series. Ann. Statist. 35 1773-1801. · Zbl 1147.62076
[42] Shao, X. and Zhang, J. (2014). Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening. Journal of the American Statistical Association 109 1302-1318. · Zbl 1368.62157
[43] Stock, J. H. and Watson, M. W. (1999). Forecasting inflation. Journal of Monetary Economics 44 293-335.
[44] Stock, J. H. and Watson, M. W. (2002). Macroeconomic Forecasting Using Diffusion Indexes. Journal of Business & Economic Statistics 20 147-162.
[45] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. B 58 267-288. · Zbl 0850.62538
[46] Wang, H., Li, G. and Tsai, C. L. (2007). Regression coefficient and autoregressive order shrinkage and selection via lasso. J. Roy. Statist. Soc. B 69 63-78.
[47] Wang, H., Li, B. and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. J. Roy. Statist. Soc. B 71 671-683. · Zbl 1250.62036
[48] Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G. H., Morales, F. and Evans, A. C. (2002). A General Statistical Analysis for fMRI Data. NeuroImage 15 1-15.
[49] Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proceedings of the National Academy of Sciences 102 14150-14154. · Zbl 1135.62075
[50] Wu, W. B. (2011). Asymptotic theory for stationary processes. Statistics and its Interface 4 207-226. · Zbl 05983893
[51] Wu, W. B. and Min, W. (2005). On linear processes with dependent innovations. Stochastic Processes and their Applications 115 939-958. · Zbl 1081.62071
[52] Wu, W. B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Statistica Sinica 19 1755-1768. · Zbl 1176.62083
[53] Wu, W. B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electronic Journal of Statistics 10 352-379. · Zbl 1333.62172
[54] Wu, S., Xue, H., Wu, Y. and Wu, H. (2014). Variable Selection for Sparse High-Dimensional Nonlinear Regression Models by Combining Nonnegative Garrote and Sure Independence Screening. Statistica Sinica 24 1365-1387. · Zbl 06431835
[55] Xiao, H. and Wu, W. B. (2012). Covariance matrix estimation for stationary time series. Ann. Statist. 40 466-493. · Zbl 1246.62191
[56] Xu, P., Zhu, L. and Li, Y. (2014). Ultrahigh dimensional time course feature selection. Biometrics 70 356-365. · Zbl 1419.62482
[57] Zhu, L., Li, L., Li, R. and Zhu, L. X. (2011). Model-Free Feature Selection for Ultrahigh Dimensional Data. J. Amer. Statist. Assoc. 106 1464-1475. · Zbl 1233.62195
[58] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.