A robust bootstrap change point test for high-dimensional location parameter. (English) Zbl 1493.62148

Summary: We consider the problem of change point detection for high-dimensional distributions in a location family when the dimension can be much larger than the sample size. In change point analysis, the widely used cumulative sum (CUSUM) statistics are sensitive to outliers and heavy-tailed distributions. In this paper, we propose a robust, tuning-free (i.e., fully data-dependent), and easy-to-implement change point test that enjoys strong theoretical guarantees. To achieve robustness in a nonparametric setting, we formulate change point detection in the multivariate \(U\)-statistics framework with anti-symmetric and nonlinear kernels. Specifically, the within-sample noise is canceled out by the anti-symmetry of the kernel, while the signal distortion under certain nonlinear kernels can be controlled so that the between-sample change point signal is magnitude-preserving. A (half) jackknife multiplier bootstrap (JMB) tailored to the change point detection setting is proposed to calibrate the distribution of our \(\ell^\infty\)-norm aggregated test statistic. Under mild moment conditions on the kernels, we derive uniform rates of convergence for the JMB approximation to the sampling distribution of the test statistic, and analyze its size and power properties. Extensions to multiple change point testing and estimation are discussed and illustrated with numerical studies.
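
The ideas in the summary can be illustrated with a small sketch. It is not the authors' procedure: the sign kernel, the Mann-Whitney-style standardisation, and the function names `sign`, `scan_statistic`, and `permutation_pvalue` are all assumptions chosen for illustration, and the calibration uses a plain permutation test rather than the jackknife multiplier bootstrap described above. The sketch only shows how an anti-symmetric, bounded kernel yields a two-sample \(U\)-statistic at each candidate split, and how a maximum over splits and coordinates gives an \(\ell^\infty\)-type aggregate.

```python
import math
import random


def sign(v):
    """Anti-symmetric, bounded kernel value: sign(x - y) = -sign(y - x)."""
    return (v > 0) - (v < 0)


def scan_statistic(X):
    """Max over splits s and coordinates k of a standardised two-sample
    U-statistic with the sign kernel (illustrative choice, not the paper's)."""
    n, p = len(X), len(X[0])
    best = 0.0
    for k in range(p):
        col = [row[k] for row in X]
        cross = 0  # sum of sign(col[i] - col[j]) over i < s <= j
        for s in range(1, n):
            m = s - 1  # observation moving from the right block to the left
            cross += sum(sign(col[m] - col[j]) for j in range(s, n))
            cross -= sum(sign(col[i] - col[m]) for i in range(m))
            # Null standard deviation of the sign sum for continuous data
            # (the Mann-Whitney variance, rescaled).
            scale = math.sqrt(s * (n - s) * (n + 1) / 3.0)
            best = max(best, abs(cross) / scale)
    return best


def permutation_pvalue(X, n_perm=199, seed=0):
    """Permutation calibration (a stand-in for the JMB): shuffle the time
    order of whole observations, which is valid under an i.i.d. null."""
    rng = random.Random(seed)
    observed = scan_statistic(X)
    rows = list(X)  # shallow copy so the caller's ordering is untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(rows)
        if scan_statistic(rows) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(42)
    n, p, change = 30, 4, 15
    # Heavy-tailed (Cauchy) noise; a location shift in coordinate 0 only.
    X = []
    for t in range(n):
        row = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(p)]
        if t >= change:
            row[0] += 3.0
        X.append(row)
    stat, pval = permutation_pvalue(X)
    print(f"statistic = {stat:.3f}, permutation p-value = {pval:.3f}")
```

Because the sign kernel is bounded, a single extreme observation can change each pairwise term by at most 2, which is the kind of robustness to outliers and heavy tails that the summary contrasts with CUSUM statistics.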

MSC:

62F40 Bootstrap, jackknife and other resampling methods
62G35 Nonparametric robustness
62E17 Approximations to statistical distributions (nonasymptotic)
