×

zbMATH — the first resource for mathematics

Wild binary segmentation for multiple change-point detection. (English) Zbl 1302.62075
Summary: We propose a new technique, called wild binary segmentation (WBS), for consistent estimation of the number and locations of multiple change-points in data. We assume that the number of change-points can increase to infinity with the sample size. Due to a certain random localisation mechanism, WBS works even for very short spacings between the change-points and/or very small jump magnitudes, unlike standard binary segmentation. On the other hand, despite its use of localisation, WBS does not require the choice of a window or span parameter, and does not lead to a significant increase in computational complexity. WBS is also easy to code. We propose two stopping criteria for WBS: one based on thresholding and the other based on what we term the ‘strengthened Schwarz information criterion’. We provide default recommended values of the parameters of the procedure and show that it offers very good practical performance in comparison with the state of the art. The WBS methodology is implemented in the R package wbs, available on CRAN. {
} In addition, we provide a new proof of consistency of binary segmentation with improved rates of convergence, as well as a corresponding result for WBS.

MSC:
62G05 Nonparametric estimation
Software:
CRAN; unbalhaar; not; basta; R; ftnonpar; wbs
PDF BibTeX XML Cite
Full Text: DOI Euclid arXiv
References:
[1] Antoch, J. and Jarušková, D. (2013). Testing for multiple change points. Comput. Statist. 28 2161-2183. · Zbl 1306.65022
[2] Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the optimal identification of segment neighborhoods. Bull. Math. Biol. 51 39-54. · Zbl 0658.92010
[3] Bai, J. (1997). Estimating multiple breaks one at a time. Econometric Theory 13 315-352. · Zbl 04539987
[4] Bai, J. and Perron, P. (2003). Computation and analysis of multiple structural change models. J. Appl. Econometrics 18 1-22.
[5] Baranowski, R. and Fryzlewicz, P. (2014). wbs: Wild Binary Segmentation for multiple change-point detection, 2014. R package version 1.1. · Zbl 1302.62075
[6] Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. ( JEMS ) 3 203-268. · Zbl 1037.62001
[7] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Statist. 37 157-183. · Zbl 1155.62034
[8] Braun, J. and Mueller, H.-G. (1998). Statistical methods for DNA sequence segmentation. Statist. Sci. 13 142-162. · Zbl 0960.62121
[9] Brodsky, B. E. and Darkhovsky, B. S. (1993). Nonparametric Methods in Change-Point Problems. Mathematics and Its Applications 243 . Kluwer Academic, Dordrecht. · Zbl 1274.62512
[10] Chan, H. P. and Walther, G. (2013). Detection with the scan and the average likelihood ratio. Statist. Sinica 23 409-428. · Zbl 1257.62096
[11] Chen, K.-M., Cohen, A. and Sackrowitz, H. (2011). Consistent multiple testing for change points. J. Multivariate Anal. 102 1339-1343. · Zbl 1221.62109
[12] Cho, H. and Fryzlewicz, P. (2014). Multiple change-point detection for high-dimensional time series via sparsified binary segmentation. J. R. Stat. Soc. Ser. B Stat. Methodol. · Zbl 1302.62075
[13] Cho, H. and Fryzlewicz, P. (2011). Multiscale interpretation of taut string estimation and its connection to unbalanced Haar wavelets. Stat. Comput. 21 671-681. · Zbl 1221.62056
[14] Cho, H. and Fryzlewicz, P. (2012). Multiscale and multilevel technique for consistent segmentation of nonstationary time series. Statist. Sinica 22 207-229. · Zbl 1417.62240
[15] Ciuperca, G. (2011). A general criterion to determine the number of change-points. Statist. Probab. Lett. 81 1267-1275. · Zbl 1219.62110
[16] Ciuperca, G. (2014). Model selection by LASSO methods in a change-point model. Statist. Papers 55 349-374. · Zbl 1297.62162
[17] Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist. 29 1-65. · Zbl 1029.62038
[18] Davis, R. A., Lee, T. C. M. and Rodriguez-Yam, G. A. (2006). Structural break estimation for nonstationary time series models. J. Amer. Statist. Assoc. 101 223-239. · Zbl 1118.62359
[19] Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124-152. · Zbl 1029.62070
[20] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-499. · Zbl 1091.62054
[21] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change-point inference (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 76 495-580.
[22] Fryzlewicz, P. and Subba Rao, S. (2014). Multiple-change-point detection for auto-regressive conditional heteroscedastic processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 903-924.
[23] Fryzlewicz, P. (2007). Unbalanced Haar technique for nonparametric function estimation. J. Amer. Statist. Assoc. 102 1318-1327. · Zbl 1333.62014
[24] Fryzlewicz, P. (2012). Time-threshold maps: Using information from wavelet reconstructions with all threshold values simultaneously (with discussion). J. Korean Statist. Soc. 41 145-159. · Zbl 1296.62070
[25] Fryzlewicz, P. (2014). Discussion contribution to ‘Multiscale change-point inference’ by Frick, Munk and Sieling. J. R. Stat. Soc. Ser. B Stat. Methodol. 76 547-548.
[26] Halko, N., Martinsson, P. G. and Tropp, J. A. (2011). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53 217-288. · Zbl 1269.65043
[27] Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383-393. · Zbl 0305.62031
[28] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. J. Amer. Statist. Assoc. 105 1480-1493. · Zbl 1388.62211
[29] Hušková, M. and Slabý, A. (2001). Permutation tests for multiple changes. Kybernetika ( Prague ) 37 605-622. · Zbl 1264.62038
[30] Jackson, B., Sargle, J., Barnes, D., Arabhi, S., Alt, A., Gioumousis, P., Gwin, E., Sangtrakulcharoen, P., Tan, L. and Tsai, T. T. (2005). An algorithm for optimal partitioning of data on an interval. IEEE Signal Processing Letters 12 105-108.
[31] Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc. 107 1590-1598. · Zbl 1258.62091
[32] Killick, R., Nam, C., Aston, J. and Eckley, I. (2012). changepoint.info: The changepoint repository. Available at .
[33] Kirch, C. and Muhsal, B. (2014). A MOSUM procedure for the estimation of multiple random change points. · Zbl 1305.62314
[34] Korostelëv, A. P. (1987). Minimax estimation of a discontinuous signal. Theory Probab. Appl. 32 727-730. · Zbl 0659.62103
[35] Lavielle, M. (1999). Detection of multiple changes in a sequence of dependent variables. Stochastic Process. Appl. 83 79-102. · Zbl 0991.62014
[36] Lavielle, M. (2005). Using penalized contrasts for the change-point problem. Signal Processing 85 1501-1510. · Zbl 1160.94341
[37] Lavielle, M. and Moulines, E. (2000). Least-squares estimation of an unknown number of shifts in a time series. J. Time Series Anal. 21 33-59. · Zbl 0974.62070
[38] Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Processing 85 717-736. · Zbl 1148.94403
[39] Lee, C.-B. (1995). Estimating the number of change points in a sequence of independent normal random variables. Statist. Probab. Lett. 25 241-248. · Zbl 0839.62015
[40] Mahoney, M. (2010). Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning 3 123-224. · Zbl 1232.68173
[41] Matteson, D. S. and James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Amer. Statist. Assoc. 109 334-345. · Zbl 1367.62260
[42] Muggeo, V. and Adelfio, G. (2011). Efficient change point detection for genomic sequences of continuous measurements. Bioinformatics 27 161-166.
[43] Olshen, A., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557-572. · Zbl 1155.62478
[44] Pan, J. and Chen, J. (2006). Application of modified information criterion to multiple change point problems. J. Multivariate Anal. 97 2221-2241. · Zbl 1101.62050
[45] Rigaill, G. (2010). Pruned dynamic programming for optimal multiple change-point detection.
[46] Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist. 37 2922-2952. · Zbl 1173.62027
[47] Rojas, C. and Wahlberg, B. (2014). On change point detection using the fused lasso method.
[48] Taylor, J. E., Worsley, K. J. and Gosselin, F. (2007). Maxima of discretely sampled random fields, with an application to ‘bubbles’. Biometrika 94 1-18. · Zbl 1143.62059
[49] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91-108. · Zbl 1060.62049
[50] Venkatraman, E. S. (1992). Consistency results in multiple change-point problems. Ph.D. thesis, Stanford Univ., ProQuest LLC, Ann Arbor, MI.
[51] Vostrikova, L. (1981). Detecting ‘disorder’ in multidimensional random processes. Soviet Math. Dokl. 24 55-59. · Zbl 0487.62072
[52] Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika 82 385-397. · Zbl 0824.62031
[53] Wu, Y. (2008). Simultaneous change point analysis and variable selection in a regression problem. J. Multivariate Anal. 99 2154-2171. · Zbl 1169.62064
[54] Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statist. Probab. Lett. 6 181-189. · Zbl 0642.62016
[55] Yao, Y.-C. and Au, S. T. (1989). Least-squares estimation of a step function. Sankhyā Ser. A 51 370-381. · Zbl 0711.62031
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.