×

zbMATH — the first resource for mathematics

Tail-greedy bottom-up data decompositions and fast multiple change-point detection. (English) Zbl 1454.62109
Summary: This article proposes a “tail-greedy”, bottom-up transform for one-dimensional data, which results in a nonlinear but conditionally orthonormal, multiscale decomposition of the data with respect to an adaptively chosen unbalanced Haar wavelet basis. The “tail-greediness” of the decomposition algorithm, whereby multiple greedy steps are taken in a single pass through the data, both enables fast computation and makes the algorithm applicable in the problem of consistent estimation of the number and locations of multiple change-points in data. The resulting agglomerative change-point detection method avoids the disadvantages of the classical divisive binary segmentation, and offers very good practical performance. It is implemented in the R package breakfast, available from CRAN.

MSC:
62G05 Nonparametric estimation
62G10 Nonparametric hypothesis testing
PDF BibTeX Cite
Full Text: DOI Euclid
References:
[1] Aggarwal, R., Inclan, C. and Leal, R. (1999). Volatility in emerging stock markets. J. Financ. Quant. Anal.34 33–55.
[2] Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the optimal identification of segment neighborhoods. Bull. Math. Biol.51 39–54. · Zbl 0658.92010
[3] Bai, J. (1997). Estimating multiple breaks one at a time. Econometric Theory13 315–352.
[4] Bai, J. and Perron, P. (2003). Computation and analysis of multiple structural change models. J. Appl. Econometrics18 1–22.
[5] Birge, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 203–268. · Zbl 1037.62001
[6] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Statist.37 157–183. · Zbl 1155.62034
[7] Braun, J., Braun, R. and Mueller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika87 301–314. · Zbl 0963.62067
[8] Braun, J. and Mueller, H.-G. (1998). Statistical methods for DNA sequence segmentation. Statist. Sci.13 142–162. · Zbl 0960.62121
[9] Brodsky, B. and Darkhovsky, B. (1993). Nonparametric Methods in Change-Point Problems. Kluwer, Dordrecht. · Zbl 1274.62512
[10] Chen, K.-M., Cohen, A. and Sackrowitz, H. (2011). Consistent multiple testing for change points. J. Multivariate Anal.102 1339–1343. · Zbl 1221.62109
[11] Cho, H. and Fryzlewicz, P. (2011). Multiscale interpretation of taut string estimation and its connection to Unbalanced Haar wavelets. Stat. Comput.21 671–681. · Zbl 1221.62056
[12] Cho, H. and Fryzlewicz, P. (2012). Multiscale and multilevel technique for consistent segmentation of nonstationary time series. Statist. Sinica22 207–229. · Zbl 1417.62240
[13] Cho, H. and Fryzlewicz, P. (2015). Multiple change-point detection for high-dimensional time series via Sparsified Binary Segmentation. J. R. Stat. Soc. Ser. B. Stat. Methodol.77 475–507.
[14] Choi, F. Y. Y. (2000). Advances in domain independent linear text segmentation. In NAACL 2000 Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference 26–33.
[15] Chu, P.-S. and Zhao, X. (2004). Bayesian change-point analysis of tropical cyclone activity: The central north Pacific case. J. Climate17 4893–4901.
[16] Ciuperca, G. (2011). A general criterion to determine the number of change-points. Statist. Probab. Lett.81 1267–1275. · Zbl 1219.62110
[17] Ciuperca, G. (2014). Model selection by LASSO methods in a change-point model. Statist. Papers55 349–374. · Zbl 1297.62162
[18] Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist.29 1–48. · Zbl 1029.62038
[19] Davis, R., Lee, T. and Rodriguez-Yam, G. (2006). Structural break estimation for nonstationary time series models. J. Amer. Statist. Assoc.101 223–239. · Zbl 1118.62359
[20] Desmond, R., Weiss, H., Arani, R., Soong, S.-J., Wood, M., Fiddian, P., Gnann, J. and Whitley, R. (2002). Clinical applications for change-point analysis of herpes zoster pain. Journal of Pain and Symptom Management23 510–516.
[21] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika81 425–455. · Zbl 0815.62019
[22] Du, C., Kao, C.-L. and Kou, S. (2016). Stepwise signal extraction via marginal likelihood. J. Amer. Statist. Assoc.111 314–330.
[23] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist.32 407–499. · Zbl 1091.62054
[24] Eichinger, B. and Kirch, C. (2018). A MOSUM procedure for the estimation of multiple random change points. Bernoulli24 526–564. · Zbl 1388.62251
[25] Erdman, C. and Emerson, J. (2008). A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics24 2143–2148.
[26] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change-point inference (with discussion). J. R. Stat. Soc. Ser. B. Stat. Methodol.76 495–580.
[27] Fryzlewicz, P. (2007). Unbalanced Haar technique for nonparametric function estimation. J. Amer. Statist. Assoc.102 1318–1327. · Zbl 1333.62014
[28] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Statist.42 2243–2281. · Zbl 1302.62075
[29] Fryzlewicz, P. (2018). Supplement to “Tail-greedy bottom-up data decompositions and fast multiple change-point detection”. DOI:10.1214/17-AOS1662SUPP. · Zbl 1454.62109
[30] Fryzlewicz, P. and Subba Rao, S. (2014). Multiple-change-point detection for auto-regressive conditional heteroscedastic processes. J. R. Stat. Soc. Ser. B. Stat. Methodol.76 903–924.
[31] Fryzlewicz, P. and Timmermans, C. (2016). SHAH: SHape-Adaptive Haar wavelets for image processing. J. Comput. Graph. Statist.25 879–898.
[32] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. J. Amer. Statist. Assoc.105 1480–1493. · Zbl 1388.62211
[33] Huskova, M. and Slaby, A. (2001). Permutation tests for multiple changes. Kybernetika (Prague) 37 605–622. · Zbl 1264.62038
[34] Jackson, B., Sargle, J., Barnes, D., Arabhi, S., Alt, A., Gioumousis, P., Gwin, E., Sangtrakulcharoen, P., Tan, L. and Tsai, T. T. (2005). An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett.12 105–108.
[35] Jansen, M., Nason, G. and Silverman, B. (2009). Multiscale methods for data on graphs and irregular multidimensional situations. J. R. Stat. Soc. Ser. B. Stat. Methodol.71 97–125. · Zbl 1231.62054
[36] Killick, R., Fearnhead, P. and Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. J. Amer. Statist. Assoc.107 1590–1598. · Zbl 1258.62091
[37] Koprinska, I. and Carrato, S. (2001). Temporal video segmentation: A survey. Signal Process. Image Commun.16 477–500.
[38] Lavielle, M. (1999). Detection of multiple changes in a sequence of dependent variables. Stochastic Process. Appl.83 79–102. · Zbl 0991.62014
[39] Lavielle, M. (2005). Using penalized contrasts for the change-point problem. Signal Process.85 1501–1510. · Zbl 1160.94341
[40] Lavielle, M. and Moulines, E. (2000). Least-squares estimation of an unknown number of shifts in a time series. J. Time Series Anal.21 33–59. · Zbl 0974.62070
[41] Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process.85 717–736. · Zbl 1148.94403
[42] Lee, C.-B. (1995). Estimating the number of change points in a sequence of independent normal random variables. Statist. Probab. Lett.25 241–248. · Zbl 0839.62015
[43] Li, H., Munk, A. and Sieling, H. (2016). FDR-control in multiscale change-point segmentation. Electron. J. Stat.10 918–959. · Zbl 1338.62117
[44] Lio, P. and Vanucci, M. (2000). Wavelet change-point prediction of transmembrane proteins. Bioinformatics16 376–382.
[45] Lu, L., Zhang, H.-J. and Jiang, H. (2002). Content analysis for audio classification and segmentation. IEEE Trans. Speech Audio Process.10 504–516.
[46] Mahmoud, M., Parker, P., Woodall, W. and Hawkins, D. (2007). A change point method for linear profile data. Qual. Reliab. Eng. Int.23 247–268.
[47] Maidstone, R., Hocking, T., Rigaill, G. and Fearnhead, P. (2017). On optimal multiple changepoint algorithms for large data. Stat. Comput.27 519–533. · Zbl 06697671
[48] Matteson, D. and James, N. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Amer. Statist. Assoc.109 334–345. · Zbl 1367.62260
[49] Minin, V., Dorman, K., Fang, F. and Suchard, M. (2005). Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics21 3034–3042.
[50] Olshen, A., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostat.5 557–572. · Zbl 1155.62478
[51] Pan, J. and Chen, J. (2006). Application of modified information criterion to multiple change point problems. J. Multivariate Anal.97 2221–2241. · Zbl 1101.62050
[52] Rigaill, G. (2015). A pruned dynamic programming algorithm to recover the best segmentations with 1 to \({K}_{\max}\) change-points. J. SFdS156 180–205. · Zbl 1381.90094
[53] Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist.37 2922–2952. · Zbl 1173.62027
[54] Robbins, M., Gallagher, C., Lund, R. B. and Aue, A. (2011). Mean shift testing in correlated data. J. Time Series Anal.32 498–511. · Zbl 1294.62212
[55] Schroeder, A. L. and Fryzlewicz, P. (2013). Adaptive trend estimation in financial time series via multiscale change-point-induced basis recovery. Stat. Interface6 449–461. · Zbl 1326.91035
[56] Shriberga, E., Stolckea, A., Hakkani-Türb, D. and Türb, G. (2000). Prosody-based automatic segmentation of speech into sentences and topics. Speech Commun.32 127–154.
[57] Tartakovsky, A., Rozovskii, B., Blazek, R. and Kim, H. (2006). A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Trans. Signal Process.54 3372–3382. · Zbl 1373.68144
[58] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol.67 91–108. · Zbl 1060.62049
[59] Venkatraman, E. S. (1992). Consistency results in multiple change-point problems. Technical Report 24, Dept. Statistics, Stanford Univ. Available from https://statistics.stanford.edu/resources/technical-reports.
[60] Venkatraman, E. S. and Olshen, A. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics23 657–663.
[61] Vostrikova, L. (1981). Detecting ‘disorder’ in multidimensional random processes. Sov. Math., Dokl.24 55–59. · Zbl 0487.62072
[62] Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika82 385–397. · Zbl 0824.62031
[63] Wang, H., Zhang, D. and Shin, K. (2004). Change-point monitoring for the detection of DoS attacks. IEEE Trans. Dependable Secure Comput.1 193–208.
[64] Wu, Y. (2008). Simultaneous change point analysis and variable selection in a regression problem. J. Multivariate Anal.99 2154–2171. · Zbl 1169.62064
[65] Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statist. Probab. Lett.6 181–189. · Zbl 0642.62016
[66] Yao, Y.-C. and Au, S. T. (1989). Least-squares estimation of a step function. Sankhyā Ser. A51 370–381. · Zbl 0711.62031
[67] Zhang, N. R. and Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics63 22–32. · Zbl 1206.62174
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.