×

FDR-control in multiscale change-point segmentation. (English) Zbl 1338.62117

Summary: Fast multiple change-point segmentation methods, which additionally provide faithful statistical statements on the number, locations and sizes of the segments, have recently received great attention. In this paper, we propose a multiscale segmentation method, FDRSeg, which controls the false discovery rate (FDR) in the sense that the number of false jumps is bounded linearly by the number of true jumps. In this way, it adapts the detection power to the number of true jumps. We prove a non-asymptotic upper bound for its FDR in a Gaussian setting, which allows to calibrate the only parameter of FDRSeg properly. Moreover, we show that FDRSeg estimates change-point locations, as well as the signal, in a uniform sense at optimal minimax convergence rates up to a log-factor. The latter is w.r.t. \(L^{p}\)-risk, \(p\geq 1\), over classes of step functions with bounded jump sizes and either bounded, or even increasing, number of change-points. FDRSeg can be efficiently computed by an accelerated dynamic program; its computational complexity is shown to be linear in the number of observations when there are many change-points. The performance of the proposed method is examined by comparisons with some state of the art methods on both simulated and real datasets. An R-package is available online.

MSC:

62G08 Nonparametric regression and quantile regression
62G10 Nonparametric hypothesis testing
62G20 Asymptotic properties of nonparametric inference
90C39 Dynamic programming

References:

[1] Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the optimal identification of segment neighborhoods., Bull. Math. Biol. 51 39-54. · Zbl 0658.92010 · doi:10.1007/BF02458835
[2] Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes., Econometrica 66 47-78. · Zbl 1056.62523 · doi:10.2307/2998540
[3] Bellman, R. (1957)., Dynamic Programming . Princeton University Press, Princeton, NJ, USA. · Zbl 0077.13605
[4] Bellman, R. E. and Dreyfus, S. E. (1962)., Applied Dynamic Programming . Princeton, NJ: Princeton University Press. · Zbl 0106.34901
[5] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing., J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 289-300. · Zbl 0809.62014
[6] Benjamini, Y. and Yekutieli, D. (2001). The Control of the False Discovery Rate in Multiple Testing under Dependency., Ann. Statist. 29 1165-88. · Zbl 1041.62061 · doi:10.1214/aos/1013699998
[7] Birgé, L. and Massart, P. (2006). Minimal penalties for Gaussian model selection., Probab. Theory Related Fields 138 33-73. · Zbl 1112.62082 · doi:10.1007/s00440-006-0011-8
[8] Blythe, D. A. J., von Bunau, P., Meinecke, F. C. and Muller, K. (2012). Feature Extraction for Change-Point Detection Using Stationary Subspace Analysis., IEEE Trans. Neur. Netwrks Learn. Syst. 23 631-643.
[9] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators., Ann. Statist. 37 157-183. · Zbl 1155.62034 · doi:10.1214/07-AOS558
[10] Braun, J. V., Braun, R. K. and Mueller, H. G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation., Biometrika 87 301-314. · Zbl 0963.62067 · doi:10.1093/biomet/87.2.301
[11] Carlstein, E., Müller, H.-G. and Siegmund, D. (1994)., Change-point problems . Institute of Mathematical Statistics Lecture Notes-Monograph Series, 23 . Institute of Mathematical Statistics, Hayward, CA. Papers from the AMS-IMS-SIAM Summer Research Conference held at Mt. Holyoke College, South Hadley, MA, July 11-16, 1992.
[12] Chan, H. P. and Walther, G. (2013). Detection with the scan and the average likelihood ratio., Statist. Sinica 23 409-428. · Zbl 1257.62096 · doi:10.5705/ss.2011.169
[13] Chen, Y., Shah, R. D. and Samworth, R. J. (2014). Discussion of “Multiscale change-point inference”., J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 544-546.
[14] Cheng, D. and Schwartzman, A. (2015). Multiple testing of local extrema for detection of change points., .
[15] Chung, S.-H., Andersen, O. S. and Krishnamurthy, V. (2007)., Biological Membrane Ion Channels . Springer.
[16] Cormen, T. H., Leiserson, C. E., Rivest, R. L. and Stein, C. (2009)., Introduction to Algorithms , third ed. MIT press. · Zbl 1187.68679
[17] Csörgö, M. and Horváth, L. (1997)., Limit Theorems in Change-point Analysis . John Wiley & Sons Ltd., Chichester. · Zbl 0884.62023
[18] Davies, L., Höhenrieder, C. and Krämer, W. (2012). Recursive computation of piecewise constant volatilities., Comput. Stat. Data Anal. 56 3623-3631. · Zbl 1254.91751 · doi:10.1016/j.csda.2010.06.027
[19] Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution., Ann. Statist. 29 1-65. With discussion and rejoinder by the authors. · Zbl 1029.62038 · doi:10.1214/aos/996986501
[20] Dette, H., Munk, A. and Wagner, T. (1998). Estimating the variance in nonparametric regression-what is a reasonable choice?, J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 751-764. · Zbl 0944.62041 · doi:10.1111/1467-9868.00152
[21] Diskin, S. J., Li, M., Hou, C., Yang, S., Glessner, J., Hakonarson, H., Bucan, M., Maris, J. M. and Wang, K. (2008). Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms., Nucleic Acids Res. 36 e126.
[22] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage., Biometrika 81 425-455. · Zbl 0815.62019 · doi:10.1093/biomet/81.3.425
[23] Du, C., Kao, C.-L. M. and Kou, S. C. (2015). Stepwise Signal Extraction via Marginal Likelihood., J. Amer. Statist. Assoc. · doi:10.1080/01621459.2015.1006365
[24] Dümbgen, L. and Kovac, A. (2009). Extensions of smoothing via taut strings., Electron. J. Stat. 3 41-75. · Zbl 1326.62087 · doi:10.1214/08-EJS216
[25] Dümbgen, L., Piterbarg, V. I. and Zholud, D. (2006). On the limit distribution of multiscale test statistics for nonparametric curve estimation., Math. Methods Statist. 15 20-25.
[26] Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses., Ann. Statist. 29 124-152. · Zbl 1029.62070 · doi:10.1214/aos/996986504
[27] Dümbgen, L. and Walther, G. (2008). Multiscale inference about a density., Ann. Statist. 36 1758-1785. · Zbl 1142.62336 · doi:10.1214/07-AOS521
[28] Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model., Statist. Sci., with discussion and rejoinder by the authors 23 1-22. · Zbl 1327.62046 · doi:10.1214/07-STS236
[29] Efron, B. and Zhang, N. R. (2011). False discovery rates and copy number variation., Biometrika 98 251-271. · Zbl 1215.62115 · doi:10.1093/biomet/asr018
[30] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale Change-point Inference., J. R. Stat. Soc. Ser. B. Stat. Methodol., with discussion and rejoinder by the authors 76 495-580. · doi:10.1111/rssb.12047
[31] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization., Ann. Appl. Statist. 1 302-332. · Zbl 1378.90064 · doi:10.1214/07-AOAS131
[32] Friedrich, F., Kempe, A., Liebscher, V. and Winkler, G. (2008). Complexity penalized M-estimation: fast computation., J. Comput. Graph. Statist. 17 201-224. · doi:10.1198/106186008X285591
[33] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection., Ann. Statist. 42 2243-2281. · Zbl 1302.62075 · doi:10.1214/14-AOS1245
[34] Guo, W. and Sarkar, S. (2013). Adaptive Controls of FWER and FDR Under Block Dependence., Unpublished manuscript . .
[35] Hall, P., Kay, J. W. and Titterinton, D. M. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression., Biometrika 77 521-528. · Zbl 1377.62102 · doi:10.1093/biomet/77.3.521
[36] Hao, N., Niu, Y. and Zhang, H. (2013). Multiple change-point detection via a screening and ranking algorithm., Statist. Sinica 23 1553-1572. · Zbl 1417.62236
[37] Harchaoui, Z. and Lévy-Leduc, C. (2008). Catching change-points with lasso., Adv. in Neur. Inform. Processing Syst. 20 161-168.
[38] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty., J. Amer. Statist. Assoc. 105 1480-1493. · Zbl 1388.62211 · doi:10.1198/jasa.2010.tm09181
[39] Hille, B. (2001)., Ion Channels of Excitable Membranes . Sinauer Sunderland, MA.
[40] Hotz, T., Schütte, O. M., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C. and Munk, A. (2013). Idealizing ion channel recordings by jump segmentation and statistical multiresolution analysis., IEEE Trans. Nanobiosci. 12 376-386.
[41] Inclán, C. and Tiao, G. C. (1994). Use of Cumulative Sums of Squares for Retrospective Detection of Changes of Variance., J. Amer. Statist. Assoc. 89 913-923. · Zbl 0825.62678 · doi:10.2307/2290916
[42] Jackson, B., Sargle, J. D., Barnes, D., Arabhi, S., Alt, A., Gioumousis, P., Gwin, E., Sangtrakulcharoen, P., Tan, L. and Tsai, T. T. (2005). An algorithm for optimal partitioning of data on an interval., IEEE Signal Process. Lett. 12 105-108.
[43] Jeng, X. J., Cai, T. T. and Li, H. (2010). Optimal Sparse Segment Identification With Application in Copy Number Variation Analysis., J. Amer. Statist. Assoc. 105 1156-1166. · Zbl 1390.62170 · doi:10.1198/jasa.2010.tm10083
[44] Kass, R. S. et al. (2005). The channelopathies: novel insights into molecular and genetic mechanisms of human disease., J. Clin. Invest. 115 1986-1989.
[45] Killick, R., Fearnhead, P. and Eckley, I. A. (2012). Optimal Detection of Changepoints With a Linear Computational Cost., J. Amer. Statist. Assoc. 107 1590-1598. · Zbl 1258.62091 · doi:10.1080/01621459.2012.737745
[46] Lavielle, M. and Teyssière, G. (2007). Adaptive detection of multiple change-points in asset price volatility. In, Long memory in economics 129-156. Springer, Berlin. · Zbl 1181.91348 · doi:10.1007/978-3-540-34625-8_5
[47] Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations., Ann. Appl. Statist. 6 1306-1326. · Zbl 1401.92145 · doi:10.1214/12-AOAS539
[48] Olshen, A. B., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data., Biostatistics 5 557-572. · Zbl 1155.62478 · doi:10.1093/biostatistics/kxh008
[49] Pinkel, D., Segraves, R., Sudar, D., Clark, S., Poole, I., Kowbel, D., Collins, C., Kuo, W.-L., Chen, C., Zhai, Y. et al. (1998). High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays., Nat. Genet. 20 207-211.
[50] Rice, J. (1984). Bandwidth choice for nonparametric regression., Ann. Statist. 12 1215-1230. · Zbl 0554.62035 · doi:10.1214/aos/1176346788
[51] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics., Scand. J. Stat. 40 752-769. · Zbl 1283.62179 · doi:10.1111/sjos.12027
[52] Rosenberg, A. and Hirschberg, J. (2007). V-measure: a conditional entropy-based external cluster evaluation measures., Proc. Conf. Empirical Methods Natural Lang. Process. June 410-420.
[53] Scott, A. J. and Knott, M. (1974). A Cluster Analysis Method for Grouping Means in the Analysis of Variance., Biometrics 30 pp. 507-512. · Zbl 0284.62044 · doi:10.2307/2529204
[54] Siegmund, D. (2013). Change-points: from sequential detection to biology and back., Sequent. Anal. 32 2-14. · Zbl 1271.62191 · doi:10.1080/07474946.2013.751834
[55] Siegmund, D. and Yakir, B. (2000). Tail probabilities for the null distribution of scanning statistics., Bernoulli 6 191-213. · Zbl 0976.62048 · doi:10.2307/3318574
[56] Siegmund, D. O., Zhang, N. R. and Yakir, B. (2011). False discovery rate for scanning statistics., Biometrika 98 979-985. · Zbl 1228.62090 · doi:10.1093/biomet/asr057
[57] Snijders, A. M., Nowak, N., Segraves, R., Blackwood, S., Brown, N., Conroy, J., Hamilton, G., Hindle, A. K., Huey, B., Kimura, K. et al. (2001). Assembly of microarrays for genome-wide measurement of DNA copy number., Nat. Genet. 29 263-264.
[58] Spokoiny, V. (2009). Multiscale local change point detection with applications to value-at-risk., Ann. Statist. 37 1405-1436. · Zbl 1160.62084 · doi:10.1214/08-AOS612
[59] Storath, M., Weinmann, A. and Demaret, L. (2014). Jump-sparse and sparse recovery using Potts functionals., IEEE Trans. Signal Process. 62 3654-3666. · Zbl 1394.94561 · doi:10.1109/TSP.2014.2329263
[60] Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso., Biostatistics 9 18-29. · Zbl 1274.62886 · doi:10.1093/biostatistics/kxm013
[61] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso., J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91-108. · Zbl 1060.62049 · doi:10.1111/j.1467-9868.2005.00490.x
[62] Tsybakov, A. (2009)., Introduction to Nonparametric Estimation . Springer-Verlag New York. · Zbl 1176.62032
[63] van der Vaart, A. W. and Wellner, J. A. (1996)., Weak convergence and empirical processes . Springer Series in Statistics . Springer-Verlag, New York. With applications to statistics. · Zbl 0862.60002
[64] VanDongen, A. (1996). A new algorithm for idealizing single ion channel data containing multiple unknown conductance levels., Biophys. J. 70 1303.
[65] Venkatraman, E. S. and Olshen, A. B. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data., Bioinformatics 23 657-663.
[66] Vitale, R. A. (2000). Some comparisons for Gaussian processes., Proc. Amer. Math. Soc. 128 3043-3046. · Zbl 0955.60036 · doi:10.1090/S0002-9939-00-05367-3
[67] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics., Ann. Statist. 38 1010-1033. · Zbl 1183.62076 · doi:10.1214/09-AOS732
[68] Wu, W. B. (2007). Strong invariance principles for dependent random variables., Ann. Probab. 35 2294-2320. · Zbl 1166.60307 · doi:10.1214/009117907000000060
[69] Wu, W. B. and Zhao, Z. (2007). Inference of trends in time series., J. R. Stat. Soc. Ser. B Stat. Methodol. 69 391-410. · doi:10.1111/j.1467-9868.2007.00594.x
[70] Zhang, N. R. and Siegmund, D. O. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data., Biometrics 63 22-32. · Zbl 1206.62174 · doi:10.1111/j.1541-0420.2006.00662.x
[71] Zhang, N. R. and Siegmund, D. O. (2012). Model selection for high-dimensional, multi-sequence change-point problems., Statist. Sinica 22 1507-1538. · Zbl 1264.62079
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.