## Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data.
*(English)*
Zbl 1417.62114

Authors’ abstract: We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations, which can be applied to various data types as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, under some mild conditions, the new test statistics are asymptotically distribution-free under the null hypothesis of no change. Analytic \(p\)-value approximations to the significance of the new test statistics for the single change-point alternative and changed interval alternative are derived, making the new approaches easy off-the-shelf tools for large datasets. The new approaches are illustrated in an analysis of New York taxi data.
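The similarity-graph idea in the abstract can be illustrated with a minimal sketch. It follows the spirit of the earlier edge-count scan (the "existing approach" the abstract mentions), not the new statistics proposed in the paper. Everything below is an assumption made for illustration: the function names `knn_edges` and `edge_count_scan`, the choice of a k-nearest-neighbour graph with Euclidean distance, and the unstandardized score (expected minus observed between-segment edges under random permutation of the observation order). The paper's tests standardize and combine edge counts differently.

```python
import math
import random


def knn_edges(points, k):
    """Undirected k-nearest-neighbour graph under Euclidean distance (illustrative choice)."""
    n = len(points)
    edges = set()
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def edge_count_scan(points, k=3):
    """For each split t (observations 0..t-1 vs t..n-1), return the expected
    minus observed number of edges joining the two segments.

    The expectation is taken under random permutation of the time order:
    an edge joins the two segments with probability 2t(n-t) / (n(n-1)).
    A large positive score means fewer cross edges than chance would give,
    which hints at a distributional change near t. The score is NOT
    variance-standardized, so it is only a rough illustration.
    """
    n = len(points)
    edges = knn_edges(points, k)
    scores = {}
    for t in range(2, n - 1):
        cross = sum(1 for i, j in edges if i < t <= j)
        expected = len(edges) * 2 * t * (n - t) / (n * (n - 1))
        scores[t] = expected - cross
    return scores


# Hypothetical toy data: a large mean shift after the 20th observation.
rng = random.Random(0)
before = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
after = [[rng.gauss(10, 1) for _ in range(3)] for _ in range(40)]
scores = edge_count_scan(before + after)
estimate = max(scores, key=scores.get)
print("estimated change-point:", estimate)
```

Because only pairwise distances enter through `knn_edges`, the same scan applies to any data type for which a similarity or distance measure can be defined, which is the point the abstract makes about non-Euclidean data.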

Reviewer: Wiesław Dziubdziela (Miedziana Góra)

### MSC:

62G10 | Nonparametric hypothesis testing |

62H15 | Hypothesis testing in multivariate analysis |

62E20 | Asymptotic distribution theory in statistics |

### Keywords:

change-point; graph-based tests; nonparametric; scan statistic; tail probability; high-dimensional data; network data; non-Euclidean data

### Software:

LogConcDEAD

*L. Chu* and *H. Chen*, Ann. Stat. 47, No. 1, 382–414 (2019; Zbl 1417.62114)

### References:

[1] | Carlstein, E., Müller, H.-G. and Siegmund, D., eds. (1994). Change-Point Problems. Institute of Mathematical Statistics Lecture Notes–Monograph Series 23. IMS, Hayward, CA. Papers from the AMS-IMS-SIAM Summer Research Conference held at Mt. Holyoke College, South Hadley, MA, July 11–16, 1992. |

[2] | Chen, H., Chen, X. and Su, Y. (2017). A weighted edge-count two-sample test for multivariate and object data. J. Amer. Statist. Assoc. 112. To appear. DOI:10.1080/01621459.2017.1307757. · Zbl 1402.62079 |

[3] | Chen, H. and Friedman, J. H. (2017). A new graph-based two-sample test for multivariate and object data. J. Amer. Statist. Assoc. 112 397–409. |

[4] | Chen, J. and Gupta, A. K. (2012). Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance, 2nd ed. Birkhäuser/Springer, New York. · Zbl 1273.62016 |

[5] | Chen, L. H. and Shao, Q.-M. (1994). Stein’s Method for Normal Approximation. In An Introduction to Stein’s Method. Lecture Notes Series 4 1–59. World Scientific, Singapore. · Zbl 1072.62007 |

[6] | Chen, H. and Zhang, N. (2015). Graph-based change-point detection. Ann. Statist. 43 139–176. · Zbl 1308.62090 |

[7] | Chu, L. and Chen, H. (2019). Supplement to “Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data.” DOI:10.1214/18-AOS1691SUPP. |

[8] | Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-Point Analysis. Wiley, Chichester. |

[9] | Cule, M., Samworth, R. and Stewart, M. (2010). Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 545–607. · Zbl 1329.62183 |

[10] | Desobry, F., Davy, M. and Doncarli, C. (2005). An online kernel change detection algorithm. IEEE Trans. Signal Process. 53 2961–2974. · Zbl 1370.94317 |

[11] | Friedman, J. H. and Rafsky, L. C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7 697–717. · Zbl 0423.62034 |

[12] | Heard, N. A., Weston, D. J., Platanioti, K. and Hand, D. J. (2010). Bayesian anomaly detection methods for social networks. Ann. Appl. Stat. 4 645–662. · Zbl 1194.62021 |

[13] | Jirak, M. (2015). Uniform change point tests in high dimension. Ann. Statist. 43 2451–2483. · Zbl 1327.62467 |

[14] | Kossinets, G. and Watts, D. J. (2006). Empirical analysis of an evolving social network. Science 311 88–90. · Zbl 1226.91055 |

[15] | Lung-Yut-Fong, A., Lévy-Leduc, C. and Cappé, O. (2015). Homogeneity and change-point detection tests for multivariate data using rank statistics. J. SFdS 156 133–162. · Zbl 1338.62134 |

[16] | Matteson, D. S. and James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Amer. Statist. Assoc. 109 334–345. · Zbl 1367.62260 |

[17] | Park, Y., Wang, H., Nöbauer, T., Vaziri, A. and Priebe, C. E. (2015). Anomaly detection on whole-brain functional imaging of neuronal activity using graph scan statistics. In ACM Conference on Knowledge Discovery and Data Mining (KDD), Workshop on Outlier Definition, Detection, and Description (ODDx3). |

[18] | Siegmund, D. and Yakir, B. (2007). The Statistics of Gene Mapping. Statistics for Biology and Health. Springer, New York. · Zbl 1280.62012 |

[19] | Wang, H., Tang, M., Park, Y. and Priebe, C. E. (2014). Locality statistics for anomaly detection in time series of graphs. IEEE Trans. Signal Process. 62 703–717. · Zbl 1394.94790 |

[20] | Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692. · Zbl 1267.62084 |

[21] | Zhang, N. · Zbl 1195.62168 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.