Model selection for the segmentation of multiparameter exponential family distributions.

*(English)* Zbl 1362.62068

Summary: We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the change-points. In this general exponential family distribution framework, we propose a penalized \(\log\)-likelihood estimator where the penalty is inspired by papers of L. Birgé and P. Massart [in: Festschrift for Lucien Le Cam: research papers in probability and statistics. New York, NY: Springer. 55–87 (1997; Zbl 0920.62042); J. Eur. Math. Soc. (JEMS) 3, No. 3, 203–268 (2001; Zbl 1037.62001); Probab. Theory Relat. Fields 138, No. 1–2, 33–73 (2007; Zbl 1112.62082)]. The resulting estimator is proved to satisfy some oracle inequalities. We then further study the particular case of categorical variables by comparing the values of the key constants when derived from the specification of our general approach and when obtained by working directly with the characteristics of this distribution. Finally, simulation studies are conducted to assess the performance of our criterion and to compare our approach to other existing methods, and an application on real data modeled using the categorical distribution is provided.
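As an illustration of the general idea of penalized log-likelihood selection of the number of segments, the following is a minimal sketch for the categorical case. It is not the estimator of the paper: the penalty shape `K * (c1 * log(n / K) + c2)` is only assumed here as a typical Birgé–Massart-style form, the constants `c1` and `c2` are placeholders rather than the calibrated constants studied in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter


def segment_cost(counts, length):
    """Negative maximized log-likelihood of one categorical segment."""
    return -sum(c * math.log(c / length) for c in counts.values() if c > 0)


def best_costs(seq, k_max):
    """Dynamic programming over all segmentations with up to k_max segments."""
    n = len(seq)
    # cost[i][j] is the cost of the segment seq[i:j]
    cost = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        counts = Counter()
        for j in range(i + 1, n + 1):
            counts[seq[j - 1]] += 1
            cost[i][j] = segment_cost(counts, j - i)
    # dp[k][j] is the minimal cost of splitting seq[:j] into k segments
    dp = [[math.inf] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    dp[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                value = dp[k - 1][i] + cost[i][j]
                if value < dp[k][j]:
                    dp[k][j] = value
                    back[k][j] = i
    return dp, back


def select_segmentation(seq, k_max, c1=1.0, c2=1.0):
    """Pick the number of segments minimizing cost plus an assumed penalty."""
    n = len(seq)
    dp, back = best_costs(seq, k_max)

    def criterion(k):
        # Assumed penalty shape; c1 and c2 are illustrative placeholders.
        return dp[k][n] + k * (c1 * math.log(n / k) + c2)

    k_hat = min(range(1, k_max + 1), key=criterion)
    # Backtrack to recover the start index of each segment.
    starts = []
    j = n
    for k in range(k_hat, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    # Drop the leading 0 so that only the change-points remain.
    return k_hat, starts[1:]


if __name__ == "__main__":
    data = ["a"] * 20 + ["b"] * 20
    print(select_segmentation(data, k_max=3))
```

The sketch requires `k_max` to be at most the sequence length and runs in time quadratic in that length per segment count, which is adequate for a small illustration only.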

##### MSC:

| Code | Description |
|---|---|
| 62G05 | Nonparametric estimation |
| 62G07 | Density estimation |
| 60E10 | Characteristic functions; other transforms |
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |