Semiparametric density testing in the contamination model. (English) Zbl 1434.62048
Summary: In this paper we investigate a semiparametric testing approach to determine whether the parametric family assigned to the unknown density of a two-component mixture model with one known component is correct. Based on a semiparametric estimation of the Euclidean parameters of the model (free from the null assumption), our method compares pairwise the Fourier-type coefficients of the model estimated directly from the data with those obtained by plugging the estimated parameters into the mixture model. These comparisons are incorporated into a sum-of-squares type statistic whose order is controlled by a penalization rule. We prove under mild conditions that our test statistic is asymptotically $$\chi^2_1$$-distributed and study its behavior, both numerically and theoretically, under different types of alternatives, including contiguous nonparametric alternatives. We discuss the lack of power, counterintuitive from the practitioner's point of view, of the maximum likelihood version of our test in a neighborhood of challenging non-identifiable situations. Several level and power studies are conducted numerically on models close to those considered in the literature, such as in G. J. McLachlan [“A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays”, Bioinform. 22, No. 13, 1608–1615 (2006; doi:10.1093/bioinformatics/btl148)], to validate the suitability of our approach. We also apply our testing procedure to the Carina galaxy real dataset, whose low luminosity mixes with that of its companion, the Milky Way. Finally, we discuss possible extensions of our work to a wider class of contamination models.
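The coefficient comparison and penalized order selection described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the known component is taken to be standard normal, the null family for the unknown component is Gaussian, the Fourier-type system is a cosine basis composed with the standard normal CDF, the penalty is a Schwarz-type `k * log(n)` term, and the parameters `p`, `mu`, `sigma` are supplied by the caller rather than estimated semiparametrically. The variance normalization is the naive empirical one and ignores the effect of parameter estimation, which the paper's statistic would have to account for. The function names (`basis`, `model_coeff`, `smooth_test`, `simulate`) are hypothetical.

```python
import math
import random


def norm_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def basis(k, x):
    # Assumed basis: cosine system composed with the standard normal CDF,
    # orthonormal with respect to the standard normal density.
    return math.sqrt(2.0) * math.cos(k * math.pi * norm_cdf(x))


def mixture_pdf(x, p, mu, sigma):
    # Assumed model: known N(0, 1) component plus a Gaussian null component.
    return (1.0 - p) * norm_pdf(x) + p * norm_pdf(x, mu, sigma)


def model_coeff(k, p, mu, sigma, lo=-12.0, hi=12.0, steps=4000):
    # k-th coefficient of the plugged-in mixture, by the trapezoidal rule.
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * basis(k, x) * mixture_pdf(x, p, mu, sigma)
    return total * h


def smooth_test(data, p, mu, sigma, max_order=5):
    # Returns (statistic, selected order). The order maximizes the
    # penalized sum of squared, standardized coefficient differences.
    n = len(data)
    cumulative = 0.0
    best_stat, best_order, best_score = 0.0, 1, -math.inf
    for k in range(1, max_order + 1):
        values = [basis(k, x) for x in data]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / (n - 1)
        diff = mean - model_coeff(k, p, mu, sigma)
        cumulative += n * diff * diff / var
        score = cumulative - k * math.log(n)  # Schwarz-type penalty (assumed)
        if score > best_score:
            best_stat, best_order, best_score = cumulative, k, score
    return best_stat, best_order


def simulate(n, p, mu, sigma, rng):
    # Draw from the assumed mixture: N(mu, sigma^2) w.p. p, else N(0, 1).
    return [rng.gauss(mu, sigma) if rng.random() < p else rng.gauss(0.0, 1.0)
            for _ in range(n)]


rng = random.Random(0)
data = simulate(1000, 0.3, 2.0, 1.0, rng)
stat, order = smooth_test(data, 0.3, 2.0, 1.0)
print("statistic:", stat, "selected order:", order)
```

In this simplified setting, where the true parameters are plugged in, the penalty tends to select order 1 under the null, so the returned statistic can be compared with a $\chi^2_1$ quantile (about 3.84 at the 5% level). With estimated parameters the normalization would need the correction developed in the paper.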
##### MSC:
 62G07 Density estimation
 85A15 Galactic and stellar structure
 62H30 Classification and discrimination; cluster analysis (statistical aspects)
 62P35 Applications of statistics to physics
