Breakdown points for maximum likelihood estimators of location-scale mixtures.

Zbl 1047.62063

Summary: ML estimation based on mixtures of normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of \(t\)-distributions by G. McLachlan and D. Peel [Finite Mixture Models. (2000; Zbl 0963.62061)] and the addition of a further mixture component accounting for “noise” by C. Fraley and A. E. Raftery [Comput. J. 41, 578–588 (1998; Zbl 0920.68038)] have been suggested as more robust alternatives.
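The breakdown mechanism described above can be illustrated with a minimal EM sketch for a one-dimensional, two-component normal mixture. This is only a toy illustration of the phenomenon, not the paper's construction; the data set, initialization, iteration count, and outlier position are all assumptions chosen for the example:

```python
import numpy as np

def em_normal_mixture(x, mu, var, w, n_iter=50, var_floor=1e-3):
    """Plain EM for a one-dimensional normal mixture (E-step in log space)."""
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability
        log_r = (np.log(w)[:, None]
                 - 0.5 * np.log(2.0 * np.pi * var)[:, None]
                 - (x[None, :] - mu[:, None]) ** 2 / (2.0 * var[:, None]))
        log_r -= log_r.max(axis=0, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=0, keepdims=True)
        # M-step: weighted means, floored variances, and mixture weights
        nk = r.sum(axis=1)
        w = nk / len(x)
        mu = (r @ x) / nk
        var = np.maximum(
            (r * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk, var_floor)
    return mu, var, w

# Two well-separated clusters, around 0-2 and 10-12
clean = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0])
mu0, var0, w0 = np.array([1.0, 11.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

mu_clean, _, _ = em_normal_mixture(clean, mu0, var0, w0)
# A single gross outlier drags one component's mean far away from its cluster
mu_out, _, _ = em_normal_mixture(np.append(clean, 1e4), mu0, var0, w0)
print(mu_clean)  # both means stay near the clusters
print(mu_out)    # one mean is pulled far towards the outlier
```

With the clean data both fitted means stay near the clusters; adding the single point at \(10^4\) pulls one component's mean orders of magnitude away, which is the breakdown of one mixture component that the summary refers to.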

In this paper, the definition of an adequate robustness measure for cluster analysis is discussed, and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on normal mixtures. If the number of clusters \(s\) is treated as fixed, \(r\) additional points suffice for all three methods to let the parameters of \(r\) clusters explode; only in the case \(r=s\) is this impossible for \(t\)-mixtures. The ability to estimate the number of mixture components, for example by use of the Bayesian information criterion of G. Schwarz [Ann. Stat. 6, 461–464 (1978; Zbl 0379.62005)], and to isolate gross outliers as clusters of one point is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of normals with an additional improper uniform component is proposed to achieve more robustness in the case of a fixed number of components.
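The robustification via an improper uniform "noise" component amounts to adding one component with a constant density to the mixture, so that the E-step can assign gross outliers to it. The following is a minimal sketch of that idea under assumed values: the constant noise density, the initial weights, and the toy data are illustrative choices, not the paper's:

```python
import numpy as np

def em_mixture_with_noise(x, mu, var, w, noise_dens=0.01, n_iter=50,
                          var_floor=1e-3):
    """EM for a normal mixture plus one 'noise' component whose density is the
    fixed constant `noise_dens` (an improper uniform over the real line)."""
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # Log-densities of the normal components...
        log_r = (np.log(w[:-1])[:, None]
                 - 0.5 * np.log(2.0 * np.pi * var)[:, None]
                 - (x[None, :] - mu[:, None]) ** 2 / (2.0 * var[:, None]))
        # ...plus one row for the constant-density noise component
        noise_row = np.full((1, len(x)), np.log(w[-1]) + np.log(noise_dens))
        log_r = np.vstack([log_r, noise_row])
        log_r -= log_r.max(axis=0, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=0, keepdims=True)
        nk = r.sum(axis=1)
        w = nk / len(x)
        # M-step updates location and scale only for the normal components
        mu = (r[:-1] @ x) / nk[:-1]
        var = np.maximum(
            (r[:-1] * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk[:-1],
            var_floor)
    return mu, var, w

clean = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0])
data = np.append(clean, 1e4)  # one gross outlier
mu, var, w = em_mixture_with_noise(
    data, np.array([1.0, 11.0]), np.array([1.0, 1.0]),
    np.array([0.45, 0.45, 0.10]))
print(mu)  # both means stay near the clean clusters
print(w)   # the noise component absorbs the outlier
```

In contrast to the plain normal-mixture fit, the outlier's responsibility goes almost entirely to the constant-density component, so both normal components keep estimating the clean clusters. The paper's point is that such a fixed noise density buys robustness for a fixed number of components, though the breakdown analysis above shows the protection is not unlimited.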


##### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

62F35 | Robustness and adaptive procedures (parametric inference) |

62F10 | Point estimation |

##### Keywords:

model-based cluster analysis; robust statistics; normal mixtures; mixtures of \(t\)-distributions; noise component; classification breakdown point

##### References:

[1] | Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control 19 716–723. · Zbl 0314.62039 |

[2] | Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49 803–821. · Zbl 0794.62034 |

[3] | Bozdogan, H. (1994). Mixture model cluster analysis using model selection criteria and a new informational measure of complexity. In Multivariate Statistical Modeling. Proc. First US/Japan Conference on the Frontiers of Statistical Modeling. An Informational Approach (H. Bozdogan, ed.) 2 69–113. Kluwer, Dordrecht. |

[4] | Bryant, P. and Williamson, J. A. (1986). Maximum likelihood and classification: A comparison of three approaches. In Classification as a Tool of Research (W. Gaul and M. Schader, eds.) 35–45. North-Holland, Amsterdam. · Zbl 0587.62103 |

[5] | Byers, S. and Raftery, A. E. (1998). Nearest neighbor clutter removal for estimating features in spatial point processes. J. Amer. Statist. Assoc. 93 577–584. · Zbl 0926.62089 |

[6] | Campbell, N. A. (1984). Mixture models and atypical values. Math. Geol. 16 465–477. |

[7] | Celeux, G. and Soromenho, G. (1996). An entropy criterion for assessing the number of clusters in a mixture model. J. Classification 13 195–212. · Zbl 0861.62051 |

[8] | Davies, P. L. and Gather, U. (1993). The identification of multiple outliers (with discussion). J. Amer. Statist. Assoc. 88 782–801. · Zbl 0797.62025 |

[9] | Davies, P. L. and Gather, U. (2002). Breakdown and groups. Technical Report 57-2002, SFB 475, Univ. Dortmund. Available at wwwstat.mathematik.uni-essen.de/davies/brkdown220902.ps.gz. · Zbl 1077.62041 |

[10] | Day, N. E. (1969). Estimating the components of a mixture of normal distributions. Biometrika 56 463–474. · Zbl 0183.48106 |

[11] | Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1–38. · Zbl 0364.62022 |

[12] | DeSarbo, W. S. and Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. J. Classification 5 249–282. · Zbl 0692.62052 |

[13] | Donoho, D. L. and Huber, P. J. (1983). The notion of breakdown point. In A Festschrift for Erich L. Lehmann (P. J. Bickel, K. Doksum and J. L. Hodges, Jr., eds.) 157–184. Wadsworth, Belmont, CA. · Zbl 0523.62032 |

[14] | Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer J. 41 578–588. · Zbl 0920.68038 |

[15] | Gallegos, M. T. (2003). Clustering in the presence of outliers. In Exploratory Data Analysis in Empirical Research (M. Schwaiger and O. Opitz, eds.) 58–66. Springer, Berlin. |

[16] | Garcia-Escudero, L. A. and Gordaliza, A. (1999). Robustness properties of \(k\)-means and trimmed \(k\)-means. J. Amer. Statist. Assoc. 94 956–969. · Zbl 1072.62547 |

[17] | Hampel, F. R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42 1887–1896. · Zbl 0229.62041 |

[18] | Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383–393. · Zbl 0305.62031 |

[19] | Hastie, T. and Tibshirani, R. (1996). Discriminant analysis by Gaussian mixtures. J. Roy. Statist. Soc. Ser. B 58 155–176. · Zbl 0850.62476 |

[20] | Hathaway, R. J. (1985). A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann. Statist. 13 795–800. · Zbl 0576.62039 |

[21] | Hathaway, R. J. (1986). A constrained EM algorithm for univariate normal mixtures. J. Stat. Comput. Simul. 23 211–230. |

[22] | Hennig, C. (2003). Robustness of ML estimators of location–scale mixtures. Available at www.math.uni-hamburg.de/home/hennig/papers/hennigcottbus.pdf. · Zbl 05243397 |

[23] | Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73–101. · Zbl 0136.39805 |

[24] | Huber, P. J. (1981). Robust Statistics. Wiley, New York. · Zbl 0536.62025 |

[25] | Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhyā Ser. A 62 49–66. · Zbl 1081.62516 |

[26] | Kharin, Y. (1996). Robustness in Statistical Pattern Recognition . Kluwer, Dordrecht. · Zbl 0879.62054 |

[27] | Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. IMS, Hayward, CA. · Zbl 0832.62027 |

[28] | Markatou, M. (2000). Mixture models, robustness, and the weighted likelihood methodology. Biometrics 56 483–486. · Zbl 1060.62511 |

[29] | McLachlan, G. J. (1982). The classification and mixture maximum likelihood approaches to cluster analysis. In Handbook of Statistics (P. R. Krishnaiah and L. Kanal, eds.) 2 199–208. North-Holland, Amsterdam. · Zbl 0513.62064 |

[30] | McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Appl. Statist. 36 318–324. |

[31] | McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering. Dekker, New York. · Zbl 0697.62050 |

[32] | McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. Wiley, New York. · Zbl 0963.62061 |

[33] | Peel, D. and McLachlan, G. J. (2000). Robust mixture modeling using the \(t\) distribution. Stat. Comput. 10 339–348. |

[34] | Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195–239. · Zbl 0536.62021 |

[35] | Rocke, D. M. and Woodruff, D. L. (2000). A synthesis of outlier detection and cluster identification. Unpublished manuscript. |

[36] | Roeder, K. and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. J. Amer. Statist. Assoc. 92 894–902. · Zbl 0889.62021 |

[37] | Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464. · Zbl 0379.62005 |

[38] | Tyler, D. E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics. Ann. Statist. 22 1024–1044. · Zbl 0815.62015 |

[39] | Wang, H. H. and Zhang, H. (2002). Model-based clustering for cross-sectional time series data. J. Agric. Biol. Environ. Statist. 7 107–127. |

[40] | Wolfe, J. H. (1967). NORMIX: Computational methods for estimating the parameters of multivariate normal mixtures of distributions. Research Memo SRM 68-2, U.S. Naval Personnel Research Activity, San Diego. |

[41] | Zhang, J. and Li, G. (1998). Breakdown properties of location M-estimators. Ann. Statist. 26 1170–1189. · Zbl 0929.62031 |

[42] | Zuo, Y. (2001). Some quantitative relationships between two types of finite sample breakdown point. Statist. Probab. Lett. 51 369–375. · Zbl 0972.62022 |
