
Model-based linear clustering. (English. French summary) Zbl 1349.62288
Summary: The authors propose a profile likelihood approach to linear clustering which explores potential linear clusters in a data set. For each linear cluster, an errors-in-variables model is assumed. The optimization of the derived profile likelihood can be achieved by an EM algorithm. Its asymptotic properties and its relationships with several existing clustering methods are discussed. Methods to determine the number of components in a data set are adapted to this linear clustering setting. Several simulated and real data sets are analyzed for comparison and illustration purposes.
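The idea in the summary can be illustrated with a minimal sketch. This is not the authors' profile likelihood or their exact EM updates, which the summary does not spell out. It assumes a simplified mixture in which each cluster is a hyperplane a·x = b, the signed orthogonal distance to that hyperplane is Gaussian, and all clusters share one orthogonal variance (a simplification chosen here so the likelihood stays bounded). The function names, parameters, and initialization scheme are hypothetical.

```python
import numpy as np

def weighted_hyperplane(X, w):
    """Weighted orthogonal regression: hyperplane a.x = b through points X."""
    w = w / w.sum()
    m = w @ X                                # weighted mean
    Xc = X - m
    S = (Xc * w[:, None]).T @ Xc             # weighted scatter matrix
    evals, evecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    a = evecs[:, 0]                          # normal = direction of least variance
    return a, a @ m, evals[0]

def e_step(X, A, b, pi, s2):
    """Responsibilities and log-likelihood under the assumed model."""
    R = X @ A.T - b                          # (n, k) signed orthogonal distances
    logp = np.log(pi) - 0.5 * np.log(2 * np.pi * s2) - 0.5 * R**2 / s2
    mx = logp.max(axis=1, keepdims=True)
    lse = mx + np.log(np.exp(logp - mx).sum(axis=1, keepdims=True))
    return np.exp(logp - lse), float(lse.sum())

def fit_linear_clusters(X, k, n_starts=20, n_iter=200, seed=0):
    """EM with random restarts; keeps the start with the best log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    s2_init = max(float(np.linalg.eigvalsh(np.cov(X.T))[0]), 1e-10)
    best = None
    for _start in range(n_starts):
        # Assumed initialization: each hyperplane passes through d random points.
        A, b = np.empty((k, d)), np.empty(k)
        for j in range(k):
            idx = rng.choice(n, size=d, replace=False)
            A[j], b[j], _ = weighted_hyperplane(X[idx], np.ones(d))
        pi, s2 = np.full(k, 1.0 / k), s2_init
        for _it in range(n_iter):
            W, _ = e_step(X, A, b, pi, s2)
            W = W + 1e-12                    # guard against empty components
            pi = W.mean(axis=0)
            pi = pi / pi.sum()
            lam = np.empty(k)
            for j in range(k):
                A[j], b[j], lam[j] = weighted_hyperplane(X, W[:, j])
            s2 = max(float(pi @ lam), 1e-10) # shared orthogonal variance
        W, loglik = e_step(X, A, b, pi, s2)
        if best is None or loglik > best["loglik"]:
            best = {"normals": A, "offsets": b, "weights": pi,
                    "sigma2": s2, "resp": W, "loglik": loglik}
    return best
```

Calling `fit_linear_clusters(X, k=2)` on an (n, 2) array returns the fitted unit normals, offsets, mixing weights, and per-point responsibilities. Hard cluster labels can be read off with `resp.argmax(axis=1)`.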

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
62F12 Asymptotic properties of parametric estimators