Variable selection in model-based clustering: a general variable role modeling. (English) Zbl 1453.62154

Summary: Existing variable selection procedures for model-based clustering assume that the variables irrelevant to the clustering are either all independent of the relevant clustering variables or all linked to them. A more versatile variable selection model is proposed in which each variable can play one of three roles: a relevant clustering variable, an irrelevant variable that depends on a subset of the relevant clustering variables, or an irrelevant variable that is independent of all the relevant variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modeling. The identifiability of the model and the consistency of the variable selection criterion are also established. Numerical experiments illustrate the benefits of this new modeling.
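
The three variable roles described in the summary can be sketched as a factorized density. This is an illustrative reading, not a formula taken from the summary: it assumes a Gaussian mixture on the relevant block and a linear regression linking the dependent irrelevant block to part of the relevant block. The symbols \(S\) (relevant variables), \(R \subseteq S\) (the relevant variables that explain the dependent block), \(U\) (irrelevant, dependent variables), \(W\) (irrelevant, independent variables), and all parameter names are notation assumed here.

```latex
% Assumed notation: y = (y^S, y^U, y^W) is one observation written as a row vector,
% \phi is a multivariate Gaussian density, and K is the number of mixture components.
\[
  f(y) \;=\;
  \underbrace{\sum_{k=1}^{K} p_k \,\phi\!\left(y^{S} \mid \mu_k, \Sigma_k\right)}_{\text{relevant variables } S}
  \;\times\;
  \underbrace{\phi\!\left(y^{U} \mid a + y^{R} b,\ \Omega\right)}_{\text{irrelevant, dependent on } R \subseteq S}
  \;\times\;
  \underbrace{\phi\!\left(y^{W} \mid \gamma,\ \tau\right)}_{\text{irrelevant, independent}}
\]
```

Under this reading, setting \(W = \emptyset\) or \(U = \emptyset\) corresponds to the two more restrictive situations that the summary attributes to earlier procedures: all irrelevant variables linked to the relevant ones, or all of them independent.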


62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)



