
Model-based co-clustering for ordinal data. (English) Zbl 1469.62086

Summary: A model-based co-clustering algorithm for ordinal data is presented. It relies on the latent block model, in which each block follows a probability distribution specific to ordinal data (the BOS, or Binary Ordinal Search, distribution). Model inference relies on a Stochastic EM algorithm coupled with a Gibbs sampler, and the ICL-BIC criterion is used to select the number of co-clusters (or blocks). The main advantages of this co-clustering model dedicated to ordinal data are its parsimony, the interpretability of the co-cluster parameters (mode, precision), and its ability to handle missing data. Numerical experiments on simulated data show the efficiency of the inference strategy, and real data analyses illustrate the usefulness of the proposed procedure.
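The summary names the BOS distribution and the ICL-BIC criterion without defining them. The sketch below assumes the stochastic binary search of Biernacki and Jacques (2016). At each step, a breakpoint y is drawn uniformly in the current interval of categories. The interval is then split into the categories below y, {y}, and the categories above y. With probability π (an accurate comparison), the sub-interval closest to the mode μ is kept. Otherwise (a blind comparison), a sub-interval is chosen with probability proportional to its size. The search stops when a single category remains. This is an illustration, not the authors' code, and the SEM-Gibbs inference itself is not shown. The following are all assumptions: the helper names `bos_pmf`, `complete_loglik` and `icl_bic` are hypothetical, missing entries are encoded as `None` and simply skipped, and the per-block parameter count `nu` in the penalty is left to the caller.

```python
import math
from functools import lru_cache
from typing import List, Optional, Sequence


def bos_pmf(m: int, mu: int, pi: float) -> List[float]:
    """[P(x=1), ..., P(x=m)] under a BOS model with mode mu and precision pi (sketch)."""
    if not 1 <= mu <= m:
        raise ValueError("mode mu must lie in 1..m")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("precision pi must lie in [0, 1]")

    def gap(lo: int, hi: int) -> int:
        # Distance from mu to the closest category of the interval [lo, hi].
        return 0 if lo <= mu <= hi else min(abs(mu - lo), abs(mu - hi))

    @lru_cache(maxsize=None)
    def dist(lo: int, hi: int) -> tuple:
        # Distribution of the final category when the search is at [lo, hi];
        # index k holds P(x = k), index 0 is unused.
        probs = [0.0] * (m + 1)
        if lo == hi:
            probs[lo] = 1.0
            return tuple(probs)
        size = hi - lo + 1
        for y in range(lo, hi + 1):  # breakpoint y drawn uniformly in [lo, hi]
            # Non-empty sub-intervals below y, equal to y, above y.
            subs = [iv for iv in ((lo, y - 1), (y, y), (y + 1, hi)) if iv[0] <= iv[1]]
            best = min(subs, key=lambda iv: gap(*iv))
            for iv in subs:
                # Blind comparison (prob. 1 - pi): choose proportionally to size.
                weight = (1.0 - pi) * (iv[1] - iv[0] + 1) / size
                if iv == best:  # accurate comparison (prob. pi): closest to mu
                    weight += pi
                sub = dist(*iv)
                for k in range(iv[0], iv[1] + 1):
                    probs[k] += weight / size * sub[k]
        return tuple(probs)

    return list(dist(1, m)[1:])


def complete_loglik(x: Sequence[Sequence[Optional[int]]], z: Sequence[int],
                    w: Sequence[int], alpha: Sequence[float], beta: Sequence[float],
                    mu: Sequence[Sequence[int]], pi: Sequence[Sequence[float]],
                    m: int) -> float:
    """log p(x, z, w; theta) for a latent block model with BOS blocks.

    z[i] / w[j] are the row / column cluster labels, alpha / beta the mixing
    proportions; None entries are treated as ignorable missing data (assumption).
    """
    ll = sum(math.log(alpha[g]) for g in z) + sum(math.log(beta[h]) for h in w)
    pmfs = {}  # cache one BOS pmf per block (g, h)
    for i, row in enumerate(x):
        for j, value in enumerate(row):
            if value is None:
                continue
            g, h = z[i], w[j]
            if (g, h) not in pmfs:
                pmfs[(g, h)] = bos_pmf(m, mu[g][h], pi[g][h])
            ll += math.log(pmfs[(g, h)][value - 1])
    return ll


def icl_bic(ll: float, n_rows: int, n_cols: int, n_row_clusters: int,
            n_col_clusters: int, nu: int) -> float:
    """Usual ICL-BIC form for latent block models; nu = free parameters per block (assumption)."""
    return (ll
            - (n_row_clusters - 1) / 2 * math.log(n_rows)
            - (n_col_clusters - 1) / 2 * math.log(n_cols)
            - n_row_clusters * n_col_clusters * nu / 2 * math.log(n_rows * n_cols))
```

For example, `bos_pmf(5, 2, 0.6)` returns five probabilities that sum to 1. Setting π = 0 gives the uniform distribution over the m categories, and π = 1 gives a point mass at μ, which is why π reads as a precision. The exact probabilities are computed by recursion over intervals with memoization, rather than by enumerating every search path. Selecting the number of blocks would then mean comparing `icl_bic` across candidate values of the row and column cluster counts.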

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)