Clustering of multivariate binary data with dimension reduction via \(L_{1}\)-regularized likelihood maximization. (English) Zbl 1394.68322

Summary: Clustering methods with dimension reduction have received considerable interest in statistics, and many methods that simultaneously perform clustering and dimension reduction have been proposed. This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace in which to represent that cluster structure. The method is based on a finite mixture model of multivariate Bernoulli distributions, in which each component is assumed to have a low-dimensional representation of the cluster structure. The method can be considered an extension of traditional latent class analysis. Sparsity is introduced to the loading values, which produce the low-dimensional subspace, for enhanced interpretability and more stable extraction of the subspace. An EM-based algorithm is developed to solve the proposed optimization problem efficiently. We demonstrate the effectiveness of the proposed method through a simulation study and applications to real datasets.
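The baseline that the summary extends, a finite mixture of multivariate Bernoulli distributions fitted by EM (classical latent class analysis), can be sketched as follows. This is a minimal illustration on hypothetical toy data; the paper's key contributions, namely the low-dimensional loading structure and the \(L_{1}\) penalty on the loadings, are not implemented here.

```python
import numpy as np

def bernoulli_mixture_em(X, K, n_iter=100, seed=0):
    """EM for a K-component mixture of multivariate Bernoulli
    distributions (latent class analysis). The paper's method extends
    this baseline by parameterizing each component through a shared
    low-dimensional, L1-penalized loading matrix (omitted here)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                  # mixing weights
    theta = rng.uniform(0.25, 0.75, (K, d))   # per-component success probabilities
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        logp = (X @ np.log(theta).T
                + (1 - X) @ np.log(1 - theta).T
                + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted-average updates
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r

# Toy data: two well-separated binary clusters (illustrative only)
rng = np.random.default_rng(1)
X = np.vstack([rng.random((50, 6)) < 0.9,
               rng.random((50, 6)) < 0.1]).astype(float)
pi, theta, r = bernoulli_mixture_em(X, K=2)
```

In the paper's extension, the logit of each entry of `theta` is modeled through low-dimensional component scores and a common loading matrix, and an \(L_{1}\) penalty drives individual loadings to exactly zero, which is what yields the sparse, interpretable subspace.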

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

UCI-ml

References:

[1] Ekholm, A.; McDonald, J. W.; Smith, P. W.F., Association models for a multivariate binary response, Biometrics, 56, 712-718, (2000) · Zbl 1060.62602
[2] L. Kozma, A. Ilin, T. Raiko, Binary principal component analysis in the netflix collaborative filtering task, in: Proceedings of 2009 IEEE International Workshop on Machine Learning for Signal Processing, 2009.
[3] G.W. Milligan, Clustering validation: results and implications for applied analysis, in: P. Arabie, L.J. Hubert, G. DeSoete (Eds.), Clustering and Classification, World Scientific Publishing, River Edge, 1996, pp. 341-375. · Zbl 0895.62069
[4] Vichi, M.; Kiers, H. A.L., Factorial k-means analysis for two-way data, Comput. Stat. Data Anal., 37, 1, 49-64, (2001) · Zbl 1051.62056
[5] A.I. Schein, L.K. Saul, L.H. Ungar, A generalized linear model for principal component analysis of binary data, in: C.M. Bishop, B.J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, vol. 38, Key West, Florida, 2003, pp. 14-21.
[6] Lee, S.; Huang, J. Z.; Hu, J., Sparse logistic principal components analysis for binary data, Ann. Appl. Stat., 4, 1579-1601, (2010) · Zbl 1202.62084
[7] Moustaki, I.; Knott, M., Generalized latent trait models, Psychometrika, 65, 391-411, (2000) · Zbl 1291.62236
[8] M. Collins, S. Dasgupta, R.E. Schapire, A generalization of principal component analysis to the exponential family, in: T.G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14, MIT Press, Cambridge, MA, 2002, pp. 617-642.
[9] J. Li, D. Tao, Simple exponential family PCA, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[10] Li, J.; Tao, D., Exponential family factors for Bayesian factor analysis, IEEE Trans. Neural Netw. Learn. Syst., 24, 964-976, (2013)
[11] Arabie, P.; Hubert, L., Cluster analysis in marketing research, (Bagozzi, R. P., Advanced Methods of Marketing Research, (1994), Blackwell Oxford), 160-189
[12] DeSarbo, W. S.; Jedidi, K.; Cool, K.; Schendel, D., Simultaneous multidimensional unfolding and cluster analysis: an investigation of strategic groups, Mark. Lett., 2, 2, 129-146, (1990)
[13] de Soete, G.; Carroll, J. D., K means clustering in a low-dimensional Euclidean space, (Diday, E.; Lechevallier, Y.; Schader, M.; Bertrand, P.; Burtschy, B., New Approaches in Classification and Data Analysis, (1994), Springer Berlin, Heidelberg), 212-219
[14] Timmerman, M. E.; Ceulemans, E.; Kiers, H. A.L.; Vichi, M., Factorial and reduced k-means reconsidered, Comput. Stat. Data Anal., 54, 1858-1871, (2010) · Zbl 1284.62396
[15] Yamamoto, M.; Hwang, H., A general formulation of cluster analysis with dimension reduction and subspace separation, Behaviormetrika, 41, 115-129, (2014)
[16] Z. Ghahramani, G.E. Hinton, The EM algorithm for mixtures of factor analyzers, Technical Report CRG-TR-96-1, Department of Computer Science, University of Toronto, Canada, 1997.
[17] R. Yoshida, T. Higuchi, S. Imoto, A mixed factors model for dimension reduction and extraction of a group structure in gene expression data, in: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, 2004, pp. 161-172.
[18] A. Patrikainen, H. Mannila, Subspace clustering of high-dimensional binary data—a probabilistic approach, in: Workshop on Clustering High Dimensional Data and its Applications, SIAM International Conference on Data Mining, 2004, pp. 57-65.
[19] Cagnone, S.; Viroli, C., A factor mixture analysis model for multivariate binary data, Stat. Model., 12, 257-277, (2012)
[20] Vidal, R., Subspace clustering, Signal Process. Mag., 28, 52-68, (2011)
[21] T. Evgeniou, M. Pontil, Regularized multi-task learning, in: KDD ’04 Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 109-117.
[22] Argyriou, A.; Micchelli, C. A.; Pontil, M.; Ying, Y., A spectral regularization framework for multi-task structure learning, Adv. Neural Inf. Process. Syst., 20, 25-32, (2007)
[23] S. Ji, L. Tang, S. Yu, J. Ye, A shared-subspace learning framework for multi-label classification. ACM Trans. Knowl. Discov. Data, 4 (2) (2010) (Article 8).
[24] A. Agarwal, H. Daumé III, S. Gerber, Learning multiple tasks using manifold regularization, in: Proceedings of Conference on Neural Information Processing Systems (NIPS), 2010.
[25] Ando, R. K.; Zhang, T., A framework for learning predictive structures from multiple tasks and unlabeled data, J. Mach. Learn. Res., 6, 1817-1853, (2005) · Zbl 1222.68133
[26] Luo, Y.; Tao, D.; Geng, B.; Xu, C.; Maybank, S. J., Manifold regularized multitask learning for semi-supervised multilabel image classification, IEEE Trans. Image Process., 22, 2, 523-536, (2013) · Zbl 1373.94269
[27] Aitkin, M.; Anderson, D.; Hinde, J., Statistical modeling of data on teaching styles, J. R. Stat. Soc. Ser. A, 144, 419-461, (1981)
[28] Magidson, J.; Vermunt, J. K., Latent class models for clustering, Can. J. Mark. Res., 20, 37-44, (2002)
[29] L.M. Collins, S.T. Lanza, Latent Class and Latent Transition Analysis with Applications in the Social Behavioral and Health Sciences, John Wiley & Sons, Inc., New Jersey, 2010.
[30] Tibshirani, R. J., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B, 58, 267-288, (1996) · Zbl 0850.62538
[31] Browne, M. W., An overview of analytic rotation in exploratory factor analysis, Multivar. Behav. Res., 36, 1, 111-150, (2001)
[32] K. Hirose, M. Yamamoto, Sparse estimation via nonconcave penalized likelihood in a factor analysis model, Stat. Comput., doi:10.1007/s11222-014-9458-0 (online) · Zbl 1332.62194
[33] Frühwirth-Schnatter, S., Finite mixture and Markov switching models, (2006), Springer New York · Zbl 1108.62002
[34] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. R. Stat. Soc. B, 39, 1-38, (1977) · Zbl 0364.62022
[35] Hunter, D. R.; Lange, K., A tutorial on MM algorithms, Am. Stat., 58, 30-37, (2004)
[36] Jaakkola, T. S.; Jordan, M. I., Bayesian parameter estimation via variational methods, Stat. Comput., 10, 25-37, (2000)
[37] de Leeuw, J., Principal component analysis of binary data by iterated singular value decomposition, Comput. Stat. Data Anal., 50, 21-39, (2006)
[38] Jennrich, R. I., A simple general procedure for orthogonal rotation, Psychometrika, 66, 2, 289-306, (2001) · Zbl 1293.62247
[39] Jennrich, R. I., A simple general procedure for oblique rotation, Psychometrika, 67, 1, 7-20, (2002) · Zbl 1297.62232
[40] Bouguila, N., On multivariate binary data clustering and feature weighting, Comput. Stat. Data Anal., 54, 120-134, (2010) · Zbl 1284.62364
[41] Hubert, L.; Arabie, P., Comparing partitions, J. Class., 2, 193-218, (1985)
[42] L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech.: Theory Exp., P09008, 2005.
[43] K. Bache, M. Lichman, UCI Machine Learning Repository, 2013.
[44] Hao, K.; Li, C.; Rosenow, C.; Wong, W. H., Detect and adjust for population stratification in population-based association study using genomic control markers: an application of Affymetrix GeneChip\(^\circledR\) Human Mapping 10K array, Eur. J. Hum. Genet., 12, 1001-1006, (2004)
[45] Ewens, W. J.; Spielman, R. S., The transmission/disequilibrium test: history, subdivision and admixture, Am. J. Hum. Genet., 57, 455-464, (1995)
[46] The International HapMap Consortium, A haplotype map of the human genome, Nature 437 (2005) 1299-1320.
[47] Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M. A. R.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P. I. W.; Daly, M. J.; Sham, P. C., PLINK: a tool set for whole-genome association and population-based linkage analysis, Am. J. Hum. Genet., 81, 559-575, (2007)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.