In the pursuit of sparseness: a new rank-preserving penalty for a finite mixture of factor analyzers. (English) Zbl 07422736

Summary: A finite mixture of factor analyzers is an effective method for achieving parsimony in model-based clustering. Introducing a penalization term on the factor loadings can lead to sparse estimates. However, in the pursuit of sparseness, one can end up with rank-deficient solutions regardless of the number of factors assumed. In light of this issue, a new penalty-based method is developed that fits a finite mixture of sparse factor analyzers with full-rank factor loading estimates. In addition, an existing penalized factor analyzer model is extended to a finite mixture.
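
The rank-deficiency issue mentioned in the summary can be illustrated with a small sketch. This is not the penalty proposed in the paper: it applies plain elementwise soft-thresholding (the proximal step of a lasso-type penalty) to a made-up loading matrix for a single component, assuming the usual factor-analyzer covariance \(\Lambda \Lambda^\top + \Psi\). The names `soft_threshold`, `component_covariance`, the toy values, and the threshold `lam=0.3` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(loadings, lam):
    """Elementwise soft-thresholding, the proximal step of an L1 (lasso) penalty."""
    return np.sign(loadings) * np.maximum(np.abs(loadings) - lam, 0.0)


def component_covariance(loadings, uniquenesses):
    """Factor-analyzer covariance for one component: Lambda Lambda^T + diag(psi)."""
    return loadings @ loadings.T + np.diag(uniquenesses)


# Hypothetical 5 x 2 loading matrix (5 variables, 2 factors); toy values only.
loadings = np.array(
    [
        [0.9, 0.2],
        [0.8, 0.1],
        [0.7, 0.3],
        [0.6, 0.2],
        [0.5, 0.1],
    ]
)
# Hypothetical uniquenesses (diagonal of Psi).
uniquenesses = np.full(5, 0.5)

# Shrink every loading toward zero; small loadings become exactly zero.
sparse_loadings = soft_threshold(loadings, lam=0.3)

rank_before = np.linalg.matrix_rank(loadings)
rank_after = np.linalg.matrix_rank(sparse_loadings)
covariance_after = component_covariance(sparse_loadings, uniquenesses)

print("rank before thresholding:", rank_before)
print("rank after thresholding:", rank_after)
print("number of zero loadings:", int(np.sum(sparse_loadings == 0.0)))
```

In this toy case the whole second column is thresholded to zero, so the estimate behaves like a one-factor model even though two factors were assumed. The implied covariance remains valid, but the loading matrix no longer has full column rank, which is the situation a rank-preserving penalty is meant to avoid.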

MSC:

62-XX Statistics

References:

[1] Adachi, Kohei; Trendafilov, Nickolay T., Sparse orthogonal factor analysis, (Advances in Latent Variables (2014), Springer), 227-239 · Zbl 1436.62220
[2] Adachi, Kohei; Trendafilov, Nickolay T., Sparsest factor analysis for clustering variables: a matrix decomposition approach, Adv. Data Anal. Classif., 12, 3, 559-585 (2018) · Zbl 1416.62319
[3] Aitken, A. C., On Bernoulli’s numerical solution of algebraic equations, Proc. R. Soc. Edinb., 46, 289-305 (1926) · JFM 52.0098.05
[4] Andrews, Jeffrey L.; McNicholas, Paul D., Extending mixtures of multivariate t-factor analyzers, Stat. Comput., 21, 3, 361-373 (2011) · Zbl 1255.62175
[5] Andrews, Jeffrey L.; Wickins, Jaymeson R.; Boers, Nicholas M.; McNicholas, Paul D., teigen: an R package for model-based clustering and classification via the multivariate t distribution, J. Stat. Softw., 83, 7, 1-32 (2018)
[6] Baek, Jangsun; McLachlan, Geoffrey J.; Flack, Lloyd K., Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data, IEEE Trans. Pattern Anal. Mach. Intell., 32, 7, 1298-1309 (2009)
[7] Bergé, Laurent; Bouveyron, Charles; Girard, Stéphane, HDclassif: an R package for model-based clustering and discriminant analysis of high-dimensional data, J. Stat. Softw., 46, 6, 1-29 (2012)
[8] Bouveyron, Charles; Girard, Stéphane; Schmid, Cordelia, High-dimensional data clustering, Comput. Stat. Data Anal., 52, 1, 502-519 (2007) · Zbl 1452.62433
[9] Browne, R. P.; McNicholas, P. D., A mixture of generalized hyperbolic distributions, Can. J. Stat., 43, 2, 176-198 (2015) · Zbl 1320.62144
[10] Browne, Ryan P.; McNicholas, Paul D., Estimating common principal components in high dimensions, Adv. Data Anal. Classif., 8, 2, 217-226 (2014) · Zbl 1474.62183
[11] Browne, Ryan P.; McNicholas, Paul D., Orthogonal Stiefel manifold optimization for eigen-decomposed covariance parameter estimation in mixture models, Stat. Comput., 24, 2, 203-210 (2014) · Zbl 1325.62008
[12] Dang, Utkarsh J.; Punzo, Antonio; McNicholas, Paul D.; Ingrassia, Salvatore; Browne, Ryan P., Multivariate response and parsimony for Gaussian cluster-weighted models, J. Classif., 34, 1, 4-34 (2017) · Zbl 1364.62149
[13] de Leeuw, Jan; Lange, Kenneth, Sharp quadratic majorization in one dimension, Comput. Stat. Data Anal., 53, 7, 2471-2484 (2009) · Zbl 1453.62078
[14] Epskamp, Sacha; Cramer, Angélique O. J.; Waldorp, Lourens J.; Schmittmann, Verena D.; Borsboom, Denny, qgraph: network visualizations of relationships in psychometric data, J. Stat. Softw., 48, 4, 1-18 (2012)
[15] Fokoué, Ernest; Titterington, D. M., Mixtures of factor analysers. Bayesian estimation and inference by stochastic simulation, Mach. Learn., 50, 1-2, 73-94 (2003) · Zbl 1033.68085
[16] Franczak, Brian C.; Browne, Ryan P.; McNicholas, Paul D.; Burak, Katherine L., MixSAL: mixtures of multivariate shifted asymmetric Laplace (SAL) distributions (2018), R package version 1.0
[17] Franczak, Brian C.; McNicholas, Paul D.; Browne, Ryan P.; Murray, Paula M., Parsimonious shifted asymmetric Laplace mixtures (2013), preprint
[18] Hirose, Kei; Yamamoto, Michio, Estimation of an oblique structure via penalized likelihood factor analysis, Comput. Stat. Data Anal., 79, 120-132 (2014) · Zbl 1506.62080
[19] Hirose, Kei; Yamamoto, Michio, Sparse estimation via nonconcave penalized likelihood in factor analysis model, Stat. Comput., 25, 5, 863-875 (2015) · Zbl 1332.62194
[20] Hoerl, Arthur E.; Kennard, Robert W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 1, 55-67 (1970) · Zbl 0202.17205
[21] Horn, Roger A.; Johnson, Charles R., Matrix Analysis (2012), Cambridge University Press · Zbl 0576.15001
[22] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 1, 193-218 (1985)
[23] Hunter, David R.; Lange, Kenneth, Quantile regression via an MM algorithm, J. Comput. Graph. Stat., 9, 1, 60-77 (2000)
[24] Kaggle, Movehub city rankings (2017)
[25] Kiers, Henk A. L., Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems, Comput. Stat. Data Anal., 41, 1, 157-170 (2002) · Zbl 1018.65074
[26] Kim, Nam-Hwui; Browne, Ryan, Subspace clustering for the finite mixture of generalized hyperbolic distributions, Adv. Data Anal. Classif., 13, 3, 641-661 (2019) · Zbl 1474.62187
[27] Lin, Tsung-I; McNicholas, Paul D.; Ho, Hsiu J., Capturing patterns via parsimonious t mixture models, Stat. Probab. Lett., 88, 80-87 (2014) · Zbl 1369.62131
[28] Maleki, Mohsen; Wraith, Darren, Mixtures of multivariate restricted skew-normal factor analyzer models in a Bayesian framework, Comput. Stat., 34, 3, 1039-1053 (2019) · Zbl 1505.62270
[29] McLachlan, Geoffrey; Peel, David, Finite Mixture Models (2004), John Wiley & Sons · Zbl 0963.62061
[30] McNicholas, P. D., Mixture Model-Based Classification (2015), CRC Press, Taylor & Francis Group, Boca Raton
[31] McNicholas, Paul D.; ElSherbiny, Aisha; McDaid, Aaron F.; Murphy, T. Brendan, pgmm: parsimonious Gaussian mixture models (2018), R package version 1.2.3
[32] McNicholas, Paul David; Murphy, Thomas Brendan, Parsimonious Gaussian mixture models, Stat. Comput., 18, 3, 285-296 (2008)
[33] McNicholas, Sharon M.; McNicholas, Paul D.; Browne, Ryan P., A mixture of variance-gamma factor analyzers, (Big and Complex Data Analysis (2017), Springer), 369-385 · Zbl 1381.62187
[34] Meng, Xiao-Li; Van Dyk, David, The EM algorithm: an old folk-song sung to a fast new tune, J. R. Stat. Soc., Ser. B, Stat. Methodol., 59, 3, 511-567 (1997) · Zbl 1090.62518
[35] Movehub, Movehub city rankings (2019)
[36] Murray, Paula M.; Browne, Ryan P.; McNicholas, Paul D., Mixtures of skew-t factor analyzers, Comput. Stat. Data Anal., 77, 326-335 (2014) · Zbl 1506.62132
[37] Neuhaus, Jack O.; Wrigley, Charles, The quartimax method: an analytic approach to orthogonal simple structure, Br. J. Stat. Psychol., 7, 2, 81-91 (1954)
[38] Nie, F.; Huang, H.; Cai, X.; Ding, C., Efficient and robust feature selection via joint \(\ell_{2,1}\)-norms minimization, Proc. Adv. Neural Inf. Process. Syst., 1813-1821 (2010)
[39] Pan, Wei; Shen, Xiaotong, Penalized model-based clustering with application to variable selection, J. Mach. Learn. Res., 8, 1145-1164 (2007) · Zbl 1222.68279
[40] Ramirez, C.; Sanchez, R.; Kreinovich, V.; Argaez, M., \(\sqrt{x^2 + \mu}\) is the most computationally efficient smooth approximation to \(|x|\): a proof, J. Uncertain Syst., 8, 3, 205-210 (2014)
[41] Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6, 2, 461-464 (1978) · Zbl 0379.62005
[42] Scrucca, Luca; Fop, Michael; Murphy, Thomas Brendan; Raftery, Adrian E., mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J., 8, 1, 205-233 (2016)
[43] Städler, Nicolas; Bühlmann, Peter; Van De Geer, Sara, \(\ell_1\)-penalization for mixture regression models, Test, 19, 2, 209-256 (2010) · Zbl 1203.62128
[44] Tibshirani, Robert, Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B, Methodol., 58, 1, 267-288 (1996) · Zbl 0850.62538
[45] Tortora, Cristina; McNicholas, Paul D.; Browne, Ryan P., A mixture of generalized hyperbolic factor analyzers, Adv. Data Anal. Classif., 10, 4, 423-440 (2016) · Zbl 1414.62278
[46] Trendafilov, Nickolay T.; Adachi, Kohei, Sparse versus simple structure loadings, Psychometrika, 80, 3 (2015) · Zbl 1323.62124
[47] Trendafilov, Nickolay T.; Fontanella, Sara; Adachi, Kohei, Sparse exploratory factor analysis, Psychometrika, 82, 3, 778-794 (2017) · Zbl 1402.62129
[48] Xie, Benhuai; Pan, Wei; Shen, Xiaotong, Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables, Electron. J. Stat., 2, 168 (2008) · Zbl 1135.62055
[49] Yuan, Ming; Lin, Yi, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc., Ser. B, Stat. Methodol., 68, 1, 49-67 (2006) · Zbl 1141.62030
[50] Zou, Hui; Hastie, Trevor, Regularization and variable selection via the elastic net, J. R. Stat. Soc., Ser. B, Stat. Methodol., 67, 2, 301-320 (2005) · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.