A robust factor analysis model using the restricted skew-\(t\) distribution. (English) Zbl 1327.62344

Summary: Factor analysis is a classical data-reduction technique that seeks a potentially lower number of unobserved variables that can account for the correlations among the observed variables. This paper presents an extension of the factor analysis model, called the skew-\(t\) factor analysis model, constructed by assuming a restricted version of the multivariate skew-\(t\) distribution for the latent factors and a symmetric \(t\)-distribution for the unobservable errors jointly. The proposed model shows robustness to violations of normality assumptions of the underlying latent factors and provides flexibility in capturing extra skewness as well as heavier tails of the observed data. A computationally feasible expectation conditional maximization algorithm is developed for computing maximum likelihood estimates of model parameters. The usefulness of the proposed methodology is illustrated using both simulated and real data.
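The construction described in the summary can be illustrated by simulation. The following is a minimal sketch, not the authors' implementation. It assumes a common stochastic representation of the restricted skew-\(t\) distribution, in which a single half-normal variable and a shared gamma scale variable generate skewness and heavy tails. The function name `simulate_skew_t_fa` and the symbols `mu`, `B`, `D_diag`, `lam` and `nu` are hypothetical, and the paper's exact parameterization (for example, any centering of the factors) may differ.

```python
import numpy as np


def simulate_skew_t_fa(n, mu, B, D_diag, lam, nu, seed=0):
    """Draw n observations y = mu + B u + e (illustrative parameterization).

    Assumed hierarchy, sharing one scale variable tau per observation:
      tau ~ Gamma(nu/2, rate = nu/2)
      u | tau = (lam * |w| + z) / sqrt(tau),  w ~ N(0, 1), z ~ N(0, I_q)
      e | tau ~ N(0, diag(D_diag) / tau)
    so u is skewed and heavy-tailed while e is symmetric t.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    B = np.asarray(B, dtype=float)
    D_diag = np.asarray(D_diag, dtype=float)
    lam = np.asarray(lam, dtype=float)
    p, q = B.shape

    # NumPy parameterizes the gamma by shape and scale; rate nu/2 is scale 2/nu.
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    inv_sqrt_tau = 1.0 / np.sqrt(tau)[:, None]

    w = np.abs(rng.standard_normal(n))           # one half-normal per observation
    z = rng.standard_normal((n, q))              # symmetric part of the factors
    u = (w[:, None] * lam + z) * inv_sqrt_tau    # skewed, heavy-tailed factors

    e = rng.standard_normal((n, p)) * np.sqrt(D_diag) * inv_sqrt_tau  # t errors
    y = mu + u @ B.T + e
    return y, u


if __name__ == "__main__":
    B = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # p = 3, q = 2
    y, u = simulate_skew_t_fa(
        n=1000, mu=np.zeros(3), B=B, D_diag=np.full(3, 0.2),
        lam=np.array([2.0, 0.0]), nu=5.0,
    )
    print(y.shape, u.shape)
```

Setting `lam` to zero in this sketch gives symmetric \(t\) factors, and letting `nu` grow large approaches the normal case, which is one way to see how such a model nests the classical factor analysis model.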

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis

Software:

EMMIX-skew; sn; PGMM

References:

[1] Aas, K; Haff, IH, The generalised hyperbolic skew Student’s \(t\)-distribution, J Financ Econ, 4, 275-309, (2006)
[2] Aitken, AC, On Bernoulli’s numerical solution of algebraic equations, Proc R Soc Edinburgh, 46, 289-305, (1926) · JFM 52.0098.05
[3] Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (Eds.) 2nd international symposium on information theory. Akademiai Kiado, Budapest, pp 267-281 · Zbl 0283.62006
[4] Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley, New York
[5] Azzalini, A, A class of distributions which includes the normal ones, Scand J Stat, 12, 171-178, (1985) · Zbl 0581.62014
[6] Azzalini, A, The skew-normal distribution and related multivariate families, Scand J Stat, 32, 159-188, (2005) · Zbl 1091.62046
[7] Azzalini, A; Capitanio, A, Statistical applications of the multivariate skew normal distribution, J R Stat Soc Ser B, 61, 579-602, (1999) · Zbl 0924.62050
[8] Azzalini, A; Capitanio, A, Distributions generated by perturbation of symmetry with emphasis on a multivariate skew \(t\)-distribution, J R Stat Soc Ser B, 65, 367-389, (2003) · Zbl 1065.62094
[9] Azzalini, A; Dalla Valle, A, The multivariate skew-normal distribution, Biometrika, 83, 715-726, (1996) · Zbl 0885.62062
[10] Azzalini, A; Genton, MG, Robust likelihood methods based on the skew-\(t\) and related distributions, Int Stat Rev, 76, 106-129, (2008) · Zbl 1206.62102
[11] Barndorff-Nielsen, O; Shephard, N, Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics, J R Stat Soc Ser B, 63, 167-241, (2001) · Zbl 0983.60028
[12] Basilevsky A (2008) Statistical factor analysis and related methods: theory and applications. Wiley, New York · Zbl 1130.62341
[13] Böhning, D; Dietz, E; Schaub, R; Schlattmann, P; Lindsay, B, The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family, Ann Inst Stat Math, 46, 373-388, (1994) · Zbl 0802.62017
[14] Bozdogan, H, Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions, Psychometrika, 52, 345-370, (1987) · Zbl 0627.62005
[15] Branco, MD; Dey, DK, A general class of multivariate skew-elliptical distributions, J Multivar Anal, 79, 99-113, (2001) · Zbl 0992.62047
[16] Cook RD, Weisberg S (1994) An introduction to regression graphics. Wiley, New York · Zbl 0925.62287
[17] Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood from incomplete data via the EM algorithm (with discussion), J R Stat Soc Ser B, 39, 1-38, (1977) · Zbl 0364.62022
[18] Efron, B; Hinkley, DV, Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information (with discussion), Biometrika, 65, 457-487, (1978) · Zbl 0401.62002
[19] Efron, B; Tibshirani, R, Bootstrap method for standard errors, confidence intervals, and other measures of statistical accuracy, Stat Sci, 1, 54-77, (1986) · Zbl 0587.62082
[20] Fokoué, E; Titterington, DM, Mixtures of factor analyzers. Bayesian estimation and inference by stochastic simulation, Mach Learn, 50, 73-94, (2003) · Zbl 1033.68085
[21] Hannan, EJ; Quinn, BG, The determination of the order of an autoregression, J R Stat Soc Ser B, 41, 190-195, (1979) · Zbl 0408.62076
[22] Healy, MJR, Multivariate normal plotting, Appl Stat, 17, 157-161, (1968)
[23] Ho, HJ; Lin, TI; Chang, HH; Haase, HB; Huang, S; Pyne, S, Parametric modeling of cellular state transitions as measured with flow cytometry, BMC Bioinform, 13, S5, (2012)
[24] Jamshidian, M; Berkane, M (ed.), An EM algorithm for ML factor analysis with missing data, 247-258, (1997), New York · Zbl 0893.62060
[25] Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis, 6th edn. Pearson Prentice-Hall, Upper Saddle River
[26] Jones, MC; Faddy, MJ, A skew extension of the \(t\)-distribution with applications, J R Stat Soc Ser B, 65, 159-174, (2003) · Zbl 1063.62013
[27] Kotz S, Nadarajah S (2004) Multivariate \(t\) distributions and their applications. Cambridge University Press, Cambridge · Zbl 1100.62059
[28] Lachos, VH; Ghosh, P; Arellano-Valle, RB, Likelihood based inference for skew normal independent linear mixed models, Stat Sin, 20, 303-322, (2010) · Zbl 1186.62071
[29] Lange, KL; Little, RJA; Taylor, JMG, Robust statistical modeling using the \(t\) distribution, J Am Stat Assoc, 84, 881-896, (1989)
[30] Lawley DN, Maxwell AE (1971) Factor analysis as a statistical method, 2nd edn. Butterworth, London
[31] Lee, S; McLachlan, GJ, On mixtures of skew normal and skew \(t\)-distributions, Adv Data Anal Classif, 7, 241-266, (2013) · Zbl 1273.62115
[32] Lee, S; McLachlan, GJ, Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results, Stat Comput, 24, 181-202, (2014) · Zbl 1325.62107
[33] Lee YW, Poon SH (2011) Systemic and systematic factors for loan portfolio loss distribution. Econometrics and applied economics workshops, School of Social Science, University of Manchester, pp 1-61
[34] Lin TI, Ho HJ, Chen CL (2009) Analysis of multivariate skew normal models with incomplete data. J Multivar Anal 100:2337-2351 · Zbl 1175.62054
[35] Lin, TI; Lee, JC; Ho, HJ, On fast supervised learning for normal mixture models with missing information, Pattern Recog, 39, 1177-1187, (2006) · Zbl 1096.68723
[36] Lin, TI; Lee, JC; Hsieh, WJ, Robust mixture modeling using the skew \(t\) distribution, Stat Comput, 17, 81-92, (2007)
[37] Lin, TI; Lee, JC; Yen, SY, Finite mixture modelling using the skew normal distribution, Stat Sin, 17, 909-927, (2007) · Zbl 1133.62012
[38] Lin, TI; Lin, TC, Robust statistical modelling using the multivariate skew \(t\) distribution with complete and incomplete data, Stat Model, 11, 253-277, (2011) · Zbl 1218.62050
[39] Lin TI, McLachlan GJ, Lee SX (2013) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. Preprint arXiv:1307.1748
[40] Lindsay B (1995) Mixture models: theory, geometry and applications. Institute of Mathematical Statistics, Hayward · Zbl 1163.62326
[41] Liu M, Lin TI (2014) Skew-normal factor analysis models with incomplete data. J Appl Stat. doi:10.1080/02664763.2014.986437
[42] Lopes, HF; West, M, Bayesian model assessment in factor analysis, Stat Sin, 14, 41-67, (2004) · Zbl 1035.62060
[43] Louis, TA, Finding the observed information when using the EM algorithm, J R Stat Soc Ser B, 44, 226-232, (1982) · Zbl 0488.62018
[44] McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York · Zbl 0963.62061
[45] McLachlan, GJ; Bean, RW; Jones, LBT, Extension of the mixture of factor analyzers model to incorporate the multivariate \(t\)-distribution, Comput Stat Data Anal, 51, 5327-5338, (2007) · Zbl 1445.62053
[46] McNicholas, PD; Murphy, TB; McDaid, AF; Frost, D, Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models, Comput Stat Data Anal, 54, 711-723, (2010) · Zbl 1464.62131
[47] Meng, XL; Rubin, DB, Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278, (1993) · Zbl 0778.62022
[48] Montanari, A; Viroli, C, Heteroscedastic factor mixture analysis, Stat Model, 10, 441-460, (2010)
[49] Murray PM, Browne RP, McNicholas PD (2013) Mixtures of ‘unrestricted’ skew-\(t\) factor analyzers. Preprint arXiv:1310.6224v1
[50] Murray, PM; Browne, RP; McNicholas, PD, Mixtures of skew-\(t\) factor analyzers, Comput Stat Data Anal, 77, 326-335, (2014) · Zbl 06984029
[51] Murray, PM; McNicholas, PD; Browne, RP, Mixtures of common skew-\(t\) factor analyzers, Stat, 3, 68-82, (2014)
[52] Pyne, S; Hu, X; Wang, K; Rossin, E; Lin, TI; Maier, LM; Baecher-Allan, C; McLachlan, GJ; Tamayo, P; Hafler, DA; De Jager, PL; Mesirov, JP, Automated high-dimensional flow cytometric data analysis, Proc Natl Acad Sci USA, 106, 8519-8524, (2009)
[53] Rossin, E; Lin, TI; Ho, HJ; Mentzer, SJ; Pyne, S, A framework for analytical characterization of monoclonal antibodies based on reactivity profiles in different tissues, Bioinformatics, 27, 2746-2753, (2011)
[54] Sahu, SK; Dey, DK; Branco, MD, A new class of multivariate skew distributions with application to Bayesian regression models, Can J Stat, 31, 129-150, (2003) · Zbl 1039.62047
[55] Schwarz, G, Estimating the dimension of a model, Ann Stat, 6, 461-464, (1978) · Zbl 0379.62005
[56] Sclove LS (1987) Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52:333-343
[57] Spearman, C, General intelligence, objectively determined and measured, Am J Psychol, 15, 201-292, (1904)
[58] Tortora C, McNicholas PD, Browne R (2013) A mixture of generalized hyperbolic factor analyzers. Preprint arXiv:1311.6530v1
[59] Wall, MM; Guo, J; Amemiya, Y, Mixture factor analysis for approximating a non-normally distributed continuous latent factor with continuous and dichotomous observed variables, Multivar Behav Res, 47, 276-313, (2012)
[60] Wang K, McLachlan GJ, Ng SK, Peel D (2009) EMMIX-skew: EM algorithm for mixture of multivariate skew normal/\(t\) distributions. R package version 1.0-12
[61] Wang, WL; Lin, TI, An efficient ECM algorithm for maximum likelihood estimation in mixtures of \(t\)-factor analyzers, Comput Stat, 28, 751-769, (2013) · Zbl 1305.65082
[62] Zacks S (1971) The theory of statistical inference. Wiley, New York
[63] Zhang J, Li J, Liu C (2013) Robust factor analysis using the multivariate \(t\)-distribution. Unpublished manuscript
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.