## How many principal components? Stopping rules for determining the number of non-trivial axes revisited. (English) Zbl 1429.62223

Summary: Principal component analysis is one of the most widely applied tools for summarizing common patterns of variation among variables. Several studies have investigated the ability of individual methods, or compared the performance of several methods, to determine the number of components describing common variance in simulated data sets. We identify a number of shortcomings in these studies and conduct an extensive simulation study in which we compare a larger set of the available rules and develop some new methods. In total we compare 20 stopping rules and propose a two-step approach that appears to be highly effective. First, Bartlett’s test is used to test the significance of the first principal component, indicating whether at least two variables share common variation in the entire data set. If the test is significant, a number of different rules can be applied to estimate the number of non-trivial components to be retained. However, the relative merits of these rules depend on whether the data contain strongly correlated or uncorrelated variables. We also estimate the number of non-trivial components for several field data sets in order to evaluate the applicability of our conclusions based on simulated data.
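The two-step idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the eigenvalues of a correlation matrix have already been computed and are strictly positive. It takes Bartlett's sphericity statistic, $-(n - 1 - (2p+5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom, as the first-step test. For the second step it uses the broken-stick rule purely as an example of a stopping rule; the summary does not say which rules the paper recommends. The function names are hypothetical, and the p-value uses the Wilson-Hilferty normal approximation to keep the sketch dependency-free.

```python
import math


def bartlett_sphericity(eigenvalues, n):
    """Bartlett's test that a p x p correlation matrix is the identity.

    eigenvalues: strictly positive eigenvalues of the correlation matrix.
    n: number of observations.
    Returns (chi-square statistic, degrees of freedom, approximate p-value).
    """
    p = len(eigenvalues)
    log_det = sum(math.log(ev) for ev in eigenvalues)  # ln|R|
    stat = max(0.0, -(n - 1 - (2 * p + 5) / 6) * log_det)
    df = p * (p - 1) // 2
    # Wilson-Hilferty approximation to the chi-square upper tail
    # (an assumption of this sketch, not an exact p-value).
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return stat, df, p_value


def broken_stick_count(eigenvalues):
    """Count leading components whose share of variance exceeds the
    broken-stick expectation (1/p) * sum_{i=k}^{p} 1/i."""
    p = len(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    count = 0
    for k, ev in enumerate(ordered, start=1):
        expected = sum(1 / i for i in range(k, p + 1)) / p
        if ev / total <= expected:
            break  # stop at the first component that fails the rule
        count += 1
    return count


def two_step(eigenvalues, n, alpha=0.05):
    """Step 1: Bartlett's test. Step 2 (only if significant): a stopping rule."""
    _, _, p_value = bartlett_sphericity(eigenvalues, n)
    if p_value >= alpha:
        return 0  # no evidence of shared variation; retain no components
    return broken_stick_count(eigenvalues)


if __name__ == "__main__":
    # Hypothetical eigenvalues of a 4 x 4 correlation matrix, n = 50 observations.
    print(two_step([3.0, 0.5, 0.3, 0.2], n=50))
```

Any other stopping rule could replace `broken_stick_count` in the second step; the point of the sketch is only the gating of the rule by the first-step test.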

### MSC:

 62H25 Factor analysis and principal components; correspondence analysis

### Software:

 Canoco; sedaR
