Cross-validation methods in principal component analysis: a comparison. (English) Zbl 1145.62344

Summary: In principal components analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. One class of “objective” rules for choosing this number is the class of cross-validation (CV) methods. In this work we compare three CV techniques, showing how the performance of these methods depends on the structure of the covariance matrix. Finally, we propose a rule for choosing the “best” CV method and give an application to real data.
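
The summary does not say which three CV techniques are compared, so the following is only a minimal sketch of one generic cross-validatory rule for choosing the number of PCs, not the authors' procedure. It assumes NumPy; the function name `pca_cv_press`, the fold scheme, and the synthetic data are all hypothetical. Loadings are fitted on the training rows, and each variable of a held-out row is then predicted from the remaining variables of that row, which yields a prediction error sum of squares (PRESS) per candidate number of components.

```python
import numpy as np


def pca_cv_press(X, max_components, n_folds=5, seed=0):
    """Return PRESS for PCA models with 1..max_components components.

    Hypothetical helper: rows are split into folds; for every held-out row,
    each variable is predicted from the other variables of the same row.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 1 <= max_components < p:
        raise ValueError("max_components must be between 1 and p - 1")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    press = np.zeros(max_components)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        mean = X[train].mean(axis=0)
        # Loadings (rows of Vt) estimated from the training rows only.
        _, _, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
        X_test = X[test_idx] - mean
        for k in range(1, max_components + 1):
            V = Vt[:k].T  # p x k loading matrix
            for j in range(p):
                others = np.arange(p) != j
                # Least-squares scores computed without variable j.
                scores, *_ = np.linalg.lstsq(
                    V[others], X_test[:, others].T, rcond=None
                )
                pred = V[j] @ scores  # predicted values of variable j
                press[k - 1] += np.sum((X_test[:, j] - pred) ** 2)
    return press


# Synthetic illustration (assumed data): rank-2 signal plus small noise.
rng = np.random.default_rng(42)
n, p, true_rank = 200, 8, 2
latent = rng.normal(size=(n, true_rank))
loadings = rng.normal(size=(true_rank, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))

press = pca_cv_press(X, max_components=5)
best_k = int(np.argmin(press)) + 1  # number of PCs with the smallest PRESS
print(press, best_k)
```

The held-out variable is excluded when the scores are computed because projecting a complete held-out row onto the loadings would make the reconstruction error fall with every added component, so no minimum would appear.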

MSC:

62H25 Factor analysis and principal components; correspondence analysis

Software:

AS 127

References:

[1] Bartlett MS (1950) Test of significance in factor analysis. Br. J. Psych. Stat. 3: 77–85
[2] Cattell RB (1966) The scree test for the number of factors. Mult. Behav. Res. 1: 245–276 · doi:10.1207/s15327906mbr0102_10
[3] Eastment HT, Krzanowski WJ (1982) Cross-validatory choice of the number of components from a principal component analysis. Technometrics 24: 73–77 · doi:10.2307/1267581
[4] Forina M, Lanteri S, Boggia R, Bertran E (1993) Double cross full validation. Química Analítica 12: 128–135
[5] Heiberger RM (1978) AS 127. Generation of random orthogonal matrices. Applied Statistics 27: 199–206 · Zbl 0433.65006 · doi:10.2307/2346957
[6] Jackson JE (1991) A user’s guide to principal components. Wiley, New York · Zbl 0743.62047
[7] Jeffers JNR (1967) Two case studies in the application of principal components analysis. Applied Statistics 16: 225–236 · doi:10.2307/2985919
[8] Jolliffe IT (1986) Principal component analysis. Springer, Berlin Heidelberg New York
[9] Kaiser HF (1960) The application of electronic computers to factor analysis. Educ. Psychol. Meas. 20: 141–151 · doi:10.1177/001316446002000116
[10] Krzanowski WJ (1983) Cross-validatory choice of the number in principal component analysis; some sampling results. J. Statist. Comput. Simul. 18: 299–314 · doi:10.1080/00949658308810706
[11] Krzanowski WJ (1987) Cross-validation in principal component analysis. Biometrics 43: 575–584 · doi:10.2307/2531996
[12] Krzanowski WJ (1987) Selection of variables to preserve multivariate data structure, using principal components. Applied Statistics 36: 22–33 · doi:10.2307/2347842
[13] Malinowski ER (1977) Theory of error in factor analysis. Analytical Chemistry 49: 606–612 · doi:10.1021/ac50012a026
[14] Minka TP (2000) Automatic choice of dimensionality for PCA. Technical Report n. 514, MIT Media Laboratory, Vision and Modelling Group. http://citeseer.nj.nec.com/minkaooautomaic.html
[15] Scarponi G, Moret I, Capodaglio G, Romanazzi M (1990) Cross-validation, influential observations and selection of variables in chemometric studies of wines by principal component analysis. Journal of Chemometrics 4: 217–240 · doi:10.1002/cem.1180040304
[16] Wold S (1976) Pattern recognition by means of disjoint principal components models. Pattern Recognition 8: 127–139 · Zbl 0336.68040 · doi:10.1016/0031-3203(76)90014-5
[17] Wold S (1978) Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20: 397–405 · Zbl 0403.62032 · doi:10.2307/1267639
[18] Wold H, Lyttkens E (1969) Nonlinear iterative partial least squares (NIPALS) estimation procedures, Bull. Intern. Statist. Inst.: Proc. 37th Session, pp. 1–15. London · Zbl 0214.46503