Regularized generalized canonical correlation analysis: a framework for sequential multiblock component methods. (English) Zbl 1402.62121

Summary: A new framework for sequential multiblock component methods is presented. It relies on a new version of regularized generalized canonical correlation analysis (RGCCA) in which various scheme functions and shrinkage constants are considered. Two types of between-block connections are considered: blocks are either fully connected or connected to the superblock (the concatenation of all blocks). The proposed iterative algorithm is monotone convergent and is guaranteed to yield a stationary point of RGCCA at convergence. In some cases, the solution of RGCCA is the first eigenvalue/eigenvector of a certain matrix. For the scheme functions \(x\), \(|x|\), \(x^2\) or \(x^4\) and shrinkage constants \(0\) or \(1\), many multiblock component methods are recovered.
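
The eigenvalue/eigenvector remark can be illustrated on the simplest special case. The sketch below is not the authors' algorithm or code. It assumes two blocks, the scheme function \(x\) and shrinkage constant \(1\) for both blocks, so that the criterion reduces to maximizing the covariance between the two block components under unit-norm weight vectors. In that setting, usually identified with Tucker's inter-battery factor analysis, the alternating updates amount to a power iteration on the cross-covariance matrix, the criterion never decreases, and the limit is the leading singular value with its singular vectors. The function name `two_block_rgcca_tau1`, its parameters and the simulated data are all hypothetical.

```python
import numpy as np

def two_block_rgcca_tau1(X1, X2, n_iter=500, tol=1e-12, seed=0):
    """Alternating updates for two blocks, scheme g(x) = x, shrinkage 1.

    Illustrative only: maximizes cov(X1 a1, X2 a2) subject to
    ||a1|| = ||a2|| = 1 and records the criterion after each sweep.
    """
    rng = np.random.default_rng(seed)
    X1 = X1 - X1.mean(axis=0)          # column-center each block
    X2 = X2 - X2.mean(axis=0)
    C = X1.T @ X2 / X1.shape[0]        # cross-covariance matrix

    a2 = rng.standard_normal(X2.shape[1])
    a2 /= np.linalg.norm(a2)
    history = []
    for _ in range(n_iter):
        a1 = C @ a2                    # best a1 for fixed a2
        a1 /= np.linalg.norm(a1)
        a2 = C.T @ a1                  # best a2 for fixed a1
        a2 /= np.linalg.norm(a2)
        history.append(float(a1 @ C @ a2))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return a1, a2, history

# Simulated blocks sharing one latent variable (assumed toy data).
rng = np.random.default_rng(1)
z = rng.standard_normal((200, 1))
X1 = z @ rng.standard_normal((1, 4)) + 0.5 * rng.standard_normal((200, 4))
X2 = z @ rng.standard_normal((1, 3)) + 0.5 * rng.standard_normal((200, 3))

a1, a2, history = two_block_rgcca_tau1(X1, X2)

# Reference value: largest singular value of the cross-covariance matrix.
C_ref = (X1 - X1.mean(axis=0)).T @ (X2 - X2.mean(axis=0)) / X1.shape[0]
sigma_max = np.linalg.svd(C_ref, compute_uv=False)[0]
```

Comparing `history[-1]` with `sigma_max` checks the eigenvalue interpretation, and inspecting successive differences of `history` checks that the criterion is nondecreasing. Other scheme functions, shrinkage constants or more than two blocks require the general algorithm of the paper, which this sketch does not reproduce.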

MSC:

62H20 Measures of association (correlation, canonical correlation, etc.)
62H25 Factor analysis and principal components; correspondence analysis
65C60 Computational problems in statistics (MSC2010)
