zbMATH — the first resource for mathematics

Using regression makes extraction of shared variation in multiple datasets easy. (English) Zbl 1416.62316
Summary: In many data analysis tasks it is important to understand the relationships between different datasets. Several methods exist for this task but many of them are limited to two datasets and linear relationships. In this paper, we propose a new efficient algorithm, termed cocoreg, for the extraction of variation common to all datasets in a given collection of arbitrary size. cocoreg extends redundancy analysis to more than two datasets, utilizing chains of regression functions to extract the shared variation in the original data space. The algorithm can be used with any linear or non-linear regression function, which makes it robust, straightforward, fast, and easy to implement and use. We empirically demonstrate the efficacy of shared variation extraction using the cocoreg algorithm on five artificial and three real datasets.
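The idea of chaining regression functions between datasets can be illustrated with a minimal sketch. This is not the authors' implementation (the reference implementation is the R package cocoreg) but a simplified reading of the summary under stated assumptions: all datasets share the same rows, one regression function is fitted for each ordered pair of datasets, and the shared part of dataset k is estimated by averaging the outputs of every regression chain that passes through all other datasets and ends in k. The names `fit_linear` and `shared_variation_sketch` are hypothetical, and ordinary least squares stands in for "any linear or non-linear regression function".

```python
import itertools
import numpy as np

def fit_linear(X, Y):
    """Fit a least-squares map (with intercept) from X to Y; return a predictor."""
    Xa = np.column_stack([X, np.ones(len(X))])
    W, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ W

def shared_variation_sketch(datasets, fit=fit_linear):
    """Illustrative only: average regression chains ending in each dataset."""
    n = len(datasets)
    # One regression function per ordered pair (i -> j), an assumption of this sketch.
    f = {(i, j): fit(datasets[i], datasets[j])
         for i in range(n) for j in range(n) if i != j}
    estimates = []
    for k in range(n):
        others = [i for i in range(n) if i != k]
        projections = []
        # Every ordering of the other datasets gives one chain ending in k.
        for path in itertools.permutations(others):
            chain = list(path) + [k]
            Z = datasets[chain[0]]
            for a, b in zip(chain, chain[1:]):
                Z = f[(a, b)](Z)  # push the data one step along the chain
            projections.append(Z)
        # The average lives in the original data space of dataset k.
        estimates.append(np.mean(projections, axis=0))
    return estimates

# Toy data: one shared latent signal plus independent noise per dataset.
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 1))
loadings = [np.array([[1.0, -2.0, 0.5]]),
            np.array([[0.7, 1.5, -1.0]]),
            np.array([[-1.2, 0.4, 2.0]])]
datasets = [s @ A + 0.1 * rng.normal(size=(500, 3)) for A in loadings]
shared = shared_variation_sketch(datasets)
```

Each element of `shared` has the same shape as the corresponding input dataset. The number of chains grows factorially with the number of datasets, so this sketch is only practical for small collections. Any other regressor can be swapped in through the `fit` argument, provided it returns a callable predictor.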
MSC:
62H20 Measures of association (correlation, canonical correlation, etc.)
62J15 Paired and multiple comparisons; multiple testing
62H25 Factor analysis and principal components; correspondence analysis
References:
[1] Andrew G, Arora R, Bilmes J, Livescu K (2013) Deep canonical correlation analysis. In: Proceedings of the 30th international conference on machine learning, vol 28, pp 1247-1255
[2] Dähne, S.; Nikulin, VV; Ramírez, D.; Schreier, PJ; Müller, KR; Haufe, S., Finding brain oscillations with power dependencies in neuroimaging data, NeuroImage, 96, 334-348, (2014)
[3] Damianou A, Ek C, Titsias MK, Lawrence ND (2012) Manifold relevance determination. In: Proceedings of the 29th international conference on machine learning, pp 145-152
[4] Fisher, J.; Darrell, T., Speaker association with signal-level audiovisual fusion, IEEE Trans Multimed, 6, 406-413, (2003)
[5] Hardoon, D.; Szedmak, S.; Shawe-Taylor, J., Canonical correlation analysis: an overview with application to learning methods, Neural Comput, 16, 2639-2664, (2004) · Zbl 1062.68134
[6] Hasson, U.; Nir, Y.; Levy, I.; Fuhrmann, G.; Malach, R., Intersubject synchronization of cortical activity during natural vision, Science, 303, 1634-1640, (2004)
[7] Hastie T, Tibshirani R, Friedman J (2003) The elements of statistical learning: data mining, inference, and prediction. Springer, New York · Zbl 1273.62005
[8] Hotelling, H., Relations between two sets of variates, Biometrika, 28, 321-377, (1936) · Zbl 0015.40705
[9] Hsieh, WW, Nonlinear canonical correlation analysis by neural networks, Neural Netw, 13, 1095-1105, (2000)
[10] Hwang, H.; Jung, K.; Takane, Y.; Woodward, TS, A unified approach to multiple-set canonical correlation analysis and principal components analysis, Br J Math Stat Psychol, 66, 308-321, (2013) · Zbl 1410.62094
[11] Kettenring, J., Canonical analysis of several sets of variables, Biometrika, 58, 433-451, (1971) · Zbl 0225.62072
[12] Klami, A.; Virtanen, S.; Kaski, S., Bayesian canonical correlation analysis, J Mach Learn Res, 14, 965-1003, (2013) · Zbl 1320.62134
[13] Klami, A.; Virtanen, S.; Leppäho, E., Group factor analysis, IEEE Trans Neural Netw Learn Syst, 26, 2136-2147, (2015)
[14] Korpela J, Henelius A (2016) Cocoreg: extracts shared variation in collections of datasets using regression models. http://cran.r-project.org/package=cocoreg
[15] Legendre P, Legendre L (1998) Numerical ecology, 2nd edn. Elsevier, Amsterdam · Zbl 1033.92036
[16] Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18-22. https://cran.r-project.org/package=randomForest
[17] Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2014) e1071: misc functions of the department of statistics (e1071), Technische Universität Wien. http://cran.r-project.org/package=e1071
[18] Müller, KE, Understanding canonical correlation through the general linear model and principal components, Am Stat, 36, 342-354, (1982) · Zbl 0533.62053
[19] Nguyen HV, Müller E, Vreeken J, Efros P, Böhm K (2014) Multivariate maximal correlation analysis. In: Proceedings of the 31st international conference on machine learning, pp 775-783
[20] R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, http://www.R-project.org/
[21] Tenenhaus, A., Regularized generalized canonical correlation analysis and PLS path modeling, Psychometrika, 76, 257-284, (2011) · Zbl 1284.62753
[22] Tibshirani, R., Regression shrinkage and selection via the lasso, J R Stat Soc Ser B, 58, 267-288, (1996) · Zbl 0850.62538
[23] Timmerman, ME; Kiers, H., Four simultaneous component models for the analysis of multivariate time series from more than one subject to model intraindividual and interindividual differences, Psychometrika, 68, 105-121, (2003) · Zbl 1306.62507
[24] Virtanen S, Klami A, Khan SA, Kaski S (2012) CCAGFA: Bayesian canonical correlation analysis and group factor analysis. http://cran.r-project.org/package=CCAGFA
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.