Csala, Attila; Hof, Michel H.; Zwinderman, Aeilko H.
Multiset sparse redundancy analysis for high-dimensional omics data. (English) Zbl 1419.62329
Biom. J. 61, No. 2, 406-423 (2019).

Summary: Redundancy Analysis (RDA) is a well-known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high-dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi-sRDA) framework is a prominent candidate for high-dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi-sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H12 Estimation in multivariate analysis

Keywords: high-dimensional data; multivariate statistics; omics data; redundancy analysis

Software: R; vegan