Regularised MANOVA for high-dimensional data. (English) Zbl 1336.62135

Summary: The traditional and readily available multivariate analysis of variance (MANOVA) tests such as Wilks’ Lambda and the Pillai-Bartlett trace start to suffer from low power as the number of variables approaches the sample size. Moreover, when the number of variables exceeds the number of available observations, these statistics are not available for use. Ridge regularisation of the covariance matrix has been proposed to allow the use of MANOVA in high-dimensional situations and to increase its power when the sample size approaches the number of variables. In this paper two forms of ridge regression are compared to each other and to a novel approach based on lasso regularisation, as well as to more traditional approaches based on principal components and the Moore-Penrose generalised inverse. The performance of the different methods is explored via an extensive simulation study. All the regularised methods perform well; the best method varies across the different scenarios, with margins of victory being relatively modest. We examine a data set of soil compaction profiles at various positions relative to a ridgetop, and illustrate how our results can be used to inform the selection of a regularisation method.
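The ridge idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a ridge form that shrinks the within-group sum-of-squares matrix E towards a scaled identity, a Lawley-Hotelling-type trace statistic tr(E_ridge^{-1} H), and a permutation test for the p-value. The function names `ridge_manova_stat` and `permutation_pvalue`, the penalty `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge_manova_stat(X, groups, lam):
    """Trace statistic tr(inv(E_ridge) @ H) with a ridge-regularised E.

    Assumption: E_ridge = (1 - lam) * E + lam * (tr(E) / p) * I, 0 < lam <= 1.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    H = np.zeros((p, p))  # between-group (hypothesis) SSCP matrix
    E = np.zeros((p, p))  # within-group (error) SSCP matrix
    for g in np.unique(groups):
        Xg = X[groups == g]
        d = Xg.mean(axis=0) - grand_mean
        H += len(Xg) * np.outer(d, d)
        R = Xg - Xg.mean(axis=0)
        E += R.T @ R
    # Shrinking towards a scaled identity makes E_ridge invertible even if p > n.
    E_ridge = (1.0 - lam) * E + lam * (np.trace(E) / p) * np.eye(p)
    return float(np.trace(np.linalg.solve(E_ridge, H)))

def permutation_pvalue(X, groups, lam, n_perm=99, seed=0):
    """Permutation p-value: shuffle group labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = ridge_manova_stat(X, groups, lam)
    exceed = sum(
        ridge_manova_stat(X, rng.permutation(groups), lam) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

# Simulated example with more variables (30) than observations (20).
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 30))
X[groups == 1] += 3.0  # mean shift in every variable for the second group
stat = ridge_manova_stat(X, groups, lam=0.5)
p_value = permutation_pvalue(X, groups, lam=0.5)
```

In this sketch the unregularised E is singular because there are more variables than observations, so the classical statistics cannot be computed, while the ridge version remains well defined. In practice `lam` would need to be chosen by some criterion, such as cross-validation; the abstract does not say which criterion the paper uses.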

MSC:

62H12 Estimation in multivariate analysis
62J10 Analysis of variance and covariance (ANOVA)

Software:

corpcor; R; glasso; mvabund

References:

[1] Anderson, An Introduction to Multivariate Statistical Analysis (2003) · Zbl 1039.62044
[2] Bai, Effect of high dimension: by an example of a two sample problem, Statist. Sinica 6 pp 311– (1996) · Zbl 0848.62030
[3] Edgington, Randomization tests (2007)
[4] Fan, Network exploration via the adaptive LASSO and SCAD penalties, Ann. Appl. Stat. 3 pp 521– (2009) · Zbl 1166.62040
[5] Friedman, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 pp 432– (2008) · Zbl 1143.62076
[6] Friedman, glasso: Graphical Lasso-Estimation of Gaussian Graphical Models, Biostat (2014)
[7] Hastie, The Elements of Statistical Learning (2009) · Zbl 1273.62005
[8] Huang, Covariance matrix selection and estimation via penalised normal likelihood, Biometrika 93 pp 85– (2006) · Zbl 1152.62346
[9] Kong, A multivariate approach for integrating genome-wide expression data and biological knowledge, Bioinformatics 22 pp 2373– (2006)
[10] Ledoit, Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, J. Empirical Finan 10 pp 603– (2003)
[11] Ledoit, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate Anal. 88 pp 365– (2004) · Zbl 1032.62050
[12] Opgen-Rhein, Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach, Stat. Appl. Genet. Mol. Biol. 6 pp 9– (2007) · Zbl 1166.62361
[13] Schäfer, An empirical Bayes approach to inferring large-scale gene association networks, Bioinformatics 21 pp 754– (2005a)
[14] Schäfer, A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics, Stat. Appl. Genet. Mol. Biol. 4 pp 1175– (2005b)
[15] Schäfer, corpcor: Efficient Estimation of Covariance and (Partial) Correlation (2013)
[16] Shen, Shrinkage-based regularization tests for high-dimensional data with application to gene set analysis, Comput. Statist. Data Anal. 55 pp 2221– (2011) · Zbl 1328.62356
[17] Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 pp 267– (1996) · Zbl 0850.62538
[18] Tomfohr, Pathway level analysis of gene expression using singular value decomposition, BMC Bioinformatics 6 pp 1– (2005)
[19] Tsai, Multivariate analysis of variance test for gene set analysis, Bioinformatics 25 pp 897– (2009) · Zbl 05743845
[20] Ullah, Contributions to high dimensional data analysis: some applications of the regularized covariance matrices (2015)
[21] Venables, Modern applied statistics with S (2002)
[22] Wang, mvabund-an R package for model-based analysis of multivariate abundance data, Methods Ecol. Evol. 3 pp 471– (2012)
[23] Warton, Penalized normal likelihood and ridge regularization of correlation and covariance matrices, J. Amer. Statist. Assoc. 103 pp 340– (2008) · Zbl 1471.62362
[24] Yuan, Model selection and estimation in the Gaussian graphical model, Biometrika 94 pp 19– (2007) · Zbl 1142.62408
[25] Zou, The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc. 101 pp 1418– (2006) · Zbl 1171.62326