Lasso ANOVA decompositions for matrix and tensor data. (English) Zbl 1507.62067

Summary: Consider the problem of estimating the entries of an unknown mean matrix or tensor given a single noisy realization. In the matrix case, this problem can be addressed by decomposing the mean matrix into a component that is additive in the rows and columns, i.e. the additive ANOVA decomposition of the mean matrix, plus a matrix of elementwise effects, and assuming that the elementwise effects may be sparse. Accordingly, the mean matrix can be estimated by solving a penalized regression problem, applying a lasso penalty to the elementwise effects. Although solving this penalized regression problem is straightforward, specifying appropriate values of the penalty parameters is not. Leveraging the posterior mode interpretation of the penalized regression problem, moment-based empirical Bayes estimators of the penalty parameters can be defined. Estimation of the mean matrix using these moment-based empirical Bayes estimators can be called LANOVA penalization, and the corresponding estimate of the mean matrix can be called the LANOVA estimate. The empirical Bayes estimators are shown to be consistent. Additionally, LANOVA penalization is extended to accommodate sparsity of row and column effects and to estimate an unknown mean tensor. The behavior of the LANOVA estimate is examined under misspecification of the distribution of the elementwise effects, and LANOVA penalization is applied to several datasets, including a matrix of microarray data, a three-way tensor of fMRI data and a three-way tensor of wheat infection data.
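
The penalized regression problem described in the summary can be sketched in a few lines for the matrix case. This is a minimal illustration and not the authors' implementation. It assumes the objective 0.5 * ||Y - A - C||_F^2 + thresh * sum|c_ij|, where A is additive in rows and columns and C holds the elementwise effects, and it alternates between an additive fit and elementwise soft-thresholding. The names `soft_threshold`, `additive_fit`, `lasso_anova` and the fixed tuning value `thresh` are hypothetical; in particular, `thresh` stands in for the moment-based empirical Bayes penalty estimates, which this sketch does not compute.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def additive_fit(m):
    """Least-squares additive (row + column) fit of a complete matrix."""
    row = m.mean(axis=1, keepdims=True)
    col = m.mean(axis=0, keepdims=True)
    return row + col - m.mean()


def lasso_anova(y, thresh, n_iter=200, tol=1e-10):
    """Block coordinate descent for the assumed objective
    0.5 * ||y - a - c||_F^2 + thresh * sum(|c|), with a additive.

    `thresh` is a fixed, user-supplied value (an assumption of this sketch),
    not an empirical Bayes estimate.
    """
    y = np.asarray(y, dtype=float)
    c = np.zeros_like(y)
    for _ in range(n_iter):
        a = additive_fit(y - c)                 # update additive part given c
        c_new = soft_threshold(y - a, thresh)   # update elementwise effects given a
        converged = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if converged:
            break
    a = additive_fit(y - c)
    return a, c


# Illustrative data: an additive 6 x 6 matrix plus one large elementwise effect.
rows = np.arange(6.0).reshape(-1, 1)
cols = np.arange(6.0).reshape(1, -1)
y = rows + 2.0 * cols
y[0, 0] += 10.0
a_hat, c_hat = lasso_anova(y, thresh=1.0)
mean_hat = a_hat + c_hat  # estimate of the mean matrix under this sketch
```

Because the penalty acts only on the elementwise effects, the additive part is left unshrunk, and entries of `c_hat` whose residuals fall below `thresh` are set exactly to zero.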

MSC:

62-08 Computational methods for problems pertaining to statistics
62J10 Analysis of variance and covariance (ANOVA)

Software:

denoiseR
