Hierarchical array priors for ANOVA decompositions of cross-classified data. (English) Zbl 1454.62224

Summary: ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions. These prior distributions are based on a type of array-variate normal distribution, for which a covariance matrix for each factor is estimated. This prior is able to adapt to potential similarities among the levels of a factor, and incorporate any such information into the estimation of the effects in which the factor appears. In the presence of such similarities, this prior is able to borrow information from well-estimated main effects and lower-order interactions to assist in the estimation of higher-order terms for which data information is limited.
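
To illustrate the idea in the summary, the following is a minimal sketch of a separable array-variate normal draw, in which one covariance matrix per factor governs every effect that the factor appears in. It is not the authors' implementation. The function name `sample_array_normal`, the matrices `sigma_a` and `sigma_b`, and the level counts are assumptions chosen for illustration. The covariance matrices are fixed here, whereas the article estimates them.

```python
import numpy as np


def sample_array_normal(covs, rng):
    """Draw one mean-zero array whose covariance is separable across modes.

    covs[k] is the covariance matrix for the levels of factor k, so that
    Cov(Z[i, j], Z[k, l]) = covs[0][i, k] * covs[1][j, l] in the two-way case.
    """
    shape = tuple(c.shape[0] for c in covs)
    z = rng.standard_normal(shape)
    for mode, cov in enumerate(covs):
        chol = np.linalg.cholesky(cov)
        # Multiply the Cholesky factor along axis `mode`, then restore axis order.
        z = np.moveaxis(np.tensordot(chol, z, axes=(1, mode)), 0, mode)
    return z


# Hypothetical setup: factor A has 3 levels, factor B has 4 levels.
# Levels 0 and 1 of factor A are assumed to be similar (correlation 0.9).
sigma_a = np.array([[1.0, 0.9, 0.0],
                    [0.9, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
sigma_b = np.eye(4)

rng = np.random.default_rng(0)

# The main effect of A and the A-by-B interaction share the same sigma_a,
# so levels of A with similar main effects also tend to have similar
# interaction rows.
main_effect = sample_array_normal([sigma_a], rng)
interaction = sample_array_normal([sigma_a, sigma_b], rng)

print("main effect of A:", np.round(main_effect, 2))
print("A x B interaction:\n", np.round(interaction, 2))
```

Because `sigma_a` enters both draws, information about which levels of factor A behave alike is shared between the main effect and the interaction. This sharing is what lets a fitted model of this kind borrow strength from well-estimated lower-order terms.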


62J10 Analysis of variance and covariance (ANOVA)
62F15 Bayesian inference
62K15 Factorial statistical designs
62-08 Computational methods for problems pertaining to statistics



