Symbolic formulae for linear mixed models. (English) Zbl 1445.62177

Nguyen, Hien (ed.), Statistics and data science. Proceedings of the research school on statistics and data science, RSSDS 2019, Melbourne, Australia, July 24–26, 2019. Singapore: Springer. Commun. Comput. Inf. Sci. 1150, 3-21 (2019).
Summary: A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular class of statistical models, linear mixed models (LMMs), which are widely used in many disciplines, e.g. agriculture, ecology, econometrics and psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models, incorporate a combination of fixed and random effects, with LMMs being a special case. The inclusion of random effects in particular gives LMMs considerable flexibility in accounting for many types of complex correlated structures often found in data. This flexibility, however, has given rise to a number of ways by which an end-user can specify the precise form of the LMM that they wish to fit in statistical software. In this paper, we review the software design for specifying the LMM (and its special case, the linear model), focusing in particular on the use of high-level symbolic model formulae and on two popular but contrasting R packages, lme4 and asreml.
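As a minimal illustration of what a symbolic model formula encodes, the sketch below takes the lme4-style random-intercept formula `y ~ x + (1 | g)` and translates it by hand into the LMM y = Xβ + Zu + e, assuming independent u ~ N(0, σ_u² I) and e ~ N(0, σ_e² I). It then forms the implied marginal covariance V = σ_u² ZZᵀ + σ_e² I to show how the random effect induces correlation within groups. The helpers `parse_formula`, `design_matrices`, `marginal_covariance` and `correlation` are hypothetical, handle only this one formula pattern, and are not part of lme4, asreml or any other package; the data are made-up toy values.

```python
import re

# Hypothetical toy parser: matches only an lme4-style random intercept term such as "(1 | g)".
RANDOM_INTERCEPT = re.compile(r"\(\s*1\s*\|\s*(\w+)\s*\)")


def parse_formula(formula):
    """Split a toy formula like 'y ~ x + (1 | g)' into response, fixed terms and grouping factor."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    match = RANDOM_INTERCEPT.search(rhs)
    group = match.group(1) if match else None
    fixed_part = RANDOM_INTERCEPT.sub("", rhs)
    fixed = [term.strip() for term in fixed_part.split("+") if term.strip()]
    return lhs, fixed, group


def design_matrices(formula, data):
    """Build y, X (intercept + fixed terms) and Z (one indicator column per group level)."""
    response, fixed, group = parse_formula(formula)
    n = len(data[response])
    y = list(data[response])
    X = [[1.0] + [float(data[t][i]) for t in fixed] for i in range(n)]
    levels = sorted(set(data[group])) if group else []
    Z = [[1.0 if data[group][i] == lev else 0.0 for lev in levels] for i in range(n)]
    return y, X, Z


def marginal_covariance(Z, sigma_u2, sigma_e2):
    """V = sigma_u2 * Z Z^T + sigma_e2 * I, the covariance of y implied by the random intercept."""
    n = len(Z)
    return [
        [sigma_u2 * sum(a * b for a, b in zip(Z[i], Z[j])) + (sigma_e2 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def correlation(V, i, j):
    """Correlation between observations i and j under covariance V."""
    return V[i][j] / (V[i][i] * V[j][j]) ** 0.5


# Made-up toy data: four observations in two groups.
data = {"y": [3.1, 3.9, 6.2, 7.1], "x": [1, 2, 3, 4], "g": ["a", "a", "b", "b"]}
y, X, Z = design_matrices("y ~ x + (1 | g)", data)
V = marginal_covariance(Z, sigma_u2=2.0, sigma_e2=1.0)
print(X)  # [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
print(Z)  # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(correlation(V, 0, 1))  # same group: 2 / (2 + 1)
print(correlation(V, 0, 2))  # different groups: 0.0
```

Observations sharing a level of `g` have correlation σ_u² / (σ_u² + σ_e²), here 2/3, while observations in different groups are uncorrelated; this is the simplest case of the correlated structure that random effects let an LMM capture. Real formula interfaces support many more term types than this single pattern.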
For the entire collection see [Zbl 1433.68029].


62J05 Linear regression; mixed models
62A01 Foundations and philosophical topics in statistics

