Focused model selection for linear mixed models with an application to whale ecology. (English) Zbl 1446.62296

Summary: A central point of disagreement, in certain long-standing discussions about a particular whaling dataset in the Scientific Committee of the International Whaling Commission, has directly involved model selection issues for linear mixed effect models. The biological question under discussion is associated with a clearly defined parameter of primary interest, a focus parameter, which makes model selection with the Focused Information Criterion (FIC) more appropriate than other selection methods. Since the existing FIC methodology has not covered the case of linear mixed effects models, this article sets up the required framework and develops the necessary formulae for the relevant FIC. Our new criterion requires the asymptotic distribution of estimators derived for a given candidate linear mixed model but with behaviour examined under a wider linear mixed model. These results, needed here to build our FIC, also have independent interest.
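
A schematic reading of the focused approach, offered only as orientation and not as the article's own formulae for linear mixed models: write \(\mu\) for the focus parameter and \(\widehat\mu_M\) for its estimator under candidate model \(M\) (both symbols are notation assumed for this note, not taken from the summary). An FIC score estimates the mean squared error of \(\widehat\mu_M\), and the candidate with the smallest score is preferred:

\[
\mathrm{mse}(\widehat\mu_M) = \{\mathrm{E}(\widehat\mu_M) - \mu\}^2 + \mathrm{Var}(\widehat\mu_M),
\qquad
\mathrm{FIC}(M) = \widehat{\mathrm{bias}}_M^{\,2} + \widehat{\mathrm{Var}}_M .
\]

Here the expectation and variance are taken under the wider model. In this kind of construction the bias and variance estimates come from the limiting distribution of the candidate-model estimator when the wider model holds, which is presumably why the asymptotic results mentioned in the summary are needed.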

MSC:

62P12 Applications of statistics to environmental and related topics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62B10 Statistical aspects of information-theoretic topics
62J05 Linear regression; mixed models

Software:

MEMSS; lme4; S-PLUS
