Estimating a difference of Kullback-Leibler risks using a normalized difference of AIC. (English) Zbl 1149.62087

Summary: AIC is commonly used for model selection but the precise value of AIC has no direct interpretation. We are interested in quantifying a difference of risks between two models. This may be useful both from an explanatory point of view and for prediction, where a simpler model may be preferred if it does nearly as well as a more complex model. The difference of risks can be interpreted by linking the risks with relative errors in the computation of probabilities and looking at the values obtained for simple models.
A scale of values going from negligible to large is proposed. We propose a normalization of a difference of Akaike criteria for estimating the difference of expected Kullback-Leibler risks between maximum likelihood estimators of the distribution in two different models. The variability of this statistic can be estimated. Thus, an interval can be constructed which contains the true difference of expected Kullback-Leibler risks with a pre-specified probability. A simulation study shows that the method works, and the method is illustrated on two examples. The first is a study of the relationship between body-mass index and depression in elderly people. The second is the choice between models of HIV dynamics, where one model makes the distinction between activated CD4+ T lymphocytes and the other does not.
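
The summary does not state the formulas, so the following is only a minimal sketch of what such a statistic could look like. It assumes that the normalization divides the AIC difference by 2n (n being the number of independent observations) and that the variability is estimated, in the style of Vuong's likelihood-ratio approach, from the empirical variance of the per-observation log-likelihood differences. The function name and its arguments are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist, variance


def normalized_aic_difference(loglik1, loglik2, k1, k2, level=0.95):
    """Sketch of a normalized AIC difference with a Wald-type interval.

    loglik1, loglik2: per-observation log-likelihood contributions of the
        two fitted models, evaluated at their maximum likelihood estimates
        (assumed to come from the same n independent observations).
    k1, k2: number of estimated parameters in each model.
    level: nominal coverage of the interval.

    Assumption (not stated in the summary): the statistic is
    (AIC1 - AIC2) / (2n), and its standard error is the standard
    deviation of the per-observation log-likelihood differences
    divided by sqrt(n).
    """
    n = len(loglik1)
    if n != len(loglik2) or n < 2:
        raise ValueError("need two sequences of equal length n >= 2")

    aic1 = -2.0 * sum(loglik1) + 2.0 * k1
    aic2 = -2.0 * sum(loglik2) + 2.0 * k2

    # Positive values suggest model 1 has the larger estimated risk.
    d = (aic1 - aic2) / (2.0 * n)

    # Per-observation differences whose mean drives the statistic.
    diffs = [l2 - l1 for l1, l2 in zip(loglik1, loglik2)]
    se = math.sqrt(variance(diffs) / n)

    # Two-sided normal quantile for the requested coverage.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return d, (d - z * se, d + z * se)
```

Under these assumptions, an interval lying entirely within the "negligible" part of the proposed scale would support preferring the simpler model, while an interval that excludes zero by a wide margin would indicate a substantial difference of risks.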

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
65C60 Computational problems in statistics (MSC2010)
