Model selection rates of information based criteria.

*(English)* Zbl 1283.62083

Summary: Model selection criteria proposed over the years have become standard procedures in applied research. This article examines the true model selection rates of any model selection criterion, where the true model is the data-generating model. The rate at which a criterion selects the true model matters because the model it selects affects both interpretation and prediction.
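
The quantities in the summary can be made concrete with standard definitions. As a hedged illustration (the symbols $c(n)$, $k_M$, $M_0$, $\mathcal{M}$, $\hat M_n^{(r)}$ and $\hat r_R(n)$ are introduced here for exposition and need not match the paper's notation), many information criteria penalize the maximized log-likelihood by a multiple of the number of estimated parameters $k_M$. The true model selection rate at sample size $n$ can then be estimated as the fraction of $R$ simulated data sets on which the criterion picks the data-generating model $M_0$ from the candidate set $\mathcal{M}$:

$$
\mathrm{IC}(M) = -2\log \hat L(M) + c(n)\,k_M, \qquad c(n) = 2 \ (\text{AIC}), \qquad c(n) = \log n \ (\text{BIC}),
$$

$$
\hat M_n^{(r)} = \arg\min_{M \in \mathcal{M}} \mathrm{IC}^{(r)}(M), \qquad \hat r_R(n) = \frac{1}{R}\sum_{r=1}^{R} \mathbf{1}\bigl\{\hat M_n^{(r)} = M_0\bigr\}.
$$

The paper's general functional form for the mean of this process is not reproduced here.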

This article provides a general functional form for the mean function of the true model selection rate process, applicable to any model selection criterion. No previous article has provided a general form for the mean functions of true model selection rate processes. As an illustration, the article gives the mean function of the true model selection rates for two commonly used criteria, Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Simulations give further insight into the consistency and efficiency properties of AIC and BIC. Furthermore, the proposed method for tracking the mean function of model selection procedures, which is based on selection accuracy, can also be used to determine a sufficient sample size in linear models for reliable inference in model selection.
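
A small Monte Carlo experiment can illustrate how a true model selection rate changes with sample size. The sketch below is not the authors' procedure: the simulation design (Gaussian linear model with unit error variance, standard normal predictors, an intercept always included, every subset of the $p$ predictors as a candidate) and the function name `true_model_selection_rates` are hypothetical choices for illustration. AIC and BIC are computed from the residual sum of squares, dropping additive constants that are identical across candidate models.

```python
import itertools

import numpy as np


def true_model_selection_rates(n, beta, n_reps=200, seed=0):
    """Monte Carlo estimate of how often AIC and BIC select the data-generating subset.

    Hypothetical design: y = X @ beta + N(0, 1) noise, an intercept is always
    included, and every subset of the p columns of X is a candidate model.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    # The true model contains exactly the predictors with non-zero coefficients.
    true_subset = tuple(j for j in range(p) if beta[j] != 0)
    subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
    hits = {"AIC": 0, "BIC": 0}

    for _ in range(n_reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.standard_normal(n)
        best = {"AIC": (np.inf, None), "BIC": (np.inf, None)}
        for s in subsets:
            cols = np.array(s, dtype=int)
            design = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = np.sum((y - design @ coef) ** 2)
            k = design.shape[1] + 1  # regression coefficients plus error variance
            neg2loglik = n * np.log(rss / n)  # Gaussian -2 log-likelihood up to a constant
            scores = {"AIC": neg2loglik + 2 * k, "BIC": neg2loglik + k * np.log(n)}
            for name, score in scores.items():
                if score < best[name][0]:
                    best[name] = (score, s)
        # Count a hit when the minimizing subset is exactly the true one.
        for name in hits:
            hits[name] += int(best[name][1] == true_subset)

    return {name: count / n_reps for name, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical design: two active predictors and two irrelevant ones.
    for n in (25, 100, 400, 1000):
        rates = true_model_selection_rates(n, beta=[1.0, 0.5, 0.0, 0.0])
        print(f"n={n:5d}  AIC={rates['AIC']:.3f}  BIC={rates['BIC']:.3f}")
```

Evaluating the function over a grid of $n$ gives an empirical curve of selection rate against sample size. Under this design one would expect the BIC rate to approach 1 as $n$ grows, while the AIC rate levels off below 1, because AIC's fixed penalty leaves a non-vanishing chance of including irrelevant predictors.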

##### MSC:

62G08 | Nonparametric regression and quantile regression |

62J05 | Linear regression; mixed models |

62J12 | Generalized linear models (logistic models) |

65C60 | Computational problems in statistics (MSC2010) |

##### Keywords:

model selection rate; AIC; BIC; discrete process mean function; multiple linear regression; linear models; generalized linear models

*A. Chaurasia* and *O. Harel*, Electron. J. Stat. 7, 2762-2793 (2013; Zbl 1283.62083)
