An introduction to model selection. (English) Zbl 0949.62092

Summary: This paper is an introduction to model selection intended for nonspecialists who have knowledge of the statistical concepts covered in a typical first (occasionally second) statistics course. The intention is to explain the ideas that generate frequentist methodology for model selection, for example the Akaike information criterion, bootstrap criteria, and cross-validation criteria. Bayesian methods, including the Bayesian information criterion, are also mentioned in the context of the framework outlined in the paper. The ideas are illustrated using an example in which observations are available for the entire population of interest. This enables us to examine and to measure effects that are usually invisible, because in practical applications only a sample from the population is observed. The problem of selection bias, a hazard of which one needs to be aware in the context of model selection, is also discussed.
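As a rough illustration of the setup the summary describes, the following sketch fits polynomial models of increasing degree to a sample drawn from a fully observed population. It compares the Akaike and Bayesian information criteria with the prediction error measured on the whole population. Everything here is an assumption made for illustration and is not taken from the paper: the simulated quadratic population, the Gaussian least-squares likelihood, and the function names `gaussian_aic_bic` and `compare_models`.

```python
import numpy as np


def gaussian_aic_bic(y, y_hat, n_coef):
    """Return (AIC, BIC) for a least-squares fit, assuming Gaussian errors.

    The parameter count is n_coef regression coefficients plus one
    error variance.
    """
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    sigma2 = rss / n  # maximum likelihood estimate of the error variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = n_coef + 1
    aic = -2.0 * log_lik + 2.0 * k
    bic = -2.0 * log_lik + k * np.log(n)
    return aic, bic


def compare_models(seed=0, pop_size=5000, sample_size=40, max_degree=6):
    """Fit polynomials to a sample and score them on the full population."""
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: quadratic trend plus noise.
    x_pop = rng.uniform(-1.0, 1.0, pop_size)
    y_pop = 1.0 + 2.0 * x_pop - 1.5 * x_pop ** 2 + rng.normal(0.0, 0.5, pop_size)

    # In practice only a sample like this one would be observed.
    idx = rng.choice(pop_size, size=sample_size, replace=False)
    x, y = x_pop[idx], y_pop[idx]

    rows = []
    for degree in range(max_degree + 1):
        coefs = np.polyfit(x, y, degree)
        aic, bic = gaussian_aic_bic(y, np.polyval(coefs, x), degree + 1)
        # Mean squared prediction error over the entire population,
        # a quantity that is normally invisible to the analyst.
        pop_mse = float(np.mean((y_pop - np.polyval(coefs, x_pop)) ** 2))
        rows.append((degree, aic, bic, pop_mse))
    return rows


if __name__ == "__main__":
    for degree, aic, bic, pop_mse in compare_models():
        print(f"degree={degree}  AIC={aic:.2f}  BIC={bic:.2f}  population MSE={pop_mse:.4f}")
```

Smaller values are better for each of the three columns. Comparing the degree that minimises AIC or BIC with the degree that minimises the population error gives a sense of how well a sample-based criterion tracks the quantity it is meant to estimate.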


62P15 Applications of statistics to psychology
62B10 Statistical aspects of information-theoretic topics
91C99 Social and behavioral sciences: general topics


[1] Akaike, H., Information theory and an extension of the maximum likelihood principle, (Petrov, B. N.; Csaki, F., Second international symposium on information theory (1973), Akadémiai Kiadó: Budapest), 267-281 · Zbl 0283.62006
[2] Albert, J.; Chib, S., Bayesian tests and model diagnostics in conditionally independent hierarchical models, Journal of the American Statistical Association, 92, 916-925 (1997) · Zbl 0889.62016
[3] Browne, M. W., Cross-validation methods, Journal of Mathematical Psychology, 44, 108-132 (2000) · Zbl 0946.62045
[4] Buckland, S. T.; Burnham, K. P.; Augustin, N. H., Model selection: an integral part of inference, Biometrics, 53, 603-618 (1997) · Zbl 0885.62118
[5] Chatfield, C., Model uncertainty, data mining and statistical inference, Journal of the Royal Statistical Society, Series A, 158, 419-466 (1995)
[6] Chung, H.-Y.; Lee, K.-W.; Koo, J.-Y., A note on bootstrap model selection criterion, Statistics & Probability Letters, 26, 35-41 (1996) · Zbl 0843.62050
[7] DiCiccio, T. J.; Kass, R. E.; Raftery, A.; Wasserman, L., Computing Bayes factors by combining simulation and asymptotic approximations, Journal of the American Statistical Association, 92, 903-915 (1997) · Zbl 1050.62520
[8] Efron, B., Bootstrap methods: Another look at the jackknife, The Annals of Statistics, 7, 1-26 (1979) · Zbl 0406.62024
[9] Heller, G. Z., Who visits the GP? Demographic patterns in a Sydney suburb, Technical report (1997)
[10] Hurvich, C. M.; Tsai, C., Regression and time series model selection in small samples, Biometrika, 76, 297-307 (1989) · Zbl 0669.62085
[11] Kass, R. E.; Raftery, A. E., Bayes factors, Journal of the American Statistical Association, 90, 773-795 (1995) · Zbl 0846.62028
[12] Linhart, H.; Zucchini, W., Model selection (1986), Wiley: New York · Zbl 0665.62003
[13] Miller, A. J., Subset selection in regression (1990), Chapman and Hall: London · Zbl 0702.62057
[14] Myung, I. J., The importance of complexity in model selection, Journal of Mathematical Psychology, 44, 190-204 (2000) · Zbl 0946.62094
[15] Parr, W. C., Minimum distance estimation: a bibliography, Communications in Statistics - Theory and Methods, A10, 1205-1224 (1981) · Zbl 0458.62035
[16] Raftery, A. E., Bayesian model selection in social research, (Marsden, P. V., Sociological methodology 1995 (1995), Blackwells: Oxford), 111-196
[17] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 461-464 (1978) · Zbl 0379.62005
[18] Stone, M., Cross-validation and multinomial prediction, Biometrika, 61, 509-515 (1974) · Zbl 0292.62025
[19] Wasserman, L., Bayesian model selection and model averaging, Journal of Mathematical Psychology, 44, 92-107 (2000) · Zbl 0946.62032
[20] Ye, J., On measuring and correcting the effects of data mining and model selection, Journal of the American Statistical Association, 93, 120-131 (1998) · Zbl 0920.62056