Fence methods for mixed model selection.

*(English)* Zbl 1142.62047

Summary: Many model search strategies involve trading off model fit with model complexity in a penalized goodness of fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied, but these do not naturally extend to nonstandard situations such as mixed effects models, where a simple definition of the sample size is not meaningful. This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible.

In addition, we propose two variations of the fence. The first is a stepwise procedure to handle situations of many predictors; the second is an adaptive approach for choosing a tuning constant. We give sufficient conditions for consistency of fence and its variations, a desirable property for good model selection procedures. The methods are illustrated through simulation studies and real data analysis.
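The two-stage idea in the summary (first fence off incorrect models, then pick the optimal model among those inside) can be illustrated with a small sketch. The summary does not state the fence condition or the selection criterion, so the following are assumptions made for illustration only: a candidate M is kept when Q(M) - Q(M_best) <= c * sd(M), where Q is a lack-of-fit measure, M_best is the candidate with the smallest Q, sd(M) is an estimated standard deviation of the difference, and c is the tuning constant; among the models inside the fence, the one with the fewest parameters is chosen. The names `Candidate`, `fence_select`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    dim: int             # number of parameters (model complexity)
    lack_of_fit: float   # Q(M), e.g. a negative log-likelihood or residual sum of squares
    sd_vs_best: float    # assumed estimate of sd of Q(M) - Q(M_best); 0 for M_best itself


def fence_select(candidates, c=1.0):
    """Return (chosen model, models inside the fence).

    Assumed fence condition: Q(M) - Q(M_best) <= c * sd_vs_best(M).
    Assumed selection criterion: smallest dimension, ties broken by smaller Q.
    """
    best = min(candidates, key=lambda m: m.lack_of_fit)
    inside = [
        m for m in candidates
        if m.lack_of_fit - best.lack_of_fit <= c * m.sd_vs_best
    ]
    chosen = min(inside, key=lambda m: (m.dim, m.lack_of_fit))
    return chosen, inside


# Made-up numbers, not taken from the paper.
models = [
    Candidate("x1", 1, 120.0, 4.0),
    Candidate("x1+x2", 2, 52.0, 3.0),
    Candidate("x1+x2+x3", 3, 50.0, 0.0),
]

chosen, inside = fence_select(models, c=1.0)
print(chosen.name, [m.name for m in inside])

# A smaller tuning constant tightens the fence.
chosen_tight, inside_tight = fence_select(models, c=0.5)
print(chosen_tight.name, [m.name for m in inside_tight])
```

In this toy setting the choice of `c` decides whether the smaller model `x1+x2` survives the fence, which is why a data-driven choice of the tuning constant, as in the adaptive variation mentioned above, matters in practice.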


##### MSC:

62J12 | Generalized linear models (logistic models) |

65C60 | Computational problems in statistics (MSC2010) |

62F99 | Parametric inference |

62F07 | Statistical ranking and selection procedures |

##### Keywords:

adaptive fence; consistency; F-B fence; finite sample performance; GLMM; linear mixed model; model selection

*J. Jiang* et al., Ann. Stat. 36, No. 4, 1669-1692 (2008; Zbl 1142.62047)

