Efficient estimation of variance components in nonparametric mixed-effects models with large samples. (English) Zbl 1505.62184

Summary: Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time.

MSC:

62-08 Computational methods for problems pertaining to statistics
62G08 Nonparametric regression and quantile regression
62J10 Analysis of variance and covariance (ANOVA)

References:

[1] Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
[2] Bates, D., Maechler, M., Bolker, B., Walker, S.: lme4: Linear mixed-effects models using Eigen and S4. http://CRAN.R-project.org/package=lme4, R package version 1.1-8 (2015)
[3] Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377-403 (1979) · Zbl 0377.65007 · doi:10.1007/BF01404567
[4] de Leeuw, J., Meijer, E.: Introduction to multilevel analysis. In: de Leeuw, J., Meijer, E. (eds.) Handbook of Multilevel Analysis, pp. 1-75. Springer, New York (2008) · doi:10.1007/978-0-387-73186-5_1
[5] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1-38 (1977) · Zbl 0364.62022
[6] Fu, W.J.: Penalized estimating equations. Biometrics 59, 126-132 (2003) · Zbl 1210.62016 · doi:10.1111/1541-0420.00015
[7] Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440-1450 (1995) · Zbl 0875.62314 · doi:10.2307/2533274
[8] Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, West Sussex (2011) · Zbl 1274.62006
[9] Gu, C.: Smoothing Spline ANOVA Models, 2nd edn. Springer, New York (2013) · Zbl 1269.62040 · doi:10.1007/978-1-4614-5369-7
[10] Gu, C.: gss: General smoothing splines. http://CRAN.R-project.org/package=gss, R package version 2.1-5 (2014)
[11] Gu, C., Ma, P.: Optimal smoothing in nonparametric mixed-effect models. Annal. Stat. 33, 1357-1379 (2005) · Zbl 1072.62027 · doi:10.1214/009053605000000110
[12] Hartley, H.O., Rao, J.N.K.: Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93-108 (1967) · Zbl 0178.22001 · doi:10.1093/biomet/54.1-2.93
[13] Harville, D.A.: Maximum-likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72, 320-340 (1977) · Zbl 0373.62040 · doi:10.1080/01621459.1977.10480998
[14] Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC, New York (1990) · Zbl 0747.62061
[15] Helwig, N.E.: bigsplines: smoothing splines for large samples. http://CRAN.R-project.org/package=bigsplines, R package version 1.0-7 (2015)
[16] Helwig, N.E., Ma, P.: Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. J. Comput. Graph. Stat. 24, 715-732 (2015) · doi:10.1080/10618600.2014.926819
[17] Helwig, N.E., Ma, P.: (in press) Smoothing spline ANOVA for super-large samples: scalable computation via rounding parameters. Statistics and Its Interface · Zbl 1405.62040
[18] Henderson, C.R.: Estimation of genetic parameters (abstract). Annal. Math. Stat. 21, 309-310 (1950)
[19] Henderson, C.R.: Estimation of variance and covariance components. Biometrics 9, 226-252 (1953) · doi:10.2307/3001853
[20] Henderson, C.R., Kempthorne, O., Searle, S.R., von Krosigk, C.M.: The estimation of environmental and genetic trends from records subject to culling. Biometrics 15, 192-218 (1959) · Zbl 0128.40301 · doi:10.2307/2527669
[21] Henderson, H.V., Searle, S.R.: On deriving the inverse of a sum of matrices. SIAM Rev. 23, 53-60 (1981) · Zbl 0451.15005 · doi:10.1137/1023004
[22] Kenward, M., Molenberghs, G.: Likelihood based frequentist inference when data are missing at random. Stat. Sci. 13, 236-247 (1998) · Zbl 1099.62503
[23] Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. Ser. B 66, 337-356 (2004) · Zbl 1062.62067 · doi:10.1046/j.1369-7412.2003.05316.x
[24] Laird, N.M.: Computation of variance components using the EM algorithm. J. Stat. Comput. Simul. 14, 295-303 (1982) · Zbl 0478.65084 · doi:10.1080/00949658208810550
[25] Li, K.C.: Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: discrete index set. Annal. Stat. 15, 958-975 (1987) · Zbl 0653.62037 · doi:10.1214/aos/1176350486
[26] Ma, P., Huang, J., Zhang, N.: Efficient computation of smoothing splines via adaptive basis sampling. Biometrika 102, 631-645 (2015) · Zbl 1452.62286 · doi:10.1093/biomet/asv009
[27] Moore, E.H.: On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 26, 394-395 (1920)
[28] Paterson, L.: Socio-economic status and educational attainment: a multidimensional and multilevel study. Eval. Res. Educ. 5, 97-121 (1991) · doi:10.1080/09500799109533303
[29] Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545-554 (1971) · Zbl 0228.62046 · doi:10.1093/biomet/58.3.545
[30] Penrose, R.: A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 51, 406-413 (1955) · Zbl 0065.24603 · doi:10.1017/S0305004100030401
[31] Porjesz, B., Begleiter, H.: Event-related potentials for individuals at risk for alcoholism. Alcohol 7, 465-469 (1990a) · doi:10.1016/0741-8329(90)90033-9
[32] Porjesz, B., Begleiter, H.: Neuroelectric processes in individuals at risk for alcoholism. Alcohol Alcohol. 25, 251-256 (1990b)
[33] Porjesz, B., Begleiter, H., Garozzo, R.: Visual evoked potential correlates of information deficits in chronic alcoholics. In: Begleiter, H. (ed.) Biological Effects of Alcohol, pp. 603-623. New York (1980) · doi:10.1007/978-1-4684-3632-7_46
[34] Porjesz, B., Begleiter, H., Bihari, B., Kissin, B.: The N2 component of the event-related brain potential in abstinent alcoholics. Electroencephalogr. Clin. Neurophysiol. 66, 121-131 (1987) · doi:10.1016/0013-4694(87)90181-7
[35] Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003) · Zbl 1038.62042 · doi:10.1017/CBO9780511755453
[36] Searle, S.R.: Applying the EM algorithm to calculating ML and REML estimates of variance components. Technical report BU-1213-M, Biometrics Unit, Cornell University (1993)
[37] Verbeke, G., Molenberghs, G.: Linear Mixed Models for Longitudinal Data. Springer, New York (2000) · Zbl 0956.62055
[38] Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. Annal. Stat. 13, 1378-1402 (1985) · Zbl 0596.65004 · doi:10.1214/aos/1176349743
[39] Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990) · Zbl 0813.62001 · doi:10.1137/1.9781611970128
[40] Wang, Y.: Mixed effects smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B 60, 159-174 (1998a) · Zbl 0909.62034 · doi:10.1111/1467-9868.00115
[41] Wang, Y.: Smoothing spline models with correlated random errors. J. Am. Stat. Assoc. 93, 341-348 (1998b) · Zbl 1068.62512 · doi:10.1080/01621459.1998.10474115
[42] Wang, Y., Ke, C.: assist: A Suite of S functions for Implementing Spline smoothing Techniques. http://CRAN.R-project.org/package=assist, R package version 3.1.2 (2013)
[43] Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673-686 (2004) · Zbl 1117.62445 · doi:10.1198/016214504000000980
[44] Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006) · Zbl 1087.62082
[45] Wood, S.N.: mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation and GAMMs by REML/PQL. http://CRAN.R-project.org/package=mgcv, R package version 1.8-7 (2015)
[46] Zhang, D., Lin, X., Raz, J., Sowers, M.: Semiparametric stochastic mixed models for longitudinal data. J. Am. Stat. Assoc. 93, 710-719 (1998) · Zbl 0918.62039 · doi:10.1080/01621459.1998.10473723
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.