
A generalized estimating equation approach to multivariate adaptive regression splines. (English) Zbl 07498982

Summary: Multivariate adaptive regression splines (MARS) is a popular nonparametric regression tool often used for prediction and for uncovering important data patterns between the response and predictor variables. The standard MARS algorithm assumes responses are normally distributed and independent, but in this article we relax both of these assumptions by extending MARS to generalized estimating equations. We refer to this MARS-for-GEEs algorithm as “MARGE.” Our algorithm makes use of fast forward selection techniques, such that in the univariate case, MARGE has similar computation speed to a standard MARS implementation. Through simulation we show that the proposed algorithm has better predictive performance than the original MARS algorithm when using correlated and/or nonnormal response data. MARGE is also competitive with alternatives in the literature, especially for problems with multiple interacting predictors. We apply MARGE to various ecological examples with different data types. Supplementary material for this article is available online.

MSC:

62-XX Statistics
