
Hierarchical spatial models for predicting tree species assemblages across large domains. (English) Zbl 1196.62121

Summary: Spatially explicit data layers of tree species assemblages, referred to as forest types or forest type groups, are a key component in large-scale assessments of forest sustainability, biodiversity, timber biomass, carbon sinks and forest health monitoring. This paper explores the utility of coupling georeferenced national forest inventory (NFI) data with readily available and spatially complete environmental predictor variables through spatially varying multinomial logistic regression models to predict forest type groups across large forested landscapes. These models exploit underlying spatial associations within the NFI plot array and the spatially varying impact of predictor variables to improve the accuracy of forest type group predictions. The richness of these models incurs onerous computational burdens, and we discuss dimension-reducing spatial processes that retain the richness in modeling. We illustrate using NFI data from Michigan, USA, where we provide a comprehensive analysis of this large study area and demonstrate improved prediction with associated measures of uncertainty.
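As a rough illustration of the two ideas named in the summary, the following NumPy sketch combines a baseline-category multinomial logit with spatially varying coefficients and a knot-based (predictive-process style) reduction of the spatial effects. It is a minimal sketch under stated assumptions, not the authors' implementation: the exponential covariance, the independent process per coefficient, the simulated data, and all function and variable names (`exp_cov`, `predictive_process`, `class_probabilities`, `knots`, `phi`) are hypothetical choices made for the example.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two sets of 2-D locations (assumed form)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def predictive_process(locs, knots, w_knots, sigma2=1.0, phi=3.0):
    """Interpolate knot-level effects to arbitrary locations: c(s)^T C*^{-1} w*."""
    c = exp_cov(locs, knots, sigma2, phi)          # n x m cross-covariance
    c_star = exp_cov(knots, knots, sigma2, phi)    # m x m knot covariance
    return c @ np.linalg.solve(c_star, w_knots)    # only an m x m system is solved


def class_probabilities(x, beta, w):
    """Baseline-category multinomial logit with spatially varying coefficients.

    x: (n, p) predictors, beta: (p, K-1) global coefficients,
    w: (n, p, K-1) location-specific adjustments. Returns (n, K) probabilities.
    """
    eta = np.einsum("np,npk->nk", x, beta[None, :, :] + w)
    eta = np.column_stack([np.zeros(len(x)), eta])  # baseline class has eta = 0
    eta -= eta.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(eta)
    return p / p.sum(axis=1, keepdims=True)


# Simulated stand-in for inventory plots (not real NFI data).
rng = np.random.default_rng(0)
n, m, p, K = 200, 16, 3, 4                          # plots, knots, predictors, classes
locs = rng.uniform(size=(n, 2))
grid = np.linspace(0.125, 0.875, 4)
knots = np.array([(a, b) for a in grid for b in grid])

# Draw knot-level effects, one independent process per coefficient (an assumption).
chol = np.linalg.cholesky(exp_cov(knots, knots) + 1e-10 * np.eye(m))
w_knots = chol @ rng.standard_normal((m, p * (K - 1)))

w = predictive_process(locs, knots, w_knots).reshape(n, p, K - 1)
x = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = rng.standard_normal((p, K - 1))

probs = class_probabilities(x, beta, w)             # per-plot class probabilities
predicted_group = probs.argmax(axis=1)              # most probable class per plot
```

The dimension reduction shows up in `predictive_process`: the only linear system solved is of size m (the number of knots) rather than n (the number of plots), which is what makes this kind of model tractable when n is large. A full analysis would place priors on the coefficients and covariance parameters and sample them, which this sketch omits.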

MSC:

62M30 Inference from spatial processes
62P12 Applications of statistics to environmental and related topics
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)
65C40 Numerical analysis or methods applied to Markov chains
