Distributional regression forests for probabilistic precipitation forecasting in complex terrain. (English) Zbl 1433.62325

Summary: To obtain a probabilistic model for a dependent variable based on some set of explanatory variables, a distributional approach is often adopted where the parameters of the distribution are linked to regressors. In many classical models this only captures the location of the distribution but over the last decade there has been increasing interest in distributional regression approaches modeling all parameters including location, scale and shape. Notably, so-called nonhomogeneous Gaussian regression (NGR) models both mean and variance of a Gaussian response and is particularly popular in weather forecasting. Moreover, generalized additive models for location, scale and shape (GAMLSS) provide a framework where each distribution parameter is modeled separately capturing smooth linear or nonlinear effects. However, when variable selection is required and/or there are nonsmooth dependencies or interactions (especially unknown or of high-order), it is challenging to establish a good GAMLSS. A natural alternative in these situations would be the application of regression trees or random forests but, so far, no general distributional framework is available for these. Therefore, a framework for distributional regression trees and forests is proposed that blends regression trees and random forests with classical distributions from the GAMLSS framework as well as their censored or truncated counterparts. To illustrate these novel approaches in practice, they are employed to obtain probabilistic precipitation forecasts at numerous sites in a mountainous region (Tyrol, Austria) based on a large number of numerical weather prediction quantities. It is shown that the novel distributional regression forests automatically select variables and interactions, performing on par or often even better than GAMLSS specified either through prior meteorological knowledge or a computationally more demanding boosting approach.
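
Illustration (not part of the original summary): the idea of letting a tree split on changes in any distribution parameter, not only the mean, can be sketched in a few lines. The sketch below is a simplified stand-in under stated assumptions. It uses an uncensored Gaussian, a single covariate, and an exhaustive likelihood-based split search, whereas the summary does not specify the paper's split-selection procedure and the application uses censored distributions. The function names `gaussian_fit`, `gaussian_loglik` and `best_split` and the toy data are hypothetical.

```python
import math


def gaussian_fit(y):
    """Maximum-likelihood estimates (mean, standard deviation) of a Gaussian."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    sigma = max(math.sqrt(var), 1e-8)  # guard against a zero-variance node
    return mu, sigma


def gaussian_loglik(y, mu, sigma):
    """Gaussian log-likelihood of the sample y at the parameters (mu, sigma)."""
    return sum(
        -0.5 * math.log(2 * math.pi)
        - math.log(sigma)
        - 0.5 * ((v - mu) / sigma) ** 2
        for v in y
    )


def best_split(x, y, min_leaf=5):
    """Search all split points on x and return the one that maximizes the
    summed log-likelihood of separately fitted Gaussians in the two children.

    Returns (threshold, log-likelihood, left (mu, sigma), right (mu, sigma)),
    or None if no admissible split exists.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best = None
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot split between tied covariate values
        left, right = ys[:k], ys[k:]
        left_params = gaussian_fit(left)
        right_params = gaussian_fit(right)
        loglik = gaussian_loglik(left, *left_params) + gaussian_loglik(
            right, *right_params
        )
        if best is None or loglik > best[1]:
            threshold = (xs[k - 1] + xs[k]) / 2
            best = (threshold, loglik, left_params, right_params)
    return best


# Toy data (assumption): the mean is zero everywhere, but the spread of y
# jumps from 0.1 to 2.0 once x reaches 0.5.
x = [i / 40 for i in range(40)]
y = [
    (0.1 if i < 20 else 2.0) * (1 if i % 2 == 0 else -1)
    for i in range(40)
]

threshold, loglik, left_params, right_params = best_split(x, y)
print("split at x =", threshold)
print("left  (mu, sigma):", left_params)
print("right (mu, sigma):", right_params)
```

In this toy data both child nodes have the same mean, so the split is found only because the scale parameter is part of the fitted model. A forest in this spirit would aggregate many such trees grown on resampled data; that step is omitted here.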


62P12 Applications of statistics to environmental and related topics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62F03 Parametric hypothesis testing

