Arbitrage of forecasting experts.

*(English)* Zbl 07073612

Summary: Forecasting is an important task across several domains. Its generalised interest is related to the uncertainty and complex evolving structure of time series. Forecasting methods are typically designed to cope with temporal dependencies among observations, but it is widely accepted that none is universally applicable. Therefore, a common solution to these tasks is to combine the opinion of a diverse set of forecasts. In this paper we present an approach based on arbitrating, in which several forecasting models are dynamically combined to obtain predictions. Arbitrating is a metalearning approach that combines the output of experts according to predictions of the loss that they will incur. We present an approach for retrieving out-of-bag predictions that significantly improves its data efficiency. Finally, since diversity is a fundamental component in ensemble methods, we propose a method for explicitly handling the inter-dependence between experts when aggregating their predictions. Results from extensive empirical experiments provide evidence of the method’s competitiveness relative to state of the art approaches. The proposed method is publicly available in a software package.
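
The summary does not say how predicted losses are turned into combination weights. As a minimal sketch of the arbitrating idea, assuming a softmax over the negative predicted losses (an illustrative choice, not necessarily the weighting used in the paper), the combination step might look as follows. The function name `arbitrate` and the `temperature` parameter are hypothetical and do not come from the paper or its software package.

```python
import math


def arbitrate(forecasts, predicted_losses, temperature=1.0):
    """Combine expert forecasts, giving more weight to lower predicted loss.

    Hypothetical helper: the softmax weighting is an assumption made for
    illustration, not the method confirmed by the summary.
    """
    if not forecasts or len(forecasts) != len(predicted_losses):
        raise ValueError("need one predicted loss per expert forecast")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Softmax over negative losses; shifting by the minimum loss keeps
    # the exponentials in (0, 1] and avoids overflow.
    min_loss = min(predicted_losses)
    scores = [math.exp(-(loss - min_loss) / temperature)
              for loss in predicted_losses]
    total = sum(scores)
    weights = [score / total for score in scores]

    # Weighted average of the experts' forecasts.
    combined = sum(w * f for w, f in zip(weights, forecasts))
    return combined, weights


# Two experts; the meta-level predicts a smaller loss for the first one,
# so the combined forecast is pulled towards its prediction.
combined, weights = arbitrate([10.0, 20.0], [0.5, 2.0])
```

In this sketch the predicted losses would come from one meta-model per expert, trained on that expert's past errors; a smaller `temperature` concentrates the weight on the expert expected to perform best.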

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

##### Keywords:

dynamic ensembles; metalearning; time series; combining expert advice; forecasting; dependency and diversity

##### Software:

Cubist; earth; forecast; Forecast; gbm; glmnet; Kernlab; MASS (R); opera; pls; R; ranger; TDSL; TSDL; UCI-ml

*V. Cerqueira* et al., Mach. Learn. 108, No. 6, 913-944 (2019; Zbl 07073612)

