Imbalanced regression and extreme value prediction. (English) Zbl 07289239
Summary: Research in imbalanced domain learning has focused almost exclusively on classification tasks, where the goal is the accurate prediction of cases labelled with a rare class. Approaches for addressing such problems in regression tasks remain scarce for two main reasons. First, standard regression tasks assume that all values of the target domain are equally important. Second, standard evaluation metrics assess the performance of models on the most common values of the data distribution. In this paper, we present an approach to tackle imbalanced regression tasks where the objective is to predict extreme (rare) values. We propose a way to formalise such tasks and to optimise and evaluate predictive models that overcomes the factors above and the issues in related work. We present an automatic, non-parametric method to obtain relevance functions, building on the concept of relevance as a mapping of target values into non-uniform domain preferences. We then propose SERA, a new evaluation metric capable of assessing the effectiveness of models and of optimising them towards the prediction of extreme values, while penalising severe model bias. An experimental study demonstrates how SERA provides valid and useful insights into the performance of models in imbalanced regression tasks.
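As a minimal illustration of the metric described above, the Python sketch below computes SERA under the assumption that it is the area under the curve of SER_t, the sum of squared errors restricted to the cases whose relevance phi(y) is at least t, for t ranging over [0, 1]. The relevance values are taken as given here, since the paper derives them automatically and non-parametrically from the target distribution; the function name and parameters are illustrative and not the authors' implementation.

    import numpy as np

    def sera(y_true, y_pred, relevance, steps=1000):
        # Sketch of SERA under the assumed definition: integrate SER_t over
        # t in [0, 1], where SER_t is the sum of squared errors restricted to
        # the cases whose relevance phi(y_i) is at least t.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        phi = np.asarray(relevance, dtype=float)        # phi(y_i) in [0, 1]
        sq_err = (y_pred - y_true) ** 2

        thresholds = np.linspace(0.0, 1.0, steps + 1)
        # SER_t shrinks from the total squared error (t = 0) towards the error
        # on the most relevant (rarest) cases only (t close to 1).
        ser = np.array([sq_err[phi >= t].sum() for t in thresholds])

        dt = 1.0 / steps                                # uniform grid spacing
        # Trapezoidal approximation of the area under the SER_t curve.
        return dt * (0.5 * ser[0] + ser[1:-1].sum() + 0.5 * ser[-1])

Under this definition, the cases with the highest relevance contribute to SER_t for every threshold t, so errors on extreme values weigh on the whole area, while errors on common, low-relevance cases still enter the area through the thresholds near t = 0; a model that systematically sacrifices the common cases is therefore also penalised, which is the behaviour the summary attributes to SERA.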
MSC:
68T05 Learning and adaptive systems in artificial intelligence
References:
[1] Aggarwal, CC, Outlier analysis (2013), Berlin: Springer, Berlin
[2] Akbilgic, O.; Bozdogan, H.; Balaban, ME, A novel hybrid RBF neural networks model as a forecaster, Statistics and Computing, 24, 3, 365-375 (2014) · Zbl 1325.62193
[3] Aldrin, M., Improved predictions penalizing both slope and curvature in additive models, Computational Statistics and Data Analysis, 50, 2, 267-284 (2006) · Zbl 1431.62134
[4] Aldrin, M.; Haff, IH, Generalised additive modelling of air pollution, traffic volume and meteorology, Atmospheric Environment, 39, 11, 2145-2155 (2005)
[5] Barker, PM; McDougall, TJ, Two interpolation methods using multiply-rotated piecewise cubic hermite interpolating polynomials, Journal of Atmospheric and Oceanic Technology, 37, 4, 605-619 (2020)
[6] Basu, K.; Mariani, M.; Serpa, L.; Sinha, R., Evaluation of interpolants in their ability to fit seismometric time series, Mathematics, 3, 3, 666-689 (2015) · Zbl 1331.86014
[7] Benavoli, A., Mangili, F., Corani, G., Zaffalon, M., & Ruggeri, F. (2014). A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In Proceedings of the 31st international conference on international conference on machine learning, ICML’14 (Vol. 32, pp. II-1026-II-1034), JMLR.org.
[8] Benavoli, A.; Corani, G.; Demšar, J.; Zaffalon, M., Time for a change: a tutorial for comparing multiple classifiers through bayesian analysis, The Journal of Machine Learning Research, 18, 1, 2653-2688 (2017)
[9] Bi, J., & Bennett, K. P. (2003). Regression error characteristic curves. In Proceedings of the 20th international conference on machine learning (pp. 43-50).
[10] Branco, P.; Torgo, L.; Ribeiro, RP, A survey of predictive modeling on imbalanced domains, ACM Computing Surveys, 49, 2, 31:1-31:50 (2016)
[11] Branco, P.; Torgo, L.; Ribeiro, RP, Pre-processing approaches for imbalanced distributions in regression, Neurocomputing, 343, 76-99 (2019)
[12] Brazdil, P.; Giraud-Carrier, C.; Soares, C.; Vilalta, R., Metalearning: Applications to data mining (2008), Berlin: Springer, Berlin · Zbl 1173.68625
[13] Brys, G.; Hubert, M.; Struyf, A., A robust measure of skewness, Journal of Computational and Graphical Statistics, 13, 4, 996-1017 (2004)
[14] Cain, M.; Janssen, C., Real estate price prediction under asymmetric loss, Annals of the Institute of Statistical Mathematics, 47, 3, 401-414 (1995)
[15] Chandola, V.; Banerjee, A.; Kumar, V., Anomaly detection: A survey, ACM Computing Surveys, 41, 3, 1541882 (2009)
[16] Christoffersen, PF; Diebold, FX, Further results on forecasting and model selection under asymmetric loss, Journal of Applied Econometrics, 11, 5, 561-571 (1996)
[17] Christoffersen, PF; Diebold, FX, Optimal prediction under asymmetric loss, Econometric Theory, 13, 6, 808-817 (1997)
[18] Cleveland, WS; Grosse, E.; Shyu, WM, Local regression models (1992), Belmont: Wadsworth & Brooks/Cole, Belmont
[19] Crone, S. F., Lessmann, S., & Stahlbock, R. (2005). Utility based data mining for time series analysis: Cost-sensitive learning for neural network predictors. In Proceedings of the 1st international workshop on utility-based data mining. (pp. 59-68). ACM.
[20] Ding, D., Zhang, M., Pan, X., Yang, M., & He, X. (2019). Modeling extreme events in time series prediction. In Proceedings of the 25th ACM SIGKDD (pp. 1114-1122). ACM.
[21] Dougherty, RL; Edelman, A.; Hyman, JM, Nonnegativity-, monotonicity-, or convexity-preserving cubic and quintic Hermite interpolation, Mathematics of Computation, 52, 186, 471-494 (1989) · Zbl 0693.41004
[22] Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1996). Support vector regression machines. In Proceedings of the 9th international conference on neural information processing systems, NIPS’96 (pp. 155-161) MIT Press.
[23] Fawcett, T., & Provost, F. (1999). Activity monitoring: Noticing interesting changes in behavior. In Proceedings of the Fifth ACM SIGKDD, KDD’99 (pp. 53-62). ACM.
[24] Fernández, A.; García, S.; Galar, M.; Prati, RC; Krawczyk, B.; Herrera, F., Learning from imbalanced data sets (2018), Berlin: Springer, Berlin
[25] Freemeteo. (2017). http://freemeteo.com.pt/. Accessed March 30, 2017.
[26] Fritsch, FN; Carlson, RE, Monotone piecewise cubic interpolation, SIAM Journal on Numerical Analysis, 17, 238-246 (1980) · Zbl 0423.65011
[27] Geman, S.; Bienenstock, E.; Doursat, R., Neural networks and the bias/variance dilemma, Neural Computation, 4, 1, 1-58 (1992)
[28] Giraud-Carrier, C. (2005). The data mining advisor: Meta-learning at the service of practitioners. In Proceedings of the fourth international conference on machine learning and applications, ICMLA’05 (pp. 113-119). IEEE Computer Society, USA. 10.1109/ICMLA.2005.65.
[29] Goodwin, P.; Wright, G., The limits of forecasting methods in anticipating rare events, Technological Forecasting and Social Change, 77, 3, 355-368 (2010)
[30] Granger, CW, Outline of forecast theory using generalized cost functions, Spanish Economic Review, 1, 2, 161-173 (1999)
[31] Hald, AA, A history of mathematical statistics from 1750 to 1930 (1998), New York: Wiley, New York · Zbl 0979.01012
[32] He, X., Zhao, K., & Chu, X. (2019). AutoML: A survey of the state-of-the-art. arXiv:1908.00709.
[33] He, H.; Ma, Y., Imbalanced learning: foundations, algorithms, and applications (2013), New York: Wiley-IEEE Press, New York · Zbl 1272.68022
[34] Hernández-Orallo, J., Roc curves for regression, Pattern Recognition, 46, 12, 3395-3411 (2013) · Zbl 1326.62138
[35] Herrera, M.; Torgo, L.; Izquierdo, J.; Pérez-García, R., Predictive models for forecasting hourly urban water demand, Journal of Hydrology, 387, 141-150 (2010)
[36] Hoaglin, DC; Mosteller, F.; Tukey, JW, Understanding robust and exploratory data analysis (1983), New York: Wiley, New York
[37] Hodge, V.; Austin, J., A survey of outlier detection methodologies, Artificial Intelligence Review, 22, 2, 85-126 (2004) · Zbl 1101.68023
[38] Hubert, M.; Vandervieren, E., An adjusted boxplot for skewed distributions, Computational Statistics and Data Analysis, 52, 12, 5186-5201 (2008) · Zbl 1452.62074
[39] Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th international conference on learning and intelligent optimization, LION’05 (pp. 507-523). Springer, Berlin, Heidelberg. 10.1007/978-3-642-25566-3_40.
[40] Koprinska, I., Rana, M., & Agelidis, V. (2011). Yearly and seasonal models for electricity load forecasting. In Proceedings of IJCNN (pp. 1474-1481).
[41] Krawczyk, B., Learning from imbalanced data: Open challenges and future directions, Progress in Artificial Intelligence, 5, 4, 221-232 (2016)
[42] Kruschke, JK, Doing Bayesian data analysis (2015), Boston: Academic Press, Boston
[43] Kruschke, JK; Liddell, TM, The Bayesian new statistics: Two historical trends converge (2015), New York: SSRN eLibrary, New York
[44] Kruschke, JK; Liddell, TM, The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective, Psychonomic Bulletin & Review, 2017, 1-29 (2017)
[45] Lee, T. H. (2007). Loss functions in time series forecasting. International Encyclopedia of the Social Sciences.
[46] Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A., Hyperband: A novel bandit-based approach to hyperparameter optimization, Journal of Machine Learning Research, 18, 1, 6765-6816 (2017)
[47] López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F., An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences, 250, 113-141 (2013)
[48] Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2019). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package v1.7-0.1.
[49] Milborrow, S. (2019). earth: Multivariate Adaptive Regression Splines. R package v4.7.0.
[50] Moniz, N., Ribeiro, R., Cerqueira, V., & Chawla, N. (2018). Smoteboost for regression: Improving the prediction of extreme values. In 2018 IEEE 5th international conference on data science and advanced analytics (DSAA) (pp. 150-159).
[51] Moniz, N.; Branco, P.; Torgo, L., Resampling strategies for imbalanced time series forecasting, International Journal of Data Science and Analytics, 3, 3, 161-181 (2017)
[52] Moniz, N.; Torgo, L.; Eirinaki, M.; Branco, P., A framework for recommendation of highly popular news lacking social feedback, New Generation Computing, 35, 4, 417-450 (2017)
[53] World Health Organization. (2005). WHO air quality guidelines for particulate matter, ozone, nitrogen dioxide and sulfur dioxide.
[54] Pebesma, E., spacetime: Spatio-temporal data in R, Journal of Statistical Software, Articles, 51, 7, 1-30 (2012)
[55] Peters, A., & Hothorn, T. (2018). ipred: Improved Predictors. R package v0.9-8.
[56] Phillips, G. (2003). Interpolation and approximation by polynomials. CMS Books in Mathematics. Springer, https://books.google.pt/books?id=87vciTxMcF8C. · Zbl 1023.41002
[57] Pinto, F., Cerqueira, V., Soares, C., & Mendes-Moreira, J. (2017). autobagging: Learning to rank bagging workflows with metalearning. arXiv:1706.09367.
[58] R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
[59] Ribeiro, R. P. (2011). Utility-based regression. PhD thesis, Dep. Computer Science, Faculty of Sciences, University of Porto.
[60] Rijsbergen, CJV, Information retrieval (1979), Oxford: Butterworth-Heinemann, Oxford
[61] Royston, P.; Altman, DG; Sauerbrei, W., Dichotomizing continuous predictors in multiple regression: A bad idea, Statistics in Medicine, 25, 1, 127-141 (2006)
[62] Siffer, A., Fouque, P. A., Termier, A., & Largouet, C. (2017). Anomaly detection in streams with extreme value theory. In Proceedings of the 23rd ACM SIGKDD, KDD’17 (pp. 1067-1075). ACM.
[63] Therneau, T., & Atkinson, B. (2018). rpart: Recursive Partitioning and Regression Trees. R package v4.1-12.
[64] Torgo, L. (2005). Regression error characteristic surfaces. In Proceedings of the eleventh ACM SIGKDD, KDD’05 (pp. 697-702). ACM.
[65] Torgo, L., & Ribeiro, R. (2007). Utility-based regression. In Proceedngs of 11th European conference on principles and practice of knowledge discovery in databases, PKDD (pp. 597-604). Springer Berlin Heidelberg.
[66] Torgo, L.; Branco, P.; Ribeiro, RP; Pfahringer, B., Resampling strategies for regression, Expert Systems, 32, 3, 465-476 (2013)
[67] Tukey, JW, Exploratory data analysis (1970), Reading: Addison-Wesley, Reading
[68] Wang, X., Varol, O., & Eliassi-Rad, T. (2019). L2P: an algorithm for estimating heavy-tailed outcomes. CoRR abs/1908.04628.
[69] Wickham, H., & Stryjewski, L. (2012). 40 years of boxplots. Tech. rep., had.co.nz.
[70] Wilcox, RR, Comparing the means of two independent groups, Biometrical Journal, 32, 7, 771-780 (1990)
[71] Wilcox, R., Introduction to robust estimation and hypothesis testing. Statistical modeling and decision science (2005), Amsterdam: Elsevier Science, Amsterdam · Zbl 1113.62036
[72] Wright, MN; Ziegler, A., ranger: A fast implementation of random forests for high dimensional data in C++ and R, Journal of Statistical Software, 77, 1, 1-17 (2017)
[73] Zellner, A., Bayesian estimation and prediction using asymmetric loss functions, Journal of the American Statistical Association, 81, 394, 446-451 (1986) · Zbl 0603.62037
[74] Zheng, Y., Liu, F., & Hsieh, H.P. (2013). U-Air: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD (pp. 1436-1444). ACM.