
Demand forecasting of individual probability density functions with machine learning. (English) Zbl 1473.62319

Summary: Demand forecasting is a central component of the replenishment process for retailers, as it provides crucial input for subsequent decisions such as ordering. In contrast to point estimates, such as the conditional mean of the underlying probability distribution, or confidence intervals, forecasting complete probability density functions makes it possible to investigate the impact on operational metrics, which are important for defining the business strategy, over the full range of the expected demand. Whereas metrics evaluating point estimates are widely used, methods for assessing the accuracy of predicted distributions are rare, and this work proposes new techniques for both qualitative and quantitative evaluation. Using the supervised machine learning method “Cyclic Boosting”, complete individual probability density functions can be predicted such that each prediction is fully explainable. This is of particular importance for practitioners, as it allows them to avoid “black-box” models and to understand the contributing factors for each individual prediction. Another crucial aspect for both the explainability and the generalizability of demand forecasting methods is limiting the influence of temporal confounding, which is prevalent in most state-of-the-art approaches.
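
To make the distribution-accuracy discussion in the summary concrete, the following minimal Python sketch checks predicted demand distributions with the probability integral transform (PIT), a standard tool from the density-forecast literature cited below ([57], [93]). It is not the paper's own evaluation procedure; the negative binomial parameterization, the simulated data, and all variable names are illustrative assumptions.

# Minimal sketch, assuming negative binomial demand predictions: evaluate the
# predicted CDF at each realized demand (randomized PIT for discrete data).
# Well-calibrated predictions yield PIT values that are uniform on [0, 1].
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-observation predictions (one distribution per product-day)
# and stand-in "observed" sales drawn from those same distributions.
n_pred = rng.uniform(2.0, 10.0, size=1000)   # predicted dispersion parameters
p_pred = rng.uniform(0.3, 0.7, size=1000)    # predicted success probabilities
observed = stats.nbinom.rvs(n_pred, p_pred, random_state=rng)

# Randomized PIT: draw uniformly between F(y-1) and F(y) so that calibrated
# discrete forecasts give exactly uniform PIT values.
cdf_hi = stats.nbinom.cdf(observed, n_pred, p_pred)
cdf_lo = stats.nbinom.cdf(observed - 1, n_pred, p_pred)
pit = cdf_lo + rng.uniform(size=observed.shape) * (cdf_hi - cdf_lo)

# Quantitative check: Kolmogorov-Smirnov distance of the PIT values to U(0, 1);
# a qualitative check would be a histogram of the PIT values.
ks_stat, p_value = stats.kstest(pit, "uniform")
print(f"KS distance to uniform: {ks_stat:.3f} (p = {p_value:.3f})")

In practice the predicted parameters would come from the fitted forecasting model and the observations from actual sales; a flat PIT histogram and a small KS distance indicate well-calibrated predicted demand distributions.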

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M20 Inference from stochastic processes and prediction
62P20 Applications of statistics to economics
90B05 Inventory, storage, reservoirs
60E05 Probability distributions: general theory
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Edgeworth F (1888) The mathematical theory of banking. J R Stat Soc
[2] Khouja M (1999) The single-period (news-vendor) problem: literature review and suggestions for future research. Omega 27(5):537-553. doi:10.1016/S0305-0483(99)00017-1. http://www.sciencedirect.com/science/article/pii/S0305048399000171 (last accessed: 2020-Dec-01)
[3] Wick F, Kerzel U, Feindt M (2019) Cyclic boosting - an explainable supervised machine learning algorithm. In: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). IEEE, pp 358-363. doi:10.1109/icmla.2019.00067
[4] Statista: Profit margin of Lidl Sverige from 2013 to 2015. https://www.statista.com/statistics/779146/profit-margin-of-lidl-sverige/ (last accessed: 2020-Dec-01)
[5] Statista: Operating margin of Hemköp in Sweden from 2012 to 2018. https://www.statista.com/statistics/734370/operating-margin-of-hemkoep-in-sweden/ (last accessed: 2020-Dec-01)
[6] Statista: Operating margin of Willys in Sweden from 2012 to 2018. https://www.statista.com/statistics/734261/operating-margin-of-willys-in-sweden/ (last accessed: 2020-Dec-01)
[7] Statista: Operating profit margin of Publix Super Markets in the United States from 2017 to 2019. https://www.statista.com/statistics/1167301/publix-operating-profit-us/ (last accessed: 2020-Dec-01)
[8] Beheshti-Kashi, S.; Karimi, HR; Thoben, KD; Lütjen, M.; Teucke, M., A survey on retail sales forecasting and prediction in fashion markets, Syst Sci Control Eng, 3, 1, 154-161 (2015) · doi:10.1080/21642583.2014.999389
[9] Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015) Time Series Analysis: Forecasting and Control, 5 edn. Wiley
[10] Brown RG (1963) Smoothing Forecasting and Prediction of Discrete Time Series. Prentice-Hall International, Inc., London, UK
[11] Croston, JD, Forecasting and stock control for intermittent demands, J Oper Res Soc, 23, 3, 289-303 (1972) · Zbl 0238.90021 · doi:10.1057/jors.1972.50
[12] Gardner, ES, Exponential smoothing: the state of the art, J Forecast, 4, 1-28 (1985) · doi:10.1002/for.3980040103
[13] Holt CC (1957) Forecasting seasonals and trends by exponentially weighted moving averages. ONR Memorandum 52
[14] De Gooijer, JG; Hyndman, RJ, 25 years of time series forecasting, Int J Forecast, 22, 3, 443-473 (2006) · doi:10.1016/j.ijforecast.2006.01.001
[15] Fattah, J.; Ezzine, L.; Aman, Z.; El Moussami, H.; Lachhab, A., Forecasting of demand using ARIMA model, Int J Eng Bus Manag, 10, 184797901880867 (2018) · doi:10.1177/1847979018808673
[16] Huber, J.; Gossmann, A.; Stuckenschmidt, H., Cluster-based hierarchical demand forecasting for perishable goods, Expert Syst Appl, 76, 140-151 (2017) · doi:10.1016/j.eswa.2017.01.022
[17] Kalchschmidt M, Verganti R, Zotteri G (2006) Forecasting demand from heterogeneous customers. Int J Oper Prod Manag
[18] Permatasari CI, Sutopo W, Hisjam M (2018) Sales forecasting newspaper with ARIMA: A case study. In: AIP Conference Proceedings, vol. 1931. AIP Publishing LLC, p 030017
[19] Kalman, RE, A new approach to linear filtering and prediction problems, J Basic Eng, 82, 1, 35-45 (1960) · doi:10.1115/1.3662552
[20] Morrison, GW; Pike, DH, Kalman filtering applied to statistical forecasting, Manag Sci, 23, 7, 768-774 (1977) · Zbl 0356.62071 · doi:10.1287/mnsc.23.7.768
[21] Jacobi M, Karimanzira D, Ament C (2007) Water demand forecasting using Kalman filtering. In: Proceedings of the 16th IASTED International Conference on Applied Simulation and Modelling, pp. 199-202
[22] Kandananond K (2014) Applying Kalman filter for correlated demand forecasting. In: Applied Mechanics and Materials, vol 619. Trans Tech Publ pp 381-384
[23] Mitropoulos, C.; Samouilidis, J.; Protonotarios, E., Using Kalman filtering for energy forecasting, IFAC Proceedings Volumes, 13, 5, 317-324 (1980) · doi:10.1016/S1474-6670(17)64888-2
[24] Tegene, A., Kalman filter and the demand for cigarettes, Appl Econ, 23, 7, 1175-1182 (1991) · doi:10.1080/00036849100000155
[25] Hyndman R, Koehler AB, Ord JK, Snyder RD (2008) Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media · Zbl 1211.62165
[26] Ramos, P.; Santos, N.; Rebelo, R., Performance of state space and ARIMA models for consumer retail sales forecasting, Robot Comput Integr Manuf, 34, 151-163 (2015) · doi:10.1016/j.rcim.2014.12.015
[27] Harvey, A.; Peters, S., Estimation procedures for structural time series models, J Forecast, 9, 89-108 (1990) · doi:10.1002/for.3980090203
[28] Taylor, SJ; Letham, B., Forecasting at scale, Am Stat, 72, 1, 37-45 (2018) · Zbl 07663916 · doi:10.1080/00031305.2017.1380080
[29] Kök, AG; Fisher, ML, Demand estimation and assortment optimization under substitution: Methodology and application, Oper Res, 55, 6, 1001-1021 (2007) · Zbl 1167.91386 · doi:10.1287/opre.1070.0409
[30] Wang, HJ; Chien, CF; Liu, CF, Demand forecasting using Bayesian experiment with non-homogenous Poisson process model, Int J Oper Res, 2, 1, 21-29 (2005) · Zbl 1109.62126
[31] Remus W, O’Connor M (2001) Neural networks for time-series forecasting. In: Principles of forecasting. Springer, pp 245-256
[32] Zhang G (2012) Neural Networks for time-series forecasting. Springer Berlin, Heidelberg, pp 461-477
[33] Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press · Zbl 0541.62042
[34] Ferreira, KJ; Lee, BHA; Simchi-Levi, D., Analytics for an online retailer: Demand forecasting and price optimization, Manuf Serv Oper Manag, 18, 1, 69-88 (2016) · doi:10.1287/msom.2015.0561
[35] Rumelhart, D.; Hinton, G.; Williams, R., Learning representations by back-propagating errors, Nature, 323, 533-536 (1986) · Zbl 1369.68284 · doi:10.1038/323533a0
[36] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Comput, 9, 8, 1735-1780 (1997) · doi:10.1162/neco.1997.9.8.1735
[37] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser LU, Polosukhin I (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds.) Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pp 5998-6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf (last accessed: 2020-Dec-01)
[38] Bandara K, Shi P, Bergmeir C, Hewamalage H, Tran Q, Seaman B (2019) Sales demand forecast in e-commerce using a long short-term memory neural network methodology. In: International Conference on Neural Information Processing. Springer, pp 462-474
[39] Golkabek M, Senge R, Neumann R (2020) Demand forecasting using long short-term memory neural networks. arXiv preprint arXiv:2008.08522 (last accessed: 2020-Dec-01)
[40] Goyal A, Kumar R, Kulkarni S, Krishnamurthy S, Vartak M (2018) A solution to forecast demand using long short-term memory recurrent neural networks for time series forecasting. In: Midwest Decision Sciences Institute Conference
[41] Helmini S, Jihan N, Jayasinghe M, Perera S (2019) Sales forecasting using multivariate long short term memory network models. PeerJ Preprints 7:e27712v1
[42] Yu Q, Wang K, Strandhagen JO, Wang Y (2017) Application of long short-term memory neural network to sales forecasting in retail-a case study. In: International Workshop of Advanced Manufacturing and Automation. Springer, pp 11-17
[43] Längkvist, M.; Karlsson, L.; Loutfi, A., A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recogn Lett, 42, 11-24 (2014) · doi:10.1016/j.patrec.2014.01.008
[44] Dixon MF (2020) Industrial forecasting with exponentially smoothed recurrent neural networks. arXiv preprint arXiv:2004.04717 (last accessed: 2020-Dec-01)
[45] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems. pp 2672-2680
[46] Haas M, Richter S (2020) Statistical analysis of Wasserstein GANs with applications to time series forecasting. arXiv preprint arXiv:2011.03074 (last accessed: 2020-Dec-01)
[47] Ramponi G, Protopapas P, Brambilla M, Janssen R (2018) T-CGAN: Conditional generative adversarial network for data augmentation in noisy time series with irregular sampling. arXiv preprint arXiv:1811.08295 (last accessed: 2020-Dec-01)
[48] Smith KE, Smith AO (2020) Conditional GAN for timeseries generation. arXiv preprint arXiv:2006.16477 (last accessed: 2020-Dec-01)
[49] Malinsky D, Spirtes P (2018) Causal structure learning from multivariate time series in settings with unmeasured confounding. In: Proceedings of 2018 ACM SIGKDD Workshop on Causal Discovery, pp 23-47
[50] Runge J (2018) Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos Int J Nonlinear Sci 28(7):075310 · Zbl 1396.62218
[51] Runge J, Nowack P, Kretschmer M, Flaxman S, Sejdinovic D (2019) Detecting and quantifying causal associations in large nonlinear time series datasets. Sci Adv 5(11):eaau4996. doi:10.1126/sciadv.aau4996
[52] Bica I, Alaa A, Van Der Schaar M (2020) Time series deconfounder: Estimating treatment effects over time in the presence of hidden confounders. In: International Conference on Machine Learning. PMLR, pp 884-895
[53] Perrakis, K.; Gryparis, A.; Schwartz, J.; Tertre, AL; Katsouyanni, K.; Forastiere, F.; Stafoggia, M.; Samoli, E., Controlling for seasonal patterns and time varying confounders in time-series epidemiological models: a simulation study, Stat Med, 33, 28, 4904-4918 (2014) · doi:10.1002/sim.6271
[54] Brodersen, KH; Gallusser, F.; Koehler, J.; Remy, N.; Scott, SL, Inferring causal impact using Bayesian structural time-series models, Ann Appl Stat, 9, 247-274 (2015) · Zbl 1454.62473 · doi:10.1214/14-AOAS788
[55] Brodersen KH, Hauser A (2015) CausalImpact: An R package for causal inference in time series. http://google.github.io/CausalImpact/ (last accessed 2020-Dec-19)
[56] Chatfield, C., Calculating interval forecasts, J Bus Econ Stat, 11, 2, 121-135 (1993)
[57] Angus, JE (1994) The probability integral transform and related results. SIAM Review 36(4):652-654. http://www.jstor.org/stable/2132726 (last accessed: 2020-Dec-01) · Zbl 0808.62013
[58] Clements, MP; Taylor, N., Bootstrapping prediction intervals for autoregressive models, Int J Forecast, 17, 2, 247-267 (2001)
[59] Grigoletto, M., Bootstrap prediction intervals for autoregressions: some alternatives, Int J Forecast, 14, 4, 447-456 (1998) · doi:10.1016/S0169-2070(98)00004-1
[60] Masarotto, G., Bootstrap prediction intervals for autoregressions, Int J Forecast, 6, 2, 229-239 (1990) · doi:10.1016/0169-2070(90)90008-Y
[61] McCullough, B., Bootstrapping forecast intervals: an application to AR(p) models, J Forecast, 13, 1, 51-66 (1994) · doi:10.1002/for.3980130107
[62] McCullough, B., Consistent forecast intervals when the forecast-period exogenous variables are stochastic, J Forecast, 15, 4, 293-304 (1996) · doi:10.1002/(SICI)1099-131X(199607)15:4<293::AID-FOR611>3.0.CO;2-6
[63] Pascual, L.; Romo, J.; Ruiz, E., Effects of parameter estimation on prediction densities: a bootstrap approach, Int J Forecast, 17, 1, 83-103 (2001) · doi:10.1016/S0169-2070(00)00069-8
[64] Pascual, L.; Romo, J.; Ruiz, E., Bootstrap predictive inference for ARIMA processes, J Time Ser Anal, 25, 4, 449-465 (2004) · Zbl 1062.62199 · doi:10.1111/j.1467-9892.2004.01713.x
[65] Pascual, L.; Romo, J.; Ruiz, E., Bootstrap prediction intervals for power-transformed time series, Int J Forecast, 21, 2, 219-235 (2005) · doi:10.1016/j.ijforecast.2004.09.006
[66] Thombs, LA; Schucany, WR, Bootstrap prediction intervals for autoregression, J Am Stat Assoc, 85, 410, 486-492 (1990) · Zbl 0705.62089 · doi:10.1080/01621459.1990.10476225
[67] Koenker, R.; Hallock, KF, Quantile regression, J Econ Perspect, 15, 4, 143-156 (2001) · doi:10.1257/jep.15.4.143
[68] Feindt M, Kerzel U (2006) The NeuroBayes neural network package. Nucl Instrum Methods Phys Res A 559(1):190-194. http://www.sciencedirect.com/science/article/pii/S0168900205022679 (last accessed: 2020-Dec-01)
[69] Hyndman, RJ, Highest-density forecast regions for nonlinear and non-normal time series models, J Forecast, 14, 5, 431-441 (1995) · doi:10.1002/for.3980140503
[70] Tay, AS; Wallis, KF, Density forecasting: a survey, J Forecast, 19, 4, 235-254 (2000) · doi:10.1002/1099-131X(200007)19:4<235::AID-FOR772>3.0.CO;2-L
[71] Wen R, Torkkola K, Narayanaswamy B, Madeka D (2017) A multi-horizon quantile recurrent forecaster. arXiv preprint arXiv:1711.11053 (last accessed: 2020-Dec-01)
[72] Lim B, Arik SO, Loeff N, Pfister T (2019) Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363 (last accessed: 2020-Dec-01)
[73] Rasul K, Sheikh AS, Schuster I, Bergmann U, Vollgraf R (2020) Multi-variate probabilistic time series forecasting via conditioned normalizing flows. arXiv preprint arXiv:2002.06103 (last accessed: 2020-Dec-01)
[74] Rezende DJ, Mohamed S (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770 (last accessed: 2020-Dec-01)
[75] Bishop CM (1994) Mixture density networks. http://publications.aston.ac.uk/id/eprint/373/ (last accessed: 2020-Dec-01)
[76] Salinas D, Flunkert V, Gasthaus J, Januschowski T (2020) DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int J Forecast 36(3):1181-1191. doi:10.1016/j.ijforecast.2019.07.001. http://www.sciencedirect.com/science/article/pii/S0169207019301888 (last accessed: 2020-Dec-01)
[77] Adan, I.; van Eenige, M.; Resing, J., Fitting discrete distributions on the first two moments, Probab Eng Inf Sci, 9, 4, 623-632 (1995) · Zbl 1336.60035 · doi:10.1017/S0269964800004101
[78] Chatfield C, Goodhardt GJ (1973) A consumer purchasing model with Erlang inter-purchase times. J Am Stat Assoc 68(344):828-835. http://www.jstor.org/stable/2284508 (last accessed: 2020-Dec-01)
[79] Ehrenberg A (1972) Repeat-buying; theory and applications. North-Holland Pub. Co.
[80] Ehrenberg ASC (1959) The pattern of consumer purchases. J R Stat Soc Ser C 8(1):26-41. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edsrep&AN=edsrep.a.bla.jorssc.v8y1959i1p26.41&site=eds-live&scope=site (last accessed: 2020-Dec-01)
[81] Goodhardt, GJ; Ehrenberg, A., Conditional trend analysis: A breakdown by initial purchasing level, J Mark Res, 4, 155-161 (1967) · doi:10.1177/002224376700400206
[82] Schmittlein DC, Bemmaor AC, Morrison DG (1985) Technical note - why does the NBD model work? Robustness in representing product purchases, brand purchases and imperfectly recorded purchases. Mark Sci 4(3):255-266. doi:10.1287/mksc.4.3.255
[83] Ban, GY; Rudin, C., The big data newsvendor: Practical insights from machine learning, Oper Res, 67, 1, 90-108 (2019) · Zbl 1443.90093 · doi:10.1287/opre.2018.1757
[84] Bertsimas, D.; Kallus, N., From predictive to prescriptive analytics, Manag Sci, 66, 3, 1025-1044 (2020) · doi:10.1287/mnsc.2018.3253
[85] Beutel, AL; Minner, S., Safety stock planning under causal demand forecasting, Int J Prod Econ, 140, 2, 637-645 (2012) · doi:10.1016/j.ijpe.2011.04.017
[86] Huber, J.; Müller, S.; Fleischmann, M.; Stuckenschmidt, H., A data-driven newsvendor problem: From data to decision, Eur J Oper Res, 278, 3, 904-915 (2019) · Zbl 1430.90021 · doi:10.1016/j.ejor.2019.04.043
[87] Oroojlooyjadid, A.; Snyder, LV; Takáč, M., Applying deep learning to the newsvendor problem, IISE Transactions, 52, 4, 444-463 (2020) · doi:10.1080/24725854.2019.1632502
[88] Pearl J (2009) Causality: Models, Reasoning and Inference, 2 edn. Cambridge University Press · Zbl 1188.68291
[89] Rubin, DB, Estimating causal effects of treatments in randomized and nonrandomized studies, J Educ Psychol, 66, 5, 688-701 (1974) · doi:10.1037/h0037350
[90] Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424-438. doi:10.2307/1912791 · Zbl 1366.91115
[91] Hilbe J (2011) Negative binomial regression. Cambridge University Press, Cambridge, UK New York · Zbl 1269.62063
[92] Casella G, Berger RL (2002) Statistical inference, 2nd edn. Duxbury/Thomson Learning, Pacific Grove, Calif
[93] Diebold FX, Gunther TA, Tay AS (1998) Evaluating density forecasts with applications to financial risk management. Int Econ Rev 39(4):863-883 (Symposium on Forecasting and Empirical Methods in Macroeconomics and Finance)
[94] Olkin, I.; Pukelsheim, F., The distance between two random vectors with given dispersion matrices, Linear Algebra Appl, 48, 257-263 (1982) · Zbl 0527.60015 · doi:10.1016/0024-3795(82)90112-4
[95] Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Statist 22(1):79-86. doi:10.1214/aoms/1177729694 · Zbl 0042.38403
[96] Dagan I, Lee L, Pereira F (1997) Similarity-based methods for word sense disambiguation. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, ACL ’98/EACL ’98. Association for Computational Linguistics, USA, pp 56-63. doi:10.3115/976909.979625
[97] Székely GJ, Rizzo ML (2013) Energy statistics: A class of statistics based on distances. J Stat Plan Inference 143(8):1249-1272. doi:10.1016/j.jspi.2013.03.018. http://www.sciencedirect.com/science/article/pii/S0378375813000633 (last accessed: 2020-Dec-01) · Zbl 1278.62072
[98] https://www.kaggle.com/c/m5-forecasting-accuracy/data (last accessed: 2020-Dec-01)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.