Addressing imbalanced insurance data through zero-inflated Poisson regression with boosting. (English) Zbl 1471.91466

Summary: A machine learning approach to zero-inflated Poisson (ZIP) regression is introduced to address a common difficulty arising from imbalanced financial data. The suggested ZIP can be interpreted as an adaptive weight adjustment procedure that removes the need for post-modeling re-calibration and results in a substantial enhancement of predictive accuracy. Notwithstanding the increased complexity due to the expanded parameter set, we utilize cyclic coordinate descent optimization to implement the ZIP regression, with adjustments made to address saddle points. We also study how various approaches alleviate the potential drawbacks of incomplete exposures in insurance applications. The procedure is tested on real-life data. We demonstrate a significant improvement in performance relative to other popular alternatives, which justifies our modeling techniques.
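
The summary names two ingredients, a ZIP likelihood with exposures and cyclic coordinate descent over its two parameter groups. The Python sketch below illustrates only those two ideas on an intercept-only model and is not the authors' boosting procedure. Everything in it is an assumption made for illustration: the function names `zip_nll`, `golden_section` and `fit_zip`, the logit and log parameterisation, the use of exposure as a multiplier of the Poisson mean, and the golden-section line search used for each coordinate update. The saddle-point adjustments mentioned in the summary are not reproduced.

```python
import math


def zip_nll(counts, exposures, logit_pi, log_rate):
    """Negative log-likelihood of an intercept-only ZIP model (illustrative)."""
    pi = 1.0 / (1.0 + math.exp(-logit_pi))  # probability of a structural zero
    total = 0.0
    for y, w in zip(counts, exposures):
        lam = w * math.exp(log_rate)  # assumed: exposure scales the Poisson mean
        if y == 0:
            # A zero is either structural (prob. pi) or an ordinary Poisson zero.
            total -= math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total -= (math.log(1.0 - pi) - lam + y * math.log(lam)
                      - math.lgamma(y + 1))
    return total


def golden_section(f, lo, hi, tol=1e-8):
    """Minimise a one-dimensional function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2.0


def fit_zip(counts, exposures, sweeps=50):
    """Cyclic coordinate descent: update one parameter at a time, then repeat."""
    logit_pi, log_rate = 0.0, 0.0
    for _ in range(sweeps):
        # Coordinate 1: zero-inflation parameter, Poisson rate held fixed.
        logit_pi = golden_section(
            lambda a: zip_nll(counts, exposures, a, log_rate), -10.0, 10.0)
        # Coordinate 2: Poisson rate, zero-inflation parameter held fixed.
        log_rate = golden_section(
            lambda b: zip_nll(counts, exposures, logit_pi, b), -10.0, 10.0)
    return logit_pi, log_rate
```

A call such as `fit_zip(counts, exposures)` returns the fitted logit of the zero-inflation probability and the log claim rate per unit of exposure. In a boosting setting each of the two parameters would be a function of covariates rather than a scalar, but the alternating update pattern would be the same.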

MSC:

91G05 Actuarial mathematics
62P05 Applications of statistics to actuarial sciences and financial mathematics
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Boucher, J.-P., Denuit, M. and Guillén, M. (2007) Risk classification for claim counts: A comparative analysis of various zero-inflated mixed Poisson and hurdle models. North American Actuarial Journal, 11(4), 110-131. · Zbl 1480.91187
[2] Boucher, J.-P., Denuit, M. and Guillén, M. (2009) Number of accidents or number of claims? An approach with zero-inflated Poisson models for panel data. Journal of Risk and Insurance, 76(4), 821-846.
[3] Breiman, L., Friedman, J., Stone, C.J. and Olshen, R.A. (1984) Classification and Regression Trees. Boca Raton, Florida, USA: CRC Press. · Zbl 0541.62042
[4] Bühlmann, H. and Gisler, A. (2006) A Course in Credibility Theory and Its Applications. Berlin, Germany: Springer Science & Business Media. · Zbl 1108.91001
[5] Calderín-Ojeda, E., Gómez-Déniz, E. and Barranco-Chamorro, I. (2019) Modelling zero-inflated count data with a special case of the generalised Poisson distribution. ASTIN Bulletin: The Journal of the IAA, 49(3), 689-707. · Zbl 1427.91220
[6] Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002) SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. · Zbl 0994.68128
[7] Chen, T., He, T., Benesty, M., Khotilovich, V. and Tang, Y. (2015) XGBoost: Extreme gradient boosting. R package version 0.4-2, 1-4.
[8] De Jong, P. and Heller, G.Z. (2008) Generalized Linear Models for Insurance Data. Cambridge, UK: Cambridge University Press. · Zbl 1142.91046
[9] Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B. and Herrera, F. (2018) Learning from Imbalanced Data Sets. Springer.
[10] Freund, Y. and Schapire, R.E. (1995) A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory, pp. 23-37. Springer.
[11] Friedman, J.H. (2001) Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232. · Zbl 1043.62034
[12] Friedman, J.H. (2002) Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378. · Zbl 1072.65502
[13] Gee, J. and Button, M. (2019) The financial cost of fraud 2019: The latest data from around the world. Tech. rep., Crowe UK.
[14] Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H. and Bing, G. (2017) Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220-239.
[15] He, H. and Ma, Y. (2013) Imbalanced Learning: Foundations, Algorithms, and Applications. New York, USA: John Wiley & Sons. · Zbl 1272.68022
[16] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.-Y. (2017) LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pp. 3146-3154.
[17] Kingman, J.F.C. (2005) Poisson processes. Encyclopedia of Biostatistics, 6.
[18] Klein, N., Kneib, T. and Lang, S. (2015) Bayesian generalized additive models for location, scale, and shape for zero-inflated and overdispersed count data. Journal of the American Statistical Association, 110(509), 405-419. · Zbl 1373.62103
[19] Lambert, D. (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14. · Zbl 0850.62756
[20] Lee, S.C. (2020) Delta boosting implementation of negative binomial regression in actuarial pricing. Risks, 8(1), 19.
[21] Lee, S.C. and Lin, S. (2018) Delta boosting machine with application to general insurance. North American Actuarial Journal, 22(3), 405-425. · Zbl 1416.91199
[22] Saha, A. and Tewari, A. (2010) On the finite time convergence of cyclic coordinate descent methods. arXiv preprint arXiv:1005.2146. · Zbl 1270.90032
[23] Schapire, R.E. (1990) The strength of weak learnability. Machine Learning, 5(2), 197-227.
[24] Teugels, J.L. and Vynckie, P. (1996) The structure distribution in a mixed Poisson process. International Journal of Stochastic Analysis, 9(4), 489-496. · Zbl 0874.60036
[25] Wright, S.J. (2015) Coordinate descent algorithms. Mathematical Programming, 151(1), 3-34. · Zbl 1317.49038
[26] Wüthrich, M.V. and Buser, C. (2019) Data Analytics for Non-Life Insurance Pricing. Swiss Finance Institute Research Paper 2019 (16-68).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.