## Delta boosting machine with application to general insurance. (English) Zbl 1416.91199

Summary: In this article, we introduce delta boosting (DB) as a new member of the boosting family. Similar to the popular gradient boosting (GB), this new member is presented as a forward stagewise additive model that attempts to reduce the loss at each iteration by sequentially fitting a simple base learner to complement the running predictions. Instead of relying on the negative gradient, as is the case for GB, DB adopts a new measure called delta as the basis. Delta is defined as the loss minimizer at the observation level. We also show that DB is the optimal boosting member for a wide range of loss functions. The optimality is a consequence of DB solving for the split and adjustment simultaneously to maximize loss reduction at each iteration. In addition, we introduce an asymptotic version of DB that works well for all twice-differentiable, strictly convex loss functions. This asymptotic behavior does not depend on the number of observations, but rather on a high number of iterations, which can be augmented through common regularization techniques. We show that the basis in the asymptotic extension differs from the basis in GB only by a multiple of the second derivative of the log-likelihood. The multiple is considered a correction factor, one that corrects the bias toward observations with high second derivatives in GB. When the negative log-likelihood is used as the loss function, this correction can be interpreted as a credibility adjustment for the process variance. The simulation studies and real-data applications we conducted suggest that DB is a significant improvement over GB. The performance of the asymptotic version is less dramatic, but the improvement is still compelling. Like GB, DB provides high transparency to users: the marginal influence of variables can be reviewed through relative-importance charts and partial-dependence plots, and overall model performance can be assessed by evaluating the losses, lifts, and double lifts on the holdout sample.
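The contrast the summary draws between the GB basis (the negative gradient) and the asymptotic DB basis (the negative gradient corrected by a multiple of the second derivative) can be sketched for a Poisson negative log-likelihood. This is an illustrative numpy toy under stated assumptions, not the authors' implementation: the stump base learner, the shrinkage value, and the simulated data are all choices made here for demonstration.

```python
import numpy as np

# Poisson loss on the log link: L(y, F) = exp(F) - y*F, with F the log-mean.
# GB fits the base learner to the negative gradient r_i = y_i - exp(F_i);
# the asymptotic DB basis divides r_i by the second derivative h_i = exp(F_i)
# (the correction factor described above), using h_i as a fitting weight.

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, n)
y = rng.poisson(np.exp(0.5 + 1.5 * (x > 0.6)))  # counts with a jump in the log-mean

def best_stump(x, z, w):
    """One-split regression stump minimizing weighted squared error to z."""
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = x <= t
        vl = np.average(z[left], weights=w[left])
        vr = np.average(z[~left], weights=w[~left])
        sse = np.sum(w * (z - np.where(left, vl, vr)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, vl, vr)
    _, t, vl, vr = best
    return lambda xx: np.where(xx <= t, vl, vr)

def boost(x, y, n_iter=100, nu=0.1, delta=False):
    """Forward stagewise additive model for the Poisson loss on the log link."""
    F = np.full(len(y), np.log(y.mean()))       # intercept-only start
    for _ in range(n_iter):
        mu = np.exp(F)
        grad = y - mu                           # negative gradient of the loss
        if delta:
            z, w = grad / mu, mu                # basis corrected by the 2nd derivative
        else:
            z, w = grad, np.ones_like(F)        # plain GB basis
        F = F + nu * best_stump(x, z, w)(x)     # shrunken base-learner update
    return F

nll = lambda F: np.mean(np.exp(F) - y * F)      # Poisson loss, up to a constant
F_gb, F_db = boost(x, y), boost(x, y, delta=True)
print(f"intercept-only loss: {nll(np.full(n, np.log(y.mean()))):.4f}")
print(f"GB loss: {nll(F_gb):.4f}  asymptotic-DB loss: {nll(F_db):.4f}")
```

Both variants reduce the holdout-style loss relative to the intercept-only fit; the DB-style update down-weights the working response of observations with large second derivatives (large fitted means), which is the bias correction the summary attributes to the asymptotic extension.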

### MSC:

- 91B30 Risk theory, insurance (MSC2010)
- 91-04 Software, source code, etc. for problems pertaining to game theory, economics, and finance
- 62P05 Applications of statistics to actuarial sciences and financial mathematics


*S. C. K. Lee* and *S. Lin*, N. Am. Actuar. J. 22, No. 3, 405--425 (2018; Zbl 1416.91199)

