Actuarial applications of word embedding models. (English) Zbl 1431.91337
Summary: In insurance analytics, textual descriptions of claims are often discarded, because traditional empirical analyses require numeric descriptor variables. This paper demonstrates how textual data can be easily used in insurance analytics. Using the concept of word similarities, we illustrate how to extract variables from text and incorporate them into claims analyses using standard generalized linear models or generalized additive regression models. This procedure is applied to the Wisconsin Local Government Property Insurance Fund (LGPIF) data, in order to demonstrate how insurance claims management and risk mitigation procedures can be improved. We illustrate two applications. First, we show how the claims classification problem can be solved using textual information. Second, we analyze the relationship between risk metrics and the probability of large losses. We obtain good results for both applications, where short textual descriptions of insurance claims are used for the extraction of features.
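Illustrative sketch (not taken from the paper): one way to turn a short claim description into numeric variables via word similarities is to compare it with a few peril keywords using cosine similarity. The embedding values, the keyword list, the averaging of word vectors, and the function names below are all assumptions made for illustration; a real analysis would load pretrained vectors such as GloVe or word2vec and might aggregate word similarities differently.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real analysis would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "fire":    [0.9, 0.1, 0.0],
    "smoke":   [0.8, 0.2, 0.1],
    "water":   [0.1, 0.9, 0.0],
    "flood":   [0.0, 0.8, 0.2],
    "vehicle": [0.1, 0.1, 0.9],
    "damage":  [0.4, 0.4, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embed_description(text):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def similarity_features(text, keywords):
    """Map each keyword to its cosine similarity with the description."""
    doc = embed_description(text)
    return {k: (cosine(doc, TOY_EMBEDDINGS[k]) if doc is not None else 0.0)
            for k in keywords}


# One numeric variable per keyword for a short claim description.
features = similarity_features("smoke damage in kitchen",
                               ["fire", "water", "vehicle"])
```

The resulting similarity scores are ordinary numeric covariates, so under these assumptions they could enter a generalized linear or generalized additive model alongside the usual rating variables.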
91G05 Actuarial mathematics
[1] Chollet, F. and Allaire, J. J. (2018) Deep Learning with R. Shelter Island, NY: Manning Publications Co.
[2] Frees, E. W. (2009) Regression Modeling with Actuarial and Financial Applications. Cambridge, UK: Cambridge University Press. · Zbl 1284.62010
[3] Frees, E. W., Lee, G. Y. and Yang, L. (2016) Multivariate frequency-severity regression models in insurance. Risks, 2016(4): 4.
[4] Goldberg, Y. (2017) Neural Network Methods for Natural Language Processing. San Rafael, CA: Morgan & Claypool Publishers.
[5] Goodfellow, I., Bengio, Y. and Courville, A. (2016) Deep Learning. Cambridge, MA: MIT Press. · Zbl 1373.68009
[6] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Berlin: Springer Science & Business Media. · Zbl 1273.62005
[7] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Boca Raton, FL: Chapman and Hall. · Zbl 0747.62061
[8] Kearney, S. (2010). Insurance Operations. Malvern, PA: The Institutes.
[9] Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing, 1st Edition. Cambridge, MA: The MIT Press. · Zbl 0951.68158
[10] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013) Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26: 3111-3119.
[11] Pennington, J., Socher, R. and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), vol. 2014, pp. 1532-1543.
[12] Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45:427-437.
[13] Wood, S. (2013). On p values for smooth components of an extended generalized additive model. Biometrika 100, 221-228. · Zbl 1284.62270
[14] Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition. Boca Raton, FL: CRC Press. · Zbl 1368.62004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.