Data clustering with actuarial applications. (English) Zbl 1454.91186

Summary: Data clustering refers to the process of dividing a set of objects into homogeneous groups or clusters such that the objects in each cluster are more similar to each other than to those of other clusters. As one of the most popular tools for exploratory data analysis, data clustering has been applied in many scientific areas. In this article, we give a review of the basics of data clustering, such as distance measures and cluster validity, and different types of clustering algorithms. We also demonstrate the applications of data clustering in insurance by using two scalable clustering algorithms, the truncated fuzzy \(c\)-means (TFCM) algorithm and the hierarchical \(k\)-means algorithm, to select representative variable annuity contracts, which are used to build predictive models. We found that the hierarchical \(k\)-means algorithm is efficient and produces high-quality representative variable annuity contracts.
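The idea of selecting representative contracts by hierarchical \(k\)-means can be illustrated with a minimal sketch. The summary does not spell out the algorithm, so the code below rests on assumptions: it uses one common divisive variant (repeatedly split the largest cluster in two with plain 2-means) and takes as representative the data point nearest each cluster mean. The function names `two_means`, `hierarchical_kmeans`, and `representatives`, the synthetic data, and all parameter values are hypothetical and not taken from the article.

```python
import numpy as np


def two_means(X, rng, n_iter=20):
    """Split the rows of X into two groups with plain Lloyd iterations."""
    # Assumption: initial centers are two distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=2, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to both centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def hierarchical_kmeans(X, k, seed=0):
    """Divisive sketch: split the largest cluster until k clusters exist."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]  # each cluster is an array of row indices
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        idx = clusters.pop(i)
        if len(idx) < 2:
            # Nothing left to split: fewer distinct rows than requested clusters.
            clusters.append(idx)
            break
        labels = two_means(X[idx], rng)
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            # Degenerate split (e.g. duplicate rows): halve arbitrarily.
            half = len(idx) // 2
            left, right = idx[:half], idx[half:]
        clusters += [left, right]
    return clusters


def representatives(X, clusters):
    """Return, for each cluster, the index of the row nearest its mean."""
    X = np.asarray(X, dtype=float)
    reps = []
    for idx in clusters:
        center = X[idx].mean(axis=0)
        nearest = idx[((X[idx] - center) ** 2).sum(axis=1).argmin()]
        reps.append(int(nearest))
    return reps


# Synthetic stand-in for a portfolio: 200 "contracts" with 3 numeric features.
demo_rng = np.random.default_rng(42)
X = demo_rng.normal(size=(200, 3))
clusters = hierarchical_kmeans(X, k=8)
reps = representatives(X, clusters)
```

Each split only touches the rows of one cluster, so the cost of producing many clusters stays modest compared with running flat \(k\)-means with a large \(k\) on the whole data set. In a real application the features would need to be scaled, and categorical contract attributes encoded, before distances are computed.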


91G05 Actuarial mathematics
62P05 Applications of statistics to actuarial sciences and financial mathematics
62H30 Classification and discrimination; cluster analysis (statistical aspects)



