A criterion for the comparison of binary classifiers based on a stochastic dominance with an application to the sale of home insurances. (English) Zbl 1422.91362

Summary: Binary classification is central to many real-life problems, so comparing the performance of classifiers is a key issue. This manuscript introduces a criterion for that purpose. The criterion is based on a stochastic dominance and allows classifiers to be compared on subgroups of the population of the same size. Under the new criterion, changing the size of the subgroups on which classifiers are compared does not change which classifier is preferred. Characterization results for the criterion are proved. These rely on connections between the criterion and the theory of copulas, and on a tool introduced in the manuscript, the so-called continuity modelling vector. The criterion is applied to the comparison of several classifiers for detecting purchasers of home insurance.


91B30 Risk theory, insurance (MSC2010)
60E15 Inequalities; stochastic orderings
62P05 Applications of statistics to actuarial sciences and financial mathematics

