
Predicting mortgage early delinquency with machine learning methods. (English) Zbl 1487.91145

Summary: This paper investigates the performance of thirteen methods for modelling and predicting mortgage early delinquency probabilities. These models include variants of logit models, some commonly used machine learning methods, and variants of ensemble models. We find that heterogeneous ensemble methods lead other methods in the training, out-of-sample, and out-of-time datasets in terms of risk classification. Nonetheless, various predictive accuracy performance measures yield different rankings among the thirteen methods and no method consistently dominates in this performance dimension in the training, out-of-sample, and out-of-time data. Lastly, predictive accuracy is a major challenge facing all mortgage early delinquency models, even in the training data.
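
The distinction the summary draws between risk classification (how well a model orders loans by risk) and predictive accuracy (how close predicted probabilities are to outcomes) can be illustrated with a minimal sketch. The two measures used here, AUC and the Brier score, are assumptions chosen for illustration and are not confirmed as the paper's measures; the labels and the two sets of model probabilities are hypothetical toy values, not results from the paper.

```python
def auc(labels, scores):
    """Share of (delinquent, performing) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(labels, probs):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = early delinquency, 0 = performing loan.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# Hypothetical model A: orders the loans perfectly but overstates every probability.
model_a = [0.30, 0.32, 0.35, 0.38, 0.40, 0.42, 0.45, 0.60, 0.48, 0.65]

# Hypothetical model B: makes one ordering mistake but its probabilities sit closer to outcomes.
model_b = [0.05, 0.05, 0.10, 0.10, 0.10, 0.15, 0.40, 0.30, 0.10, 0.50]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, "AUC:", round(auc(labels, probs), 4), "Brier:", round(brier(labels, probs), 4))
```

On this toy data model A has the higher AUC while model B has the lower (better) Brier score, so the two measures rank the models differently. This is only meant to show why a method can lead on one performance dimension without dominating on the other.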

MSC:

91G40 Credit risk
62P05 Applications of statistics to actuarial sciences and financial mathematics
68T05 Learning and adaptive systems in artificial intelligence

Software:

XGBoost; Scikit
Full Text: DOI

References:

[1] Adams, N. M.; Anagnostopoulos, C.; Hand, D., Measuring classification performance: The H-measure package (2012), Imperial College London, Technical report
[2] Addo, P. M.; Guegan, D.; Hassan, B., Credit risk analysis using machine and deep learning models (2018), Ca’ Foscari University of Venice, Working paper
[3] Altman, N. S., An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician, 46, 3, 175-185 (1992)
[4] Baesens, B.; Van Gestel, T.; Viaene, S.; Stepanova, M.; Suykens, J.; Vanthienen, J., Benchmarking state-of-the-art classification algorithms for credit scoring, Journal of the Operational Research Society, 54, 627-635 (2003) · Zbl 1097.91516
[5] Bishop, C., Neural networks for pattern recognition (1995), Oxford University Press · Zbl 0868.68096
[6] Breiman, L., Some infinity theory for predictor ensembles (2000), UC Berkeley, Technical Report 577
[7] Breiman, L., Consistency for a simple model of random forests (2004), UC Berkeley, Technical Report 670
[8] Butaru, F.; Chen, Q.; Clark, B.; Das, S.; Lo, A. W.; Siddique, A., Risk and risk management in the credit card industry, Journal of Banking & Finance, 72, 218-239 (2016)
[9] Caruana, R.; Niculescu-Mizil, A.; Crew, G.; Ksikes, A., Ensemble selection from libraries of models, (Proceedings of the twenty-first international conference on machine learning (2004), ACM), 18
[10] Chen, T.; Guestrin, C., XGBoost: A scalable tree boosting system, (Proceedings of the 22nd ACM SIGKDD conference on knowledge discovery and data mining (2016)), 785-794
[11] Cortes, C.; Vapnik, V., Support-vector networks, Machine learning, 20, 3, 273-297 (1995) · Zbl 0831.68098
[12] Crook, J. N.; Edelman, D. B.; Thomas, L. C., Recent developments in consumer credit risk assessment, European Journal of Operational Research, 183, 1447-1465 (2007) · Zbl 1138.91493
[13] Davis, R. H.; Edelman, D. B.; Gammerman, A. J., Machine-learning algorithms for credit-card applications, IMA Journal of Mathematics Applied in Business & Industry, 4, 43-51 (1992)
[14] Desai, V. S.; Crook, J. N.; Overstreet, G. A., A comparison of neural networks and linear scoring models in the credit union environment, European Journal of Operational Research, 95, 24-37 (1996) · Zbl 0955.90506
[15] Ernst, D.; Wehenkel, L., Extremely randomized trees, Machine Learning, 63, 3-42 (2006) · Zbl 1110.68124
[16] Fitzpatrick, T.; Mues, C., An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market, European Journal of Operational Research, 249, 427-439 (2016) · Zbl 1346.62103
[17] Freund, Y.; Schapire, R.; Abe, N., A short introduction to boosting, Journal-Japanese Society For Artificial Intelligence, 14, 771-780 (1999)
[18] Friedman, J.; Hastie, T.; Tibshirani, R., The elements of statistical learning, 1, 337-387 (2001), Springer series in statistics · Zbl 0973.62007
[19] Friedman, J. H., Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29, 5, 1189-1232 (2001) · Zbl 1043.62034
[20] Galindo, J.; Tamayo, P., Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications, Computational Economics, 15, 107-143 (2000) · Zbl 0969.91006
[21] Genuer, R.; Poggi, J.-M.; Tuleau, C., Random forests: Some methodological insights (2008), INRIA, Research Report RR-6729
[22] Hand, D. J., Mining the past to determine the future: Problems and possibilities, International Journal of Forecasting, 25, 441-451 (2009)
[23] Hand, D. J., Measuring classifier performance, a coherent alternative to area under the ROC curve, Machine Learning, 77, 103-123 (2009) · Zbl 1470.62085
[24] Hastie, T. J.; Tibshirani, R. J., Generalized additive models (1990), Chapman & Hall/CRC · Zbl 0747.62061
[25] Kennedy, K.; Mac Namee, B.; Delany, S. J., Using semi-supervised classifiers for credit scoring, The Journal of the Operational Research Society, 64, 4, 513-529 (2013)
[26] Khandani, A. E.; Kim, A. J.; Lo, A. W., Consumer credit risk models via machine-learning algorithms, Journal of Banking & Finance, 34, 11, 2767-2787 (2010)
[27] Kvamme, H.; Sellereite, N.; Aas, K.; Sjursen, S., Predicting mortgage default using convolutional neural networks, Expert Systems With Applications, 102, 207-217 (2018)
[28] Lessmann, S.; Baesens, B.; Seow, H.-V.; Thomas, L. C., Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research, European Journal of Operational Research, 247, 124-136 (2015) · Zbl 1346.90835
[29] Li, Y.; Bellotti, T.; Adams, N., Machine learning performance over long time frame, (Paper presented at credit scoring and credit control conference (2017), University of Edinburgh)
[30] Li, Y.; Wang, X.; Djehiche, B.; Hu, X., Credit scoring by incorporating dynamic networked information, European Journal of Operational Research, 286, 1103-1112 (2020) · Zbl 1443.91314
[31] Martens, D.; Baesens, B.; Van Gestel, T.; Vanthienen, J., Comprehensive credit scoring models using rule extraction from support vector machines, European Journal of Operational Research, 183, 1466-1476 (2007) · Zbl 1278.91177
[32] Mullainathan, S.; Spiess, J., Machine learning: An applied econometric approach, Journal of Economic Perspectives, 31, 2, 87-106 (2017)
[33] Nikolaou, N.; Edakunni, N.; Kull, M.; Flach, P.; Brown, G., Cost-sensitive boosting algorithms: Do we really need them?, Machine Learning, 104, 359-384 (2016) · Zbl 1386.68136
[34] Opitz, D.; Maclin, R., Popular ensemble methods: An empirical study, Journal of Artificial Intelligence Research, 11, 169-198 (1999) · Zbl 0924.68159
[35] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825-2830 (2011) · Zbl 1280.68189
[36] Platt, J., Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, Advances in Large Margin Classifiers, 10, 3, 61-74 (1999)
[37] Qi, M.; Zhang, X.; Zhao, X., Unobservable systematic risk factor and default prediction, Journal of Banking & Finance, 49, 216-227 (2014)
[38] Sirignano, J. A.; Sadhwani, A.; Giesecke, K., Deep learning for mortgage risk (2018), Univ. of Illinois at Urbana-Champaign, Working paper
[39] Thomas, L. C.; Crook, J. N.; Edelman, D. B., Credit scoring and its applications, The Society for Industrial and Applied Mathematics (2017) · Zbl 1425.91002
[40] Varian, H. R., Big data: New tricks for econometrics, Journal of Economic Perspectives, 28, 2, 3-28 (2014)
[41] Verbraken, T.; Bravo, C.; Weber, R.; Baesens, B., Development and application of consumer credit scoring models using profit-based classification measures, European Journal of Operational Research, 238, 505-513 (2014) · Zbl 1338.91146
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.