
A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. (English) Zbl 1292.68131

Summary: The availability of abundant data poses the challenge of integrating static customer data and longitudinal behavioral data to improve performance in customer churn prediction. Usually, longitudinal behavioral data are transformed into static data before being included in a prediction model. In this study, a framework with ensemble techniques is presented for customer churn prediction that uses longitudinal behavioral data directly. A novel approach called the hierarchical multiple kernel support vector machine (H-MK-SVM) is formulated, and a three-phase training algorithm for the H-MK-SVM is developed, implemented and tested. The H-MK-SVM constructs a classification function by estimating the coefficients of both static and longitudinal behavioral variables during training, without transforming the longitudinal behavioral data. The training process of the H-MK-SVM is also a feature selection and time subsequence selection process, because the sparse nonzero coefficients correspond to the selected variables. Computational experiments were conducted on three real-world databases. Results under multiple performance criteria show that the H-MK-SVM, using longitudinal behavioral data directly, performs better than currently available classifiers.
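The idea that training doubles as feature and time subsequence selection can be illustrated with a minimal sketch of a generic multiple kernel decision function. It assumes one linear kernel per static variable and one per (behavioral variable, time period) pair, with the longitudinal data kept as a 3-D array of shape (customers, variables, periods) instead of being aggregated, and it assumes the dual coefficients `alpha`, the bias `b` and the nonnegative kernel weights `beta` have already been learned by some MKL solver. It does not reproduce the authors' H-MK-SVM formulation or their three-phase training algorithm, and all function names are hypothetical. With linear per-variable kernels, the decision function reduces to a weighted linear model, so a zero kernel weight removes that variable (or that time period) from the classifier.

```python
import numpy as np


def per_variable_kernels(static_a, long_a, static_b, long_b):
    """One linear kernel per static variable and per (behavioral variable, period).

    static_*: array of shape (n, p), static customer variables.
    long_*:   array of shape (n, q, T), q behavioral variables over T periods,
              kept as a 3-D array rather than aggregated into static features.
    Returns {label: Gram matrix of shape (n_a, n_b)}.
    """
    kernels = {}
    for j in range(static_a.shape[1]):
        kernels[("static", j)] = np.outer(static_a[:, j], static_b[:, j])
    for k in range(long_a.shape[1]):
        for t in range(long_a.shape[2]):
            kernels[("behavior", k, t)] = np.outer(long_a[:, k, t], long_b[:, k, t])
    return kernels


def decision_function(alpha, y, b, beta, train, test):
    """f(x) = sum_i alpha_i y_i sum_m beta_m K_m(x_i, x) + b for every test row.

    train and test are (static, longitudinal) tuples; beta maps kernel labels
    to weights, and labels missing from beta get weight zero.
    """
    kernels = per_variable_kernels(train[0], train[1], test[0], test[1])
    combined = np.zeros((len(y), test[0].shape[0]))
    for label, gram in kernels.items():
        combined += beta.get(label, 0.0) * gram
    return (alpha * y) @ combined + b


def variable_coefficients(alpha, y, beta, train):
    """With linear per-variable kernels, f(x) = sum_m c_m x_m + b where
    c_m = beta_m * sum_i alpha_i y_i x_{i,m}; c_m is zero whenever beta_m is."""
    static, longitudinal = train
    ay = alpha * y
    coef = {}
    for label, weight in beta.items():
        if label[0] == "static":
            column = static[:, label[1]]
        else:
            column = longitudinal[:, label[1], label[2]]
        coef[label] = weight * (ay @ column)
    return coef


def selected_variables(beta, tol=1e-8):
    """Group nonzero kernel weights into selected static variables and, for each
    selected behavioral variable, its selected time periods."""
    static, behavior = [], {}
    for label, weight in beta.items():
        if weight <= tol:
            continue
        if label[0] == "static":
            static.append(label[1])
        else:
            behavior.setdefault(label[1], []).append(label[2])
    return sorted(static), {k: sorted(ts) for k, ts in sorted(behavior.items())}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 6 training and 2 test customers, 2 static variables,
    # 1 behavioral variable observed over 3 periods.
    train = (rng.normal(size=(6, 2)), rng.normal(size=(6, 1, 3)))
    test = (rng.normal(size=(2, 2)), rng.normal(size=(2, 1, 3)))
    y = np.array([1, -1, 1, -1, 1, -1])
    alpha = np.full(6, 0.5)
    # Hypothetical sparse weights: only static variable 1 and period 2 of
    # behavioral variable 0 survive.
    beta = {("static", 1): 0.7, ("behavior", 0, 2): 0.3}
    print(decision_function(alpha, y, 0.0, beta, train, test))
    print(selected_variables(beta))  # ([1], {0: [2]})
```

In this toy run only static variable 1 and the third period of behavioral variable 0 carry nonzero weight, so they are the only inputs the sketch's classifier uses; every other variable and period is dropped by sparsity alone, without a separate selection step.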

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C90 Applications of mathematical programming

Software:

SimpleMKL; SHOGUN
