
zbMATH — the first resource for mathematics

Learning the naive Bayes classifier with optimization models. (English) Zbl 1284.93218
Summary: Naive Bayes is among the simplest probabilistic classifiers. Despite its strong assumption that all features are conditionally independent given the class, it often performs surprisingly well in real-world applications. Because the structure of this classifier is fixed, learning it amounts to estimating the class probabilities and the conditional probabilities from training data; these estimates are then used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier in which both the class probabilities and the conditional probabilities are treated as variables, whose values are found by solving the corresponding optimization problems. Numerical experiments are conducted on several real-world binary classification data sets, with continuous features discretized by three different methods. The performance of these models is compared with that of the naive Bayes classifier, tree-augmented naive Bayes, support vector machines (SVM), C4.5 and the nearest-neighbor classifier. The results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier while maintaining its simple structure.
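The baseline learning step the summary describes (estimating class priors and per-feature conditional probabilities from discretized training data, then scoring new observations) can be sketched as follows. This is a minimal counting-based implementation with Laplace smoothing, not the paper's optimization models; the function names `train_nb` and `predict_nb` and the smoothing parameter `alpha` are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

def train_nb(X, y, alpha=1.0):
    """Estimate log class priors and smoothed per-feature conditional
    probabilities from discretized (categorical) training data."""
    n = len(y)
    classes = sorted(set(y))
    priors = {c: math.log(sum(1 for t in y if t == c) / n) for c in classes}
    counts = defaultdict(Counter)   # counts[(j, c)][v] = # of times feature j took value v in class c
    values = defaultdict(set)       # values[j] = observed domain of feature j
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            counts[(j, c)][v] += 1
            values[j].add(v)

    def log_cond(j, c, v):
        # Laplace-smoothed log P(feature j = v | class c)
        total = sum(counts[(j, c)].values())
        k = len(values[j])
        return math.log((counts[(j, c)][v] + alpha) / (total + alpha * k))

    return priors, log_cond, classes

def predict_nb(model, xi):
    """Pick the class maximizing log prior + sum of log conditionals."""
    priors, log_cond, classes = model
    score = lambda c: priors[c] + sum(log_cond(j, c, v) for j, v in enumerate(xi))
    return max(classes, key=score)
```

The proposed models replace the counting step above: instead of fixing the probabilities at their empirical estimates, they are treated as decision variables of an optimization problem.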

MSC:
93E03 Stochastic systems in control theory (general)
62D05 Sampling theory, sample surveys
Software:
C4.5; LIBSVM; UCI-ml