Discrete Bayesian network classifiers: a survey.

*(English)* Zbl 1322.68147

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

68T37 | Reasoning under uncertainty in the context of artificial intelligence |

68-02 | Research exposition (monographs, survey articles) pertaining to computer science |

##### Keywords:

supervised classification; Bayesian network; Naive Bayes; Markov blanket; Bayesian multinets; feature subset selection; generative and discriminative classifiers

C. Bielza and P. Larrañaga, ACM Comput. Surv. 47, No. 1, Paper No. 5, 43 p. (2014; Zbl 1322.68147)

##### References:

[1] | J. Abellán. 2006. Application of uncertainty measures on credal sets on the naive Bayes classifier. International Journal of General Systems 35 (2006), 675–686. · Zbl 1111.68607 |

[2] | J. Abellán, A. Cano, A. R. Masegosa, and S. Moral. 2007. A semi-naive Bayes classifier with grouping of cases. In Proceedings of the 9th European Conference in Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2007). Lecture Notes in Artificial Intelligence, Vol. 4724. Springer, 477–488. · Zbl 1148.68423 |

[3] | S. Acid, L. M. de Campos, and J. G. Castellano. 2005. Learning Bayesian network classifiers: Searching in a space of partially directed acyclic graphs. Machine Learning 59, 3 (2005), 213–235. · Zbl 1101.68710 |

[4] | A. Agresti. 1990. Categorical Data Analysis. Wiley. · Zbl 0716.62001 |

[5] | K. M. Al-Aidaroos, A. A. Bakar, and Z. Othman. 2010. Naive Bayes variants in classification learning. In Proceedings of the International Conference on Information Retrieval Knowledge Management (CAMP-2010). 276–281. |

[6] | C. F. Aliferis, A. R. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. 2010. Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part I: Algorithms and empirical evaluation. Journal of Machine Learning Research 11 (2010), 171–234. · Zbl 1242.68197 |

[7] | C. F. Aliferis, I. Tsamardinos, and A. R. Statnikov. 2003. HITON: A novel Markov blanket algorithm for optimal variable selection. In AMIA Annual Symposium Proceedings. 21–25. |

[8] | K. Bache and M. Lichman. 2013. UCI Machine Learning Repository. (2013). Retrieved from http://archive.ics.uci.edu/ml. |

[9] | X. Bai, R. Padman, J. Ramsey, and P. Spirtes. 2008. Tabu search-enhanced graphical models for classification in high dimensions. INFORMS Journal on Computing 20, 3 (2008), 423–437. · Zbl 1243.90074 |

[10] | J. Bilmes. 2000. Dynamic Bayesian multinets. In Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence (UAI-2000). Morgan Kaufmann, 38–45. |

[11] | C. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press. · Zbl 0868.68096 |

[12] | C. M. Bishop and J. Lasserre. 2007. Generative or discriminative? Getting the best of both worlds. In Bayesian Statistics, Vol. 8. Oxford University Press, 3–23. · Zbl 1252.62063 |

[13] | R. Blanco, I. Inza, M. Merino, J. Quiroga, and P. Larrañaga. 2005. Feature selection in Bayesian classifiers for the prognosis of survival of cirrhotic patients treated with TIPS. Journal of Biomedical Informatics 38, 5 (2005), 376–388. |

[14] | W. L. Buntine. 1991. Theory refinement on Bayesian networks. In Proceedings of the 7th Conference on Uncertainty in Artificial Intelligence (UAI-1991). Morgan Kaufmann, 52–60. |

[15] | J. Burge and T. Lane. 2005. Learning class-discriminative dynamic Bayesian networks. In Proceedings of the 22nd International Conference on Machine Learning (ICML-2005). ACM, 97–104. |

[16] | A. Cano, J. G. Castellano, A. R. Masegosa, and S. Moral. 2005. Methods to determine the branching attribute in Bayesian multinets classifiers. In Proceedings of the 8th European Conference in Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2005). Lecture Notes in Artificial Intelligence, Vol. 3571. Springer, 932–943. · Zbl 1122.68488 |

[17] | A. M. Carvalho, A. L. Oliveira, and M.-F. Sagot. 2007. Efficient learning of Bayesian network classifiers. In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI-2007). Lecture Notes in Computer Science, Vol. 4830. Springer, 16–25. |

[18] | A. M. Carvalho, T. Roos, A. L. Oliveira, and P. Myllymäki. 2011. Discriminative learning of Bayesian networks via factorized conditional log-likelihood. Journal of Machine Learning Research 12 (2011), 2181–2210. · Zbl 1280.68158 |

[19] | J. Cerquides and R. López de Mántaras. 2005a. Robust Bayesian linear classifier ensembles. In Proceedings of the 16th European Conference on Machine Learning (ECML-2005). Lecture Notes in Computer Science, Vol. 3720. Springer, 72–83. |

[20] | J. Cerquides and R. López de Mántaras. 2005b. TAN classifiers based on decomposable distributions. Machine Learning 59, 3 (2005), 323–354. |

[21] | B. Cestnik. 1990. Estimating probabilities: A crucial task in machine learning. In Proceedings of the European Conference in Artificial Intelligence. 147–149. |

[22] | X. Chai, L. Deng, Q. Yang, and C. X. Ling. 2004. Test-cost sensitive naive Bayes classification. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-2004). IEEE Computer Society, 51–58. |

[23] | J. Cheng and R. Greiner. 1999. Comparing Bayesian network classifiers. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-1999). Morgan Kaufmann Publishers, 101–108. |

[24] | J. Cheng and R. Greiner. 2001. Learning Bayesian belief networks classifiers: Algorithms and system. In Proceedings of the 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (CSCSI-2001), Vol. 2056. Springer, 141–151. · Zbl 0984.68191 |

[25] | D. M. Chickering. 1995. A transformational characterization of equivalent Bayesian network structures. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI-1995). Morgan Kaufmann, 87–98. |

[26] | D. M. Chickering, D. Heckerman, and C. Meek. 2004. Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research 5 (2004), 1287–1330. · Zbl 1222.68169 |

[27] | C. Chow and C. Liu. 1968. Approximating discrete probability distributions with dependency trees. IEEE Transactions on Information Theory 14 (1968), 462–467. · Zbl 0165.22305 |

[28] | G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9 (1992), 309–347. · Zbl 0766.68109 |

[29] | D. Dash and G. F. Cooper. 2004. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research 5 (2004), 1177–1203. · Zbl 1222.68178 |

[30] | D. Dash and G. F. Cooper. 2002. Exact model averaging with naïve Bayesian classifiers. In Proceedings of the 19th International Conference on Machine Learning (ICML-2002). 91–98. |

[31] | A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B 39, 1 (1977), 1–38. · Zbl 0364.62022 |

[32] | P. Domingos and M. Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29 (1997), 103–130. · Zbl 0892.68076 |

[33] | E. B. dos Santos, E. R. Hruschka Jr., E. R. Hruschka, and N. F. F. Ebecken. 2011. Bayesian network classifiers: Beyond classification accuracy. Intelligent Data Analysis 15, 3 (2011), 279–298. |

[34] | M. M. Drugan and M. A. Wiering. 2010. Feature selection for Bayesian network classifiers using the MDL-FS score. International Journal of Approximate Reasoning 51 (2010), 695–717. · Zbl 1205.68286 |

[35] | R. Duda, P. Hart, and D. G. Stork. 2001. Pattern Classification. John Wiley and Sons. · Zbl 0968.68140 |

[36] | D. Edwards and S. L. Lauritzen. 2001. The TM algorithm for maximising a conditional likelihood function. Biometrika 88 (2001), 961–972. · Zbl 1099.62526 |

[37] | M. Ekdahl and T. Koski. 2006. Bounds for the loss in probability of correct classification under model based approximation. Journal of Machine Learning Research 7 (2006), 2449–2480. · Zbl 1222.68189 |

[38] | S. Eyheramendy, D. D. Lewis, and D. Madigan. 2002. On the naive Bayes model for text categorization. In Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics (AISTATS-2002). |

[39] | K. J. Ezawa and S. W. Norton. 1996. Constructing Bayesian networks to predict uncollectible telecommunications accounts. IEEE Expert 11, 5 (1996), 45–51. · Zbl 05095563 |

[40] | A. J. Feelders and J. Ivanovs. 2006. Discriminative scoring of Bayesian network classifiers: A comparative study. In Proceedings of the 3rd European Workshop on Probabilistic Graphical Models (PGM-2006). 75–82. |

[41] | Q. Feng, F. Tian, and H. Huang. 2007. A discriminative learning method of TAN classifier. In Proceedings of the 9th European Conference in Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2007). Lecture Notes in Artificial Intelligence, Vol. 4724. Springer, 443–452. · Zbl 1148.68432 |

[42] | J. Flores, J. A. Gámez, and A. M. Martínez. 2012. Supervised classification with Bayesian networks: A review on models and applications. In Intelligent Data Analysis for Real World Applications. Theory and Practice. IGI Global, 72–102. |

[43] | M. J. Flores, J. A. Gámez, A. M. Martínez, and J. M. Puerta. 2009. HODE: Hidden one-dependence estimator. In Proceedings of the 10th European Conference in Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2009). Lecture Notes in Artificial Intelligence, Vol. 5590. Springer, 481–492. · Zbl 1245.62076 |

[44] | O. François and P. Leray. 2006. Learning the tree augmented naive Bayes classifier from incomplete datasets. In Proceedings of the 3rd European Workshop on Probabilistic Graphical Models (PGM-2006). 91–98. |

[45] | E. Frank, M. Hall, and B. Pfahringer. 2003. Locally weighted naive Bayes. In Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence (UAI-2003). Morgan Kaufmann, 249–256. |

[46] | M. L. Fredman and R. E. Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM 34, 3 (1987), 596–615. |

[47] | N. Friedman. 1997. Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning (ICML-1997). Morgan Kaufmann, 125–133. |

[48] | N. Friedman, D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning 29 (1997), 131–163. · Zbl 0892.68077 |

[49] | N. Friedman, M. Goldszmidt, and A. Wyner. 1999. Data analysis with Bayesian networks: A bootstrap approach. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-1999). Morgan Kaufmann, 196–205. |

[50] | S. Fu and M. Desmarais. 2007. Local learning algorithm for Markov blanket discovery. In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI-2007). Lecture Notes in Computer Science, Vol. 4830. Springer, 68–79. |

[51] | A. Fujino, N. Ueda, and K. Saito. 2007. A hybrid generative/discriminative approach to text classification with additional information. Information Processing and Management 43, 2 (2007), 379–392. · Zbl 05135501 |

[52] | J. Gama. 1999. Iterative naïve Bayes. Theoretical Computer Science 292, 2 (1999), 417–430. · Zbl 1026.68071 |

[53] | D. Geiger and D. Heckerman. 1996. Knowledge representation and inference in similarity networks and Bayesian multinets. Artificial Intelligence 82 (1996), 45–74. |

[54] | M. Goldszmidt. 2010. Bayesian network classifiers. In Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons, 1–10. |

[55] | I. J. Good. 1965. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. The MIT Press. · Zbl 0168.39603 |

[56] | R. Greiner, X. Su, B. Shen, and W. Zhou. 2005. Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. Machine Learning 59, 3 (2005), 297–322. · Zbl 1101.68759 |

[57] | R. Greiner and W. Zhou. 2002. Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI-2002). AAAI Press/MIT Press, 167–173. |

[58] | D. Grossman and P. Domingos. 2004. Learning Bayesian network classifiers by maximizing conditional likelihood. In Proceedings of the 21st International Conference on Machine Learning (ICML-2004). 361–368. |

[59] | Y. Guo and R. Greiner. 2005. Discriminative model selection for belief net structures. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-2005). AAAI Press/The MIT Press, 770–776. |

[60] | Y. Guo, D. F. Wilkinson, and D. Schuurmans. 2005. Maximum margin Bayesian networks. In Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence (UAI-2005). AUAI Press, 233–242. |

[61] | Y. Gurwicz and B. Lerner. 2006. Bayesian class-matched multinet classifier. In Proceedings of the 2006 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition (SSPR-2006/SPR-2006). Lecture Notes in Computer Science, Vol. 4109. Springer, 145–153. |

[62] | M. A. Hall. 1999. Correlation-Based Feature Selection for Machine Learning. Ph.D. Dissertation. Department of Computer Science, University of Waikato. |

[63] | M. Hall. 2007. A decision tree-based attribute weighting filter for naive Bayes. Knowledge-Based Systems 20, 2 (2007), 120–126. |

[64] | M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations 11, 1 (2009), 10–18. · Zbl 05740105 |

[65] | D. J. Hand and K. Yu. 2001. Idiot’s Bayes - not so stupid after all? International Statistical Review 69, 3 (2001), 385–398. · Zbl 1213.62010 |

[66] | D. Heckerman, D. Geiger, and D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20 (1995), 197–243. · Zbl 0831.68096 |

[67] | J. Hilden and B. Bjerregaard. 1976. Computer-aided diagnosis and the atypical case. In Decision Making and Medical Care. Can Information Science Help? 365–378. |

[68] | E. R. Hruschka and N. F. F. Ebecken. 2007. Towards efficient variables ordering for Bayesian network classifiers. Data and Knowledge Engineering 63 (2007), 258–269. |

[69] | H. Huang and C. Hsu. 2002. Bayesian classification for data from the same unknown class. IEEE Transactions on Systems, Man, and Cybernetics Part B 32, 2 (2002), 137–145. |

[70] | K. Huang, I. King, and M. R. Lyu. 2003. Discriminative training of Bayesian Chow-Liu multinet classifiers. In Proceedings of the International Joint Conference on Neural Networks (IJCNN-2003), Vol. 1. 484–488. |

[71] | A. Hussein and E. Santos. 2004. Exploring case-based Bayesian networks and Bayesian multi-nets for classification. In Proceedings of the 17th Conference of the Canadian Society for Computational Studies of Intelligence (CSCSI-2004). Lecture Notes in Computer Science, Vol. 3060. Springer, 485–492. |

[72] | K.-B. Hwang and B. T. Zhang. 2005. Bayesian model averaging of Bayesian network classifiers over multiple node-orders: Application to sparse datasets. IEEE Transactions on Systems, Man, and Cybernetics. Part B: Cybernetics 35, 6 (2005), 1302–1310. |

[73] | A. Ibáñez, P. Larrañaga, and C. Bielza. 2014. Cost-sensitive selective naive Bayes classifiers for predicting the increase of the h-index for scientific journals. Neurocomputing in press (2014). |

[74] | I. Inza, P. Larrañaga, R. Blanco, and A. J. Cerrolaza. 2004. Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial Intelligence in Medicine 31, 2 (2004), 91–103. |

[75] | I. Inza, P. Larrañaga, R. Etxeberria, and B. Sierra. 2000. Feature subset selection by Bayesian network-based optimization. Artificial Intelligence 123, 1–2 (2000), 157–184. · Zbl 0952.68118 |

[76] | A. G. Ivakhnenko. 1970. Heuristic self-organization in problems of engineering cybernetics. Automatica 6, 2 (1970), 207–219. |

[77] | N. Japkowicz and M. Shah. 2011. Evaluating Learning Algorithms. A Classification Perspective. Cambridge University Press. · Zbl 1230.68020 |

[78] | T. Jebara. 2004. Machine Learning: Discriminative and Generative. Kluwer Academic Publishers. · Zbl 1030.68073 |

[79] | L. Jiang, Z. Cai, D. Wang, and H. Zhang. 2012. Improving tree augmented Naive Bayes for class probability estimation. Knowledge-Based Systems 26 (2012), 239–245. |

[80] | L. Jiang and H. Zhang. 2006. Lazy averaged one-dependence estimators. In Proceedings of the 19th Canadian Conference on AI (Canadian AI-2006). Lecture Notes in Computer Science, Vol. 4013. Springer, 515–525. |

[81] | L. Jiang, H. Zhang, and Z. Cai. 2009. A novel Bayes model: Hidden naive Bayes. IEEE Transactions on Knowledge and Data Engineering 21, 10 (2009), 1361–1371. |

[82] | L. Jiang, H. Zhang, Z. Cai, and D. Wang. 2012. Weighted average of one-dependence estimators. Journal of Experimental and Theoretical Artificial Intelligence 24, 2 (2012), 219–230. |

[83] | Y. Jing, V. Pavlovic, and J. M. Rehg. 2008. Boosted Bayesian network classifiers. Machine Learning 73 (2008), 155–184. · Zbl 05537387 |

[84] | C. Kang and J. Tian. 2006. A Hybrid generative/discriminative Bayesian classifier. In Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference (FLAIRS-2006). AAAI Press, 562–567. |

[85] | E. J. Keogh and M. J. Pazzani. 2002. Learning the structure of augmented Bayesian classifiers. International Journal on Artificial Intelligence Tools 11, 4 (2002), 587–601. · Zbl 05421426 |

[86] | M. A. Kłopotek. 2005. Very large Bayesian multinets for text classification. Future Generation Computer Systems 21, 7 (2005), 1068–1082. |

[87] | R. Kohavi. 1996. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-1996). 202–207. |

[88] | R. Kohavi, B. Becker, and D. Sommerfield. 1997. Improving Simple Bayes. Technical Report. Data Mining and Visualization Group, Silicon Graphics. |

[89] | R. Kohavi and G. H. John. 1997. Wrappers for feature subset selection. Artificial Intelligence 97, 1 (1997), 273–324. · Zbl 0904.68143 |

[90] | D. Koller and M. Sahami. 1996. Toward optimal feature selection. In Proceedings of the 13th International Conference on Machine Learning (ICML-1996). 284–292. |

[91] | I. Kononenko. 1993. Successive naive Bayesian classifier. Informatica (Slovenia) 17, 2 (1993), 167–174. |

[92] | P. Kontkanen, P. Myllymäki, T. Silander, and H. Tirri. 1998. BAYDA: Software for Bayesian classification and feature selection. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-1998). AAAI Press, 254–258. |

[93] | P. Kontkanen, P. Myllymäki, and H. Tirri. 1996. Constructing Bayesian Finite Mixture Models by the EM Algorithm. Technical Report C-1996-9. Department of Computer Science, University of Helsinki. |

[94] | J. B. Kruskal. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7 (1956), 48–50. · Zbl 0070.18404 |

[95] | C. K. Kwoh and D. Gillies. 1996. Using hidden nodes in Bayesian networks. Artificial Intelligence 88 (1996), 1–38. · Zbl 0906.68115 |

[96] | P. Langley. 1993. Induction of recursive Bayesian classifiers. In Proceedings of the 8th European Conference on Machine Learning (ECML-1993). 153–164. |

[97] | P. Langley and S. Sage. 1994. Induction of selective Bayesian classifiers. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence (UAI-1994). Morgan Kaufmann, 399–406. |

[98] | H. Langseth and T. D. Nielsen. 2006. Classification using hierarchical naïve Bayes models. Machine Learning 63, 2 (2006), 135–159. · Zbl 1110.68130 |

[99] | J. Li, C. Zhang, T. Wang, and Y. Zhang. 2007. Generalized additive Bayesian network classifiers. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-2007). 913–918. |

[100] | J. N. K. Liu, N. L. Li, and T. S. Dillon. 2001. An improved naïve Bayes classifier technique coupled with a novel input solution method. IEEE Transactions on Systems, Man, and Cybernetics 31 (2001), 249–256. |

[101] | D. J. Lizotte, O. Madani, and R. Greiner. 2003. Budgeted learning of naive-Bayes classifiers. In Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence (UAI-2003). Morgan Kaufmann, 378–385. |

[102] | F. Louzada and A. Ara. 2012. Bagging k-dependence probabilistic networks: An alternative powerful fraud detection tool. Expert Systems with Applications 39, 14 (2012), 11583–11592. |

[103] | P. Lucas. 2004. Restricted Bayesian network structure learning. In Advances in Bayesian Networks. Springer, 217–232. |

[104] | S.-C. Ma and H.-B. Shi. 2004. Tree-augmented naive Bayes ensembles. In Proceedings of the 3rd International Conference on Machine Learning and Cybernetics. IEEE, 1497–1502. |

[105] | M. G. Madden. 2009. On the classification performance of TAN and general Bayesian networks. Knowledge-Based Systems 22, 7 (2009), 489–495. |

[106] | M. G. Madden. 2002. A new Bayesian network structure for classification tasks. In Proceedings of the 13th Irish Conference on Artificial Intelligence and Cognitive Science. 203–208. · Zbl 1018.68746 |

[107] | D. Margaritis and S. Thrun. 2000. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12 (NIPS-1999). MIT Press, 505–511. |

[108] | M. Maron and J. Kuhns. 1960. On relevance, probabilistic indexing, and information retrieval. Journal of the Association for Computing Machinery 7 (1960), 216–244. |

[109] | W. J. McGill. 1954. Multivariate information transmission. Psychometrika 19 (1954), 97–116. · Zbl 0058.35706 |

[110] | R. S. Michalski, I. Mozetic, J. Hong, and N. Lavrac. 1986. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the 5th National Conference on Artificial Intelligence. Morgan Kaufmann, 1041–1045. |

[111] | M. Minsky. 1961. Steps toward artificial intelligence. Transactions on Institute of Radio Engineers 49 (1961), 8–30. |

[112] | D. Mladenic and M. Grobelnik. 1999. Feature selection for unbalanced class distribution and naive Bayes. In Proceedings of the 16th International Conference on Machine Learning (ICML-1999). Morgan Kaufmann, 258–267. |

[113] | S. Monti and G. F. Cooper. 1999. A Bayesian network classifier that combines a finite mixture model and a naïve Bayes model. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-1999). 447–456. |

[114] | M. Možina, J. Demšar, M. Kattan, and B. Zupan. 2004. Nomograms for visualization of naive Bayesian classifier. In Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2004). 337–348. |

[115] | J. F. Murray, G. F. Hughes, and K. Kreutz-Delgado. 2005. Machine learning methods for predicting failures in hard drives: A multiple-instance application. Journal of Machine Learning Research 6 (2005), 783–816. · Zbl 1222.68272 |

[116] | M. Narasimhan and J. A. Bilmes. 2005. A submodular-supermodular procedure with applications to discriminative structure learning. In Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence (UAI-2005). AUAI Press, 404–412. |

[117] | A. Ng and M. Jordan. 2001. On discriminative vs. generative classifiers: A comparison of logistic regression and naïve Bayes. In Advances in Neural Information Processing Systems 14 (NIPS-2001). MIT Press, 841–848. |

[118] | G. N. Norén and R. Orre. 2005. Case based imprecision estimates for Bayes classifiers with the Bayesian bootstrap. Machine Learning 58, 1 (2005), 79–94. · Zbl 1075.68076 |

[119] | M. Pazzani. 1996. Constructive induction of Cartesian product attributes. In Proceedings of the Information, Statistics and Induction in Science Conference (ISIS-1996). 66–77. |

[120] | M. Pazzani and D. Billsus. 1997. Learning and revising user profiles: the identification of interesting web sites. Machine Learning 27 (1997), 313–331. |

[121] | J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, Palo Alto, CA. · Zbl 0746.68089 |

[122] | J. M. Peña, R. Nilsson, J. Björkegren, and J. Tegnér. 2007. Towards scalable and data efficient learning of Markov boundaries. International Journal of Approximate Reasoning 45, 2 (2007), 211–232. |

[123] | F. Pernkopf. 2005. Bayesian network classifiers versus selective k-NN classifier. Pattern Recognition 38 (2005), 1–10. · Zbl 1101.68826 |

[124] | F. Pernkopf and J. A. Bilmes. 2005. Discriminative versus generative parameter and structure learning of Bayesian network classifiers. In Proceedings of the 22nd International Conference on Machine Learning (ICML-2005). ACM, 657–664. |

[125] | F. Pernkopf and J. A. Bilmes. 2010. Efficient heuristics for discriminative structure learning of Bayesian network classifiers. Journal of Machine Learning Research 11 (2010), 2323–2360. · Zbl 1242.68294 |

[126] | F. Pernkopf and P. O’Leary. 2003. Floating search algorithm for structure learning of Bayesian network classifiers. Pattern Recognition Letters 24 (2003), 2839–2848. · Zbl 1073.68787 |

[127] | F. Pernkopf and M. Wohlmayr. 2009. On discriminative parameter learning of Bayesian network classifiers. In Proceedings of the 20th European Conference on Machine Learning (ECML-2009). Lecture Notes in Computer Science, Vol. 5782. Springer, 221–237. · Zbl 1295.68187 |

[128] | F. Pernkopf and M. Wohlmayr. 2013. Stochastic margin-based structure learning of Bayesian network classifiers. Pattern Recognition 46, 2 (2013), 464–471. · Zbl 1295.68187 |

[129] | F. Pernkopf, M. Wohlmayr, and S. Tschiatschek. 2012. Maximum margin Bayesian network classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 3 (2012), 521–532. |

[130] | T. V. Pham, M. Worring, and A. W. M. Smeulders. 2002. Face detection by aggregated Bayesian network classifiers. Pattern Recognition Letters 23, 4 (2002), 451–461. · Zbl 1006.68116 |

[131] | B. Poulin, R. Eisner, D. Szafron, P. Lu, R. Greiner, D. S. Wishart, A. Fyshe, B. Pearcy, C. MacDonell, and J. Anvik. 2006. Visual explanation of evidence with additive classifiers. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-2006). AAAI Press/MIT Press, 1822–1829. |

[132] | A. Prinzie and D. Van den Poel. 2007. Random multiclass classification: Generalizing random forests to random MNL and random NB. In Proceedings of the Database and Expert Systems Applications. Lecture Notes in Computer Science. Vol. 4653. Springer, 349–358. |

[133] | G. M. Provan and M. Singh. 1995. Learning Bayesian networks using feature selection. In Proceedings of the 5th International Workshop on Artificial Intelligence and Statistics (AISTATS-1995). 450–456. |

[134] | R. Raina, Y. Shen, A. Y. Ng, and A. McCallum. 2004. Classification with hybrid generative/discriminative models. In Advances in Neural Information Processing Systems 16 (NIPS-2003). The MIT Press. |

[135] | M. Ramoni and P. Sebastiani. 2001a. Robust Bayes classifiers. Artificial Intelligence 125 (2001), 209–226. · Zbl 0969.68148 |

[136] | M. Ramoni and P. Sebastiani. 2001b. Robust learning with missing data. Machine Learning 45, 2 (2001), 147–170. · Zbl 1007.68154 |

[137] | C. A. Ratanamahatana and D. Gunopulos. 2003. Feature selection for the naive Bayesian classifier using decision trees. Applied Artificial Intelligence 17, 5–6 (2003), 475–487. |

[138] | S. Renooij and L. C. van der Gaag. 2008. Evidence and scenario sensitivities in naive Bayesian classifiers. International Journal of Approximate Reasoning 49, 2 (2008), 398–416. · Zbl 1184.62038 |

[139] | G. Ridgeway, D. Madigan, and T. Richardson. 1998. Interpretable boosted naïve Bayes classification. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-1998). 101–104. |

[140] | V. Robles, P. Larrañaga, J. M. Peña, E. Menasalvas, and M. S. Pérez. 2003. Interval estimation naive Bayes. In Proceedings of the 5th International Symposium on Intelligent Data Analysis (IDA-2003). Lecture Notes in Computer Science, Vol. 2810. Springer, 143–154. |

[141] | V. Robles, P. Larrañaga, J. M. Peña, E. Menasalvas, M. S. Pérez, and V. Herves. 2004. Bayesian networks as consensed voting system in the construction of a multi-classifier for protein secondary structure prediction. Artificial Intelligence in Medicine 31 (2004), 117–136. |

[142] | V. Robles, P. Larrañaga, J. M. Peña, M. S. Pérez, E. Menasalvas, and V. Herves. 2003. Learning semi naive Bayes structures by estimation of distribution algorithms. In Proceedings of the 11th Portuguese Conference on Artificial Intelligence (EPIA-2003). Lecture Notes in Computer Science. 244–258. · Zbl 1205.68314 |

[143] | S. Rodrigues de Morais and A. Aussem. 2010. A novel Markov boundary based feature subset selection algorithm. Neurocomputing 73, 4–6 (2010), 578–584. |

[144] | J. J. Rodríguez and L. I. Kuncheva. 2007. Naïve Bayes ensembles with a random oracle. In Proceedings of the 7th International Workshop on Multiple Classifier Systems (MCS-2007). Lecture Notes in Computer Science, Vol. 4472. Springer, 450–458. |

[145] | T. Roos, H. Wettig, P. Grünwald, P. Myllymäki, and H. Tirri. 2005. On discriminative Bayesian network classifiers and logistic regression. Machine Learning 59, 3 (2005), 267–296. · Zbl 1101.68785 |

[146] | G. A. Ruz and D. T. Pham. 2009. Building Bayesian networks classifiers through a Bayesian monitoring system. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 223 (2009), 743–755. |

[147] | Y. Saeys, I. Inza, and P. Larrañaga. 2007. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 19 (2007), 2507–2517. |

[148] | M. Sahami. 1996. Learning limited dependence Bayesian classifiers. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-1996). 335–338. |

[149] | G. Santafé, J. A. Lozano, and P. Larrañaga. 2005. Discriminative learning of Bayesian network classifiers via the TM algorithm. In Proceedings of the 8th European Conference in Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2005). Lecture Notes in Artificial Intelligence, Vol. 3571. Springer, 148–160. · Zbl 1122.68686 |

[150] | B. Sierra and P. Larrañaga. 1998. Predicting the survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artificial Intelligence in Medicine 14 (1998), 215–230. |

[151] | B. Sierra, E. Lazkano, E. Jauregi, and I. Irigoien. 2009. Histogram distance-based Bayesian network structure learning: A supervised classification specific approach. Decision Support Systems 48, 1 (2009), 180–190. · Zbl 05871901 |

[152] | B. Sierra, N. Serrano, P. Larrañaga, E. J. Plasencia, I. Inza, J. J. Jiménez, P. Revuelta, and M. L. Mora. 2001. Using Bayesian networks in the construction of a bi-level multi-classifier. A case study using intensive care unit patient data. Artificial Intelligence in Medicine 22 (2001), 233–248. |

[153] | M. Singh and G. Provan. 1996. Efficient learning of selective Bayesian network classifiers. In Proceedings of the 13th International Conference on Machine Learning (ICML-1996). 453–461. |

[154] | M. Singh and M. Valtorta. 1995. Construction of Bayesian network structures from data: A brief survey and an efficient algorithm. International Journal of Approximate Reasoning 12, 2 (1995), 111–131. · Zbl 0814.68115 |

[155] | P. Spirtes, C. Glymour, and R. Scheines. 1993. Causation, Prediction, and Search. |

[156] | J. Su, H. Zhang, C. X. Ling, and S. Matwin. 2008. Discriminative parameter learning for Bayesian networks. In Proceedings of the 25th International Conference on Machine Learning (ICML-2008), Vol. 307. ACM, 1016–1023. |

[157] | J.-N. Sulzmann, J. Fürnkranz, and E. Hüllermeier. 2007. On pairwise naive Bayes classifiers. In Proceedings of the 18th European Conference on Machine Learning (ECML-2007). Lecture Notes in Computer Science, Vol. 4701. Springer, 371–381. |

[158] | D. M. Titterington, G. D. Murray, L. S. Murray, D. J. Spiegelhalter, A. M. Skene, J. D. F. Habbema, and G. J. Gelpke. 1981. Comparison of discrimination techniques applied to a complex data set of head injured patients (with discussion). Journal of the Royal Statistical Society Series A 144, 2 (1981), 145–175. · Zbl 0469.62085 |

[159] | I. Tsamardinos and C. F. Aliferis. 2003. Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics (AISTATS-2003). |

[160] | I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov. 2003a. Algorithms for large scale Markov blanket discovery. In Proceedings of the 16th International Florida Artificial Intelligence Research Society Conference (FLAIRS-2003). AAAI Press, 376–381. |

[161] | I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov. 2003b. Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003). 673–678. |

[162] | M. van Gerven and P. J. F. Lucas. 2004. Employing maximum mutual information for Bayesian classification. In Proceedings of the 5th International Symposium on Biological and Medical Data Analysis (ISBMDA-2004). Lecture Notes in Computer Science, Vol. 3337. Springer, 188–199. |

[163] | T. Verma and J. Pearl. 1990. Equivalence and synthesis of causal models. In Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence (UAI-1990). Elsevier, 255–270. |

[164] | D. Vidaurre, C. Bielza, and P. Larrañaga. 2012. Forward stagewise naive Bayes. Progress in Artificial Intelligence 1 (2012), 57–69. |

[165] | R. Vilalta and I. Rish. 2003. A decomposition of classes via clustering to explain and improve naive Bayes. In Proceedings of the 14th European Conference on Machine Learning (ECML-2003). Lecture Notes in Computer Science, Vol. 2837. Springer, 444–455. |

[166] | G. I. Webb, J. Boughton, and Z. Wang. 2005. Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning 58 (2005), 5–24. · Zbl 1075.68078 |

[167] | G. I. Webb and M. J. Pazzani. 1998. Adjusted probability naïve Bayesian induction. In Proceedings of the 11th Australian Joint Conference on Artificial Intelligence (AI-1998). Lecture Notes in Computer Science, Vol. 1502. Springer. |

[168] | T.-T. Wong. 2009. Alternative prior assumptions for improving the performance of naïve Bayesian classifiers. Data Mining and Knowledge Discovery 18, 2 (2009), 183–213. · Zbl 05659235 |

[169] | J. Xiao, C. He, and X. Jiang. 2009. Structure identification of Bayesian classifiers based on GMDH. Knowledge-Based Systems 22 (2009), 461–470. |

[170] | J.-H. Xue and D. M. Titterington. 2010. Joint discriminative-generative modelling based on statistical tests for classification. Pattern Recognition Letters 31, 9 (2010), 1048–1055. |

[171] | Y. Yang, K. B. Korb, K. M. Ting, and G. I. Webb. 2005. Ensemble selection for superparent-one-dependence estimators. In Proceedings of the 18th Australian Conference on Artificial Intelligence. 102–112. · Zbl 1151.68599 |

[172] | Y. Yang, G. I. Webb, J. Cerquides, K. B. Korb, J. Boughton, and K. M. Ting. 2007. To select or to weigh: A comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Transactions on Knowledge and Data Engineering 19 (2007), 1652–1665. · Zbl 05340436 |

[173] | S. Yaramakala and D. Margaritis. 2005. Speculative Markov blanket discovery for optimal feature selection. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM-2005). IEEE Computer Society, 809–812. |

[174] | M. Zaffalon. 2002. The naïve credal classifier. Journal of Statistical Planning and Inference 105, 1 (2002), 5–21. · Zbl 0992.62057 |

[175] | M. Zaffalon and E. Fagiuoli. 2003. Tree-based credal networks for classification. Reliable Computing 9, 6 (2003), 487–509. · Zbl 1038.62030 |

[176] | H. Zhang and S. Sheng. 2004. Learning weighted naive Bayes with accurate ranking. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM-2004). IEEE Computer Society, 567–570. |

[177] | H. Zhang and J. Su. 2008. Naive Bayes for optimal ranking. Journal of Experimental & Theoretical Artificial Intelligence 20, 2 (2008), 79–93. · Zbl 1147.68665 |

[178] | N. L. Zhang, T. D. Nielsen, and F. V. Jensen. 2004. Latent variable discovery in classification models. Artificial Intelligence in Medicine 30, 3 (2004), 283–299. · Zbl 05391210 |

[179] | F. Zheng and G. I. Webb. 2006. Efficient lazy elimination for averaged one-dependence estimators. In Proceedings of the 23rd International Conference on Machine Learning (ICML-2006), Vol. 148. ACM, 1113–1120. |

[180] | Z. Zheng. 1998. Naïve Bayesian classifier committees. In Proceedings of the 10th European Conference on Machine Learning (ECML-1998). Lecture Notes in Computer Science, Vol. 1398. Springer, 196–207. |

[181] | Z. Zheng and G. I. Webb. 2000. Lazy learning of Bayesian rules. Machine Learning 41 (2000), 53–84. · Zbl 02180966 |

[182] | B. Ziebart, A. K. Dey, and J. A. Bagnell. 2007. Learning selectively conditioned forest structures with applications to DBNs and classification. In Proceedings of the 23rd Annual Conference on Uncertainty in Artificial Intelligence (UAI-2007). AUAI Press, 458–465. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.