Marginal and simultaneous predictive classification using stratified graphical models. (English) Zbl 1414.62262

Summary: An inductive probabilistic classification rule must generally obey the principles of Bayesian predictive inference, such that all observed and unobserved stochastic quantities are jointly modeled and the parameter uncertainty is fully acknowledged through the posterior predictive distribution. Several such rules have recently been considered, and their asymptotic behavior has been characterized under the assumption that the observed features or variables used for building a classifier are conditionally independent given a simultaneous labeling of both the training samples and those from an unknown origin. Here we extend the theoretical results to predictive classifiers acknowledging feature dependencies either through graphical models or sparser alternatives defined as stratified graphical models. We show through experimentation with both synthetic and real data that the predictive classifiers encoding dependencies have the potential to substantially improve classification accuracy compared with both standard discriminative classifiers and the predictive classifiers based solely on conditionally independent features. In most of our experiments, stratified graphical models show an advantage over ordinary graphical models.
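To illustrate the predictive principle in its simplest baseline setting (discrete features that are conditionally independent given the class), the sketch below integrates out each class-conditional categorical parameter under a Dirichlet prior. The posterior predictive probability of symbol v for feature j in class k is then (n_kjv + alpha) / (n_k + r_j * alpha), where r_j is the alphabet size of feature j. It is a minimal sketch, assuming a symmetric Dirichlet(alpha) prior, a uniform prior over classes, and one test item classified at a time. The function names `fit_counts`, `log_predictive` and `classify_marginal` are hypothetical and not taken from the paper. The sketch does not implement the graphical or stratified graphical model classifiers, nor simultaneous classification of all test items jointly.

```python
import math
from collections import Counter

def fit_counts(X, y, alphabet_sizes):
    """Collect per-class, per-feature symbol counts from labeled training data."""
    classes = sorted(set(y))
    # counts[k][j][v] = number of class-k training items with feature j equal to v
    counts = {k: [Counter() for _ in alphabet_sizes] for k in classes}
    n_class = Counter(y)
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            counts[label][j][value] += 1
    return classes, counts, n_class

def log_predictive(x, k, counts, n_class, alphabet_sizes, alpha=1.0):
    """Log posterior predictive probability of item x under class k.

    Assumes features are conditionally independent given the class and each
    feature's categorical parameter has a symmetric Dirichlet(alpha) prior,
    so the parameters integrate out to a product of smoothed frequencies.
    """
    total = 0.0
    for j, value in enumerate(x):
        numerator = counts[k][j][value] + alpha          # Counter returns 0 for unseen symbols
        denominator = n_class[k] + alpha * alphabet_sizes[j]
        total += math.log(numerator / denominator)
    return total

def classify_marginal(x, classes, counts, n_class, alphabet_sizes, alpha=1.0):
    """Assign x to the class with the largest predictive score (uniform class prior assumed)."""
    scores = {k: log_predictive(x, k, counts, n_class, alphabet_sizes, alpha) for k in classes}
    return max(scores, key=scores.get)

# Toy example with two binary features and two classes (illustrative data only).
X = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1)]
y = ["a", "a", "a", "b", "b", "b"]
alphabet_sizes = [2, 2]
classes, counts, n_class = fit_counts(X, y, alphabet_sizes)
```

With alpha = 1, the item (0, 1) gets predictive probability (4/5)(3/5) = 0.48 under class "a" and (1/5)(2/5) = 0.08 under class "b", so it is assigned to "a". Encoding feature dependencies, as the summary describes, would replace the product over independent features with a factorization over a (stratified) graph.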


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference