Parametric classification with soft labels using the evidential EM algorithm: linear discriminant analysis versus logistic regression. (English) Zbl 1414.62265

Summary: Partially supervised learning extends both supervised and unsupervised learning by considering situations in which only partial information about the response variable is available. In this paper, we consider partially supervised classification and assume the learning instances to be labeled by Dempster-Shafer mass functions, called soft labels. Linear discriminant analysis and logistic regression are considered as special cases of generative and discriminative parametric models, respectively. We show that the evidential EM algorithm can be particularized to fit the parameters of each of these models. We describe experimental results with simulated data sets as well as with two real applications: K-complex detection in sleep EEG signals and facial expression recognition. These results confirm the benefit of using soft labels for classification, rather than potentially erroneous crisp labels, when the true class membership is partially unknown or ill-defined.
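To make the generative case concrete, the following is a minimal sketch of an evidential-EM-style fit of an LDA model (Gaussian classes with a shared covariance matrix) from soft-labelled data. Here each instance carries a vector of class plausibilities derived from its soft label (a crisp label is a one-hot row; total ignorance is a row of ones), and the E-step weights each class posterior by that plausibility. The function name and the exact update scheme are illustrative assumptions, not the paper's published algorithm.

```python
import numpy as np

def evidential_em_lda(X, pl, n_iter=50):
    """Fit class priors, means and a shared covariance from soft labels.

    X  : (n, d) array of instances.
    pl : (n, K) array of class plausibilities per instance
         (one-hot row = crisp label, all-ones row = vacuous label).
    Hypothetical sketch of plausibility-weighted EM, for illustration.
    """
    n, d = X.shape
    K = pl.shape[1]
    # Initialise responsibilities by normalising the plausibilities.
    r = pl / pl.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors, class means, shared (pooled) covariance.
        pi = r.sum(axis=0) / n
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
        cov = np.zeros((d, d))
        for k in range(K):
            diff = X - mu[k]
            cov += (r[:, k, None] * diff).T @ diff
        cov = cov / n + 1e-6 * np.eye(d)  # small ridge for stability
        # E-step: responsibility ∝ plausibility × prior × Gaussian density.
        inv = np.linalg.inv(cov)
        logdet = np.linalg.slogdet(cov)[1]
        log_dens = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
            log_dens[:, k] = -0.5 * (quad + logdet)
        w = pl * pi * np.exp(log_dens - log_dens.max(axis=1, keepdims=True))
        r = w / w.sum(axis=1, keepdims=True)
    return pi, mu, cov, r
```

On two well-separated Gaussian blobs with noisy plausibility vectors, the recovered class means stay close to the true centres, because the plausibility term keeps each instance's responsibility anchored to its (imprecise) label while the density term sharpens it.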


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F86 Parametric inference and fuzziness
68T10 Pattern recognition, speech recognition
68T37 Reasoning under uncertainty in the context of artificial intelligence
Full Text: DOI HAL

