Robust supervised classification with mixture models: learning from data with uncertain labels. (English) Zbl 1175.68313

Summary: In the supervised classification framework, human supervision is required to label a set of learning data, which is then used to build the classifier. In many applications, however, human supervision is imprecise, difficult, or expensive. In this paper, the problem of learning a supervised multi-class classifier from data with uncertain labels is considered, and a model-based classification method is proposed to solve it. The idea of the proposed method is to confront an unsupervised modeling of the data with the supervised information carried by the labels of the learning data, in order to detect inconsistencies. The method is then able to build a robust classifier that takes the detected label inconsistencies into account. Experiments on artificial and real data are provided to highlight the main features of the proposed method, along with an application to object recognition under weak supervision.
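The core idea, comparing the partition produced by an unsupervised mixture model with the supplied labels to flag suspect points before training, can be sketched roughly as follows. This is an illustrative simplification using scikit-learn (Gaussian mixture plus linear discriminant analysis), not the authors' exact estimator; the data, noise rate, and agreement rule are assumptions chosen for the example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes (synthetic stand-in for learning data).
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.repeat([0, 1], 200)

# Corrupt 15% of the labels to simulate uncertain human supervision.
noisy = y.copy()
flip = rng.choice(400, 60, replace=False)
noisy[flip] = 1 - noisy[flip]

# Unsupervised modeling of the same data with a Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)

# Align cluster indices with class labels by majority vote, then flag
# points whose given label disagrees with the mixture's partition.
mapping = {c: np.bincount(noisy[clusters == c]).argmax() for c in (0, 1)}
consistent = np.array([mapping[c] for c in clusters]) == noisy

# Build the final classifier only on points where model and labels agree.
clf = LinearDiscriminantAnalysis().fit(X[consistent], noisy[consistent])
acc = clf.score(X, y)  # evaluate against the uncorrupted labels
```

On this toy problem the mixture partition recovers the true classes, so most flipped labels are flagged as inconsistent and the classifier trained on the trusted subset scores near perfectly against the clean labels, whereas training directly on the noisy labels would not.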


68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition


Full Text: DOI Link


[1] Banfield, J.; Raftery, A., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821, (1993) · Zbl 0794.62034
[2] Bashir, S.; Carter, E., High breakdown mixture discriminant analysis, Journal of multivariate analysis, 93, 1, 102-111, (2005) · Zbl 1087.62076
[3] Bellman, R., Dynamic programming, (1957), Princeton University Press Princeton, NJ · Zbl 0077.13605
[4] Bensmail, H.; Celeux, G., Regularized Gaussian discriminant analysis through eigenvalue decomposition, Journal of the American statistical association, 91, 1743-1748, (1996) · Zbl 0885.62068
[5] Bouveyron, C.; Girard, S.; Schmid, C., High-dimensional data clustering, Computational statistics and data analysis, 52, 1, 502-519, (2007) · Zbl 1452.62433
[6] Bouveyron, C.; Kannala, J.; Schmid, C.; Girard, S., Object localization by subspace clustering of local descriptors, (), 457-467
[7] Brodley, C.; Friedl, M., Identifying mislabeled training data, Journal of artificial intelligence research, 11, 131-167, (1999) · Zbl 0924.68158
[8] Celeux, G.; Govaert, G., Parsimonious Gaussian models in cluster analysis, Pattern recognition, 28, 781-793, (1995)
[9] d’Alché-Buc, F.; Dagan, I.; Quiñonero, J., The 2005 PASCAL visual object classes challenge, ()
[10] Dasarathy, B., Nosing around the neighborhood: a new system structure and classification rule for recognition in partially exposed environments, IEEE transactions on pattern analysis and machine intelligence, 2, 67-71, (1980)
[11] Gamberger, D.; Lavrac, N.; Groselj, C., Experiments with noise filtering in a medical domain, (), 143-151
[12] Gates, G., The reduced nearest neighbor rule, IEEE transactions on information theory, 18, 3, 431-433, (1972)
[13] I. Guyon, N. Matic, V. Vapnik, Discovering informative patterns and data cleaning, Advances in Knowledge Discovery and Data Mining (1996) 181-203.
[14] Hastie, T.; Tibshirani, R., Discriminant analysis by Gaussian mixtures, Journal of the royal statistical society B, 58, 155-176, (1996) · Zbl 0850.62476
[15] Hastie, T.; Tibshirani, R.; Friedman, J., The elements of statistical learning, (2001), Springer New York
[16] Hawkins, D.; McLachlan, G., High-breakdown linear discriminant analysis, Journal of the American statistical association, 92, 437, 136-143, (1997) · Zbl 0889.62052
[17] John, G., Robust decision trees: removing outliers from databases, (), 174-179
[18] Lawrence, N.; Schölkopf, B., Estimating a kernel Fisher discriminant in the presence of label noise, (), 306-313
[19] Li, Y.; Wessels, L.; de Ridder, D.; Reinders, M., Classification in the presence of class noise using a probabilistic kernel Fisher method, Pattern recognition, 40, 12, 3349-3357, (2007) · Zbl 1123.68363
[20] Lowe, D., Distinctive image features from scale-invariant keypoints, International journal of computer vision, 60, 2, 91-110, (2004)
[21] McLachlan, G., Discriminant analysis and statistical pattern recognition, (1992), Wiley New York
[22] Mikolajczyk, K.; Schmid, C., Scale and affine invariant interest point detectors, International journal of computer vision, 60, 1, 63-86, (2004)
[23] Mingers, J., An empirical comparison of pruning methods for decision tree induction, Machine learning, 4, 2, 227-243, (1989)
[24] Quinlan, J., Bagging, boosting and C4.5, (), 725-730
[25] Rousseeuw, P.J.; Leroy, A., Robust regression and outlier detection, (1987), Wiley New York · Zbl 0711.62030
[26] Sakakibara, Y., Noise-tolerant Occam algorithms and their applications to learning decision trees, Machine learning, 11, 1, 37-62, (1993) · Zbl 0770.68100
[27] Schapire, R., The strength of weak learnability, Machine learning, 5, 197-227, (1990)
[28] Schwarz, G., Estimating the dimension of a model, The annals of statistics, 6, 461-464, (1978) · Zbl 0379.62005
[29] Vannoorenberghe, P.; Denoeux, T., Handling uncertain labels in multiclass problems using belief decision trees, ()
[30] Wilson, D.; Martinez, T., Instance pruning techniques, (), 404-411
[31] Zeng, X.; Martinez, T., A noise filtering method using neural networks, (), 26-31
[32] Zhu, X.; Wu, X.; Chen, Q., Eliminating class noise in large datasets, (), 920-927
[33] Dempster, A.; Laird, N.; Rubin, D., Maximum likelihood from incomplete data via the EM algorithm, Journal of the royal statistical society B, 39, 1, 1-38, (1977) · Zbl 0364.62022
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.