
A predictive deviance criterion for selecting a generative model in semi-supervised classification. (English) Zbl 1468.62202

Summary: Semi-supervised classification can improve generative classifiers by taking into account the information provided by the unlabeled data points, especially when there are far more unlabeled data than labeled data. The aim is to select a generative classification model using both unlabeled and labeled data. A predictive deviance criterion, \(\mathrm{AIC}_{\mathrm{cond}}\), aiming to select a parsimonious and relevant generative classifier in the semi-supervised context, is proposed. In contrast to standard information criteria such as AIC and BIC, \(\mathrm{AIC}_{\mathrm{cond}}\) is focused on the classification task, since it attempts to measure the predictive power of a generative model by approximating its predictive deviance. At the same time, it avoids the computational cost of cross-validation criteria, which require repeated runs of the EM algorithm. \(\mathrm{AIC}_{\mathrm{cond}}\) is proved to have consistency properties that ensure its parsimony when compared with the Bayesian Entropy Criterion (BEC), whose focus is similar to that of \(\mathrm{AIC}_{\mathrm{cond}}\). Numerical experiments on both simulated and real data sets show that \(\mathrm{AIC}_{\mathrm{cond}}\) behaves encouragingly for variable and model selection in comparison with the competing criteria.
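To fix ideas, here is a hedged sketch of the quantities involved; the notation below is ours, and the exact penalty of \(\mathrm{AIC}_{\mathrm{cond}}\) is derived in the paper rather than reproduced here. Let \(\ell(\hat{\theta})\) denote the log-likelihood of a generative model with \(\nu\) free parameters, maximized (by EM) over the \(n\) labeled and unlabeled observations together. The standard criteria, to be maximized, are
\[
\mathrm{AIC} = \ell(\hat{\theta}) - \nu, \qquad \mathrm{BIC} = \ell(\hat{\theta}) - \frac{\nu}{2}\log n,
\]
and both reward the fit of the joint density of features and labels. A classification-focused criterion instead scores the conditional fit of the labels \(z_i\) given the features \(x_i\) over the \(n_l\) labeled points,
\[
\sum_{i=1}^{n_l} \log p(z_i \mid x_i; \hat{\theta}),
\]
computed from a single semi-supervised fit. \(\mathrm{AIC}_{\mathrm{cond}}\) penalizes this conditional log-likelihood so as to approximate the predictive deviance, whereas a cross-validated estimate of the same quantity would require one EM run per fold and per candidate model.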

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp. 267-281.
[2] Amemiya, T., Regression analysis when the dependent variable is truncated normal, Econometrica, 41, 997-1016, (1973) · Zbl 0282.62061
[3] Anderson, J. A.; Richardson, S. C., Logistic discrimination and bias correction in maximum likelihood estimation, Technometrics, 21, 1, 71-78, (1979) · Zbl 0399.62028
[4] Bensmail, H.; Celeux, G., Regularized Gaussian discriminant analysis through eigenvalue decomposition, Journal of the American Statistical Association, 91, 1743-1748, (1996) · Zbl 0885.62068
[5] Biernacki, C.; Beninel, F.; Bretagnolle, V., A generalized discriminant rule when training population and test population differ on their descriptive parameters, Biometrics, 58, 2, 387-397, (2002) · Zbl 1210.62077
[6] Bouchard, G.; Celeux, G., Selection of generative models in classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 4, 544-554, (2006)
[7] Celeux, G.; Govaert, G., Parsimonious Gaussian models in cluster analysis, Pattern Recognition, 28, 781-793, (1995)
[8] (Chapelle, O.; Schölkopf, B.; Zien, A., Semi-Supervised Learning, (2006), MIT Press Cambridge, MA), URL http://www.kyb.tuebingen.mpg.de/ssl-book
[9] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39, 1, 1-38, (1977) · Zbl 0364.62022
[10] Devos, O.; Ruckebusch, C.; Durand, A.; Duponchel, L.; Huvenne, J., Support vector machines (SVM) in near infrared (NIR) spectroscopy: focus on parameters optimization and model interpretation, Chemometrics and Intelligent Laboratory Systems, 96, 1, 27-33, (2009)
[11] Fisher, R. A., The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179-188, (1936)
[12] Fraley, C.; Raftery, A. E., Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611-631, (2002) · Zbl 1073.62545
[13] Frühwirth-Schnatter, S., Finite mixture and Markov switching models, (2006), Springer · Zbl 1108.62002
[14] Goldenshluger, A.; Greenshtein, E., Asymptotically minimax regret procedures in regression model selection and the magnitude of the dimension penalty, The Annals of Statistics, 28, 1620-1637, (2000) · Zbl 1105.62356
[15] Grandvalet, Y.; Bengio, Y., Semi-supervised learning by entropy minimization, (Saul, L. K.; Weiss, Y.; Bottou, L., Advances in Neural Information Processing Systems, Vol. 17, (2005), MIT Press Cambridge, MA), 529-536
[16] Hastie, T.; Tibshirani, R., Discriminant analysis by Gaussian mixtures, Journal of the Royal Statistical Society, Series B (Methodological), 58, 155-176, (1996) · Zbl 0850.62476
[17] Hastie, T.; Tibshirani, R.; Friedman, J., (The Elements of Statistical Learning, Springer Series in Statistics, (2009), Springer New York)
[18] Jacques, J.; Bouveyron, C.; Girard, S.; Devos, O.; Duponchel, L.; Ruckebusch, C., Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data, Journal of Chemometrics, 24, 11-12, 719-727, (2010)
[19] Jennrich, R. I., Asymptotic properties of non-linear least squares estimators, Annals of Mathematical Statistics, 40, 633-643, (1969) · Zbl 0193.47201
[20] Joachims, T., Transductive inference for text classification using support vector machines, (Bratko, I.; Dzeroski, S., Proceedings of ICML-99, 16th International Conference on Machine Learning, (1999), Morgan Kaufmann Publishers San Francisco, US, Bled, SL), 200-209, URL citeseer.ist.psu.edu/joachims99transductive.html
[21] Keribin, C., Consistent estimation of the order of mixture models, Sankhyā: The Indian Journal of Statistics, Series A, 62, 1, 49-66, (2000) · Zbl 1081.62516
[22] McLachlan, G., (Discriminant Analysis and Statistical Pattern Recognition, Wiley Series in Probability and Statistics, (2004), Wiley-Interscience)
[23] McLachlan, G.; Peel, D., (Finite Mixture Models, Wiley Series in Probability and Statistics: Applied Probability and Statistics, (2000), Wiley-Interscience New York)
[24] Miller, D. J.; Browning, J., A mixture model and EM-based algorithm for class discovery, robust classification, and outlier rejection in mixed labeled/unlabeled data sets, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 11, 1468-1483, (2003)
[25] O’Neill, T., The general distribution of the error rate of a classification procedure with application to logistic regression discrimination, Journal of the American Statistical Association, 75, 369, 154-160, (1980) · Zbl 0454.62052
[26] Rosenblatt, F., The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, 65, 386-408, (1958)
[27] Schwarz, G., Estimating the dimension of a model, Annals of Statistics, 6, 461-464, (1978) · Zbl 0379.62005
[28] Stone, M., An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, Journal of the Royal Statistical Society, Series B (Methodological), 39, 1, 44-47, (1977) · Zbl 0355.62002
[29] Toher, D., Downey, G., Murphy, T.B., 2005. Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data, Young Statisticians Meeting.
[30] van der Vaart, A. W., Asymptotic statistics, (2000), Cambridge University Press · Zbl 0910.62001
[31] Vandewalle, V., 2009. Estimation et sélection en classification semi-supervisée [Estimation and selection in semi-supervised classification]. Ph.D. Thesis, Université de Lille 1.
[32] Vapnik, V., The nature of statistical learning theory, (1995), Springer-Verlag · Zbl 0833.62008
[33] White, H., Consequences and detection of misspecified nonlinear regression models, Journal of the American Statistical Association, 76, 374, 419-433, (1981) · Zbl 0467.62058