
Fractionally-supervised classification. (English) Zbl 1331.62319

Summary: Traditionally, there are three species of classification: unsupervised, supervised, and semi-supervised. Supervised and semi-supervised classification differ by whether or not weight is given to unlabelled observations in the classification procedure. In unsupervised classification, or clustering, all observations are unlabelled and hence full weight is given to them. When some observations are unlabelled, it can be very difficult to choose, a priori, the optimal level of supervision, and the consequences of a sub-optimal choice can be non-trivial. A flexible fractionally-supervised approach to classification is introduced, where any level of supervision, ranging from unsupervised to supervised, can be attained. Our approach uses a weighted likelihood, wherein weights control the relative roles that labelled and unlabelled data play in building a classifier. A comparison between our approach and the traditional species is presented using simulated and real data. Gaussian mixture models are used as a vehicle to illustrate our fractionally-supervised classification approach; however, the approach is broadly applicable and variations on the postulated model are easily made.
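
To make the weighted-likelihood idea concrete, here is a minimal sketch in R of one plausible fractionally-supervised objective for a two-component univariate Gaussian mixture. A single weight alpha in [0, 1] trades off the labelled (complete-data) and unlabelled (mixture) log-likelihood contributions; the function name fsc_loglik, the univariate setting, and this particular weighting scheme are illustrative assumptions, not the authors' exact formulation.

# Minimal sketch (assumptions as noted above): a weighted log-likelihood for a
# two-component univariate Gaussian mixture under fractional supervision.
# x_lab, z_lab: labelled observations and their component labels (1 or 2);
# x_unl: unlabelled observations; theta: list(pi, mu, sigma) of length-2 vectors.
fsc_loglik <- function(theta, x_lab, z_lab, x_unl, alpha) {
  # Labelled (complete-data) term: each observation contributes the log
  # density of its known component.
  ll_lab <- sum(log(theta$pi[z_lab]) +
                dnorm(x_lab, theta$mu[z_lab], theta$sigma[z_lab], log = TRUE))
  # Unlabelled (mixture) term: each observation contributes the log of the
  # marginal mixture density.
  dens_unl <- theta$pi[1] * dnorm(x_unl, theta$mu[1], theta$sigma[1]) +
              theta$pi[2] * dnorm(x_unl, theta$mu[2], theta$sigma[2])
  ll_unl <- sum(log(dens_unl))
  # alpha = 1 weights only the labelled term (supervised); alpha = 0 weights
  # only the unlabelled term (clustering); intermediate values are fractional.
  alpha * ll_lab + (1 - alpha) * ll_unl
}

Maximising such a criterion over the mixture parameters, for example with a suitably weighted EM algorithm, yields a classifier in which alpha = 1 corresponds to supervised classification, alpha = 0 to clustering, and intermediate alpha to any level of supervision in between.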

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

gclus; MASS (R); R; AS 136
