On fractionally-supervised classification: weight selection and extension to the multivariate \(t\)-distribution. (English) Zbl 1436.62252

Summary: Recent work on fractionally-supervised classification (FSC), an approach that allows classification to be carried out with a fractional amount of weight given to the unlabelled points, is further developed in two respects. The primary development addresses the fundamental question of how to choose the amount of weight given to the unlabelled points. Resolving this question is essential because it makes FSC more readily applicable to real problems. Interestingly, the resolution of the weight selection problem opens up the possibility of a different approach to model selection in model-based clustering and classification. A secondary development demonstrates that the FSC approach can be effective beyond Gaussian mixture models. To this end, an FSC approach is illustrated using mixtures of multivariate \(t\)-distributions.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F03 Parametric hypothesis testing
62H10 Multivariate distribution of statistics


[1] AITKEN, A.C. (1926), “A Series Formula for the Roots of Algebraic and Transcendental Equations”, Proceedings of the Royal Society of Edinburgh, 45, 14-22. · JFM 51.0096.03
[2] ANDREWS, J.L., and MCNICHOLAS, P.D. (2011a), “Extending Mixtures of Multivariate t-Factor Analyzers”, Statistics and Computing, 21(3), 361-373. · Zbl 1255.62175
[3] ANDREWS, J.L., and MCNICHOLAS, P.D. (2011b), “Mixtures of Modified t-Factor Analyzers for Model-Based Clustering, Classification, and Discriminant Analysis”, Journal of Statistical Planning and Inference, 141(4), 1479-1486. · Zbl 1204.62098
[4] ANDREWS, J.L., and MCNICHOLAS, P.D. (2012), “Model-Based Clustering, Classification, and Discriminant Analysis via Mixtures of Multivariate t-Distributions: The tEIGEN Family”, Statistics and Computing, 22(5), 1021-1029. · Zbl 1252.62062
[5] ANDREWS, J.L., and MCNICHOLAS, P.D. (2014), “Variable Selection for Clustering and Classification”, Journal of Classification, 31(2), 136-153. · Zbl 1360.62310
[6] ANDREWS, J.L., WICKINS, J.R., BOERS, N.M., and MCNICHOLAS, P.D. (2016), teigen: Model-Based Clustering and Classification with the Multivariate t Distribution, R Package Version 2.2.0.
[7] BANFIELD, J.D., and RAFTERY, A.E. (1993), “Model-Based Gaussian and Non-Gaussian Clustering”, Biometrics, 49(3), 803-821. · Zbl 0794.62034
[8] BAUM, L.E., PETRIE, T., SOULES, G., and WEISS, N. (1970), “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains”, Annals of Mathematical Statistics, 41, 164-171. · Zbl 0188.49603
[9] BENSMAIL, H., CELEUX, G., RAFTERY, A., and ROBERT, C. (1997), “Inference in Model-Based Cluster Analysis”, Statistics and Computing, 7, 1-10.
[10] BIERNACKI, C., CELEUX, G., and GOVAERT, G. (2000), “Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719-725.
[11] CELEUX, G., and GOVAERT, G. (1995), “Gaussian Parsimonious Clustering Models”, Pattern Recognition, 28(5), 781-793.
[12] CELEUX, G., and SOROMENHO, G. (1996), “An Entropy Criterion for Assessing the Number of Clusters in a Mixture Model”, Journal of Classification, 13, 195-212. · Zbl 0861.62051
[13] DANG, U.J., BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “Mixtures of Multivariate Power Exponential Distributions”, Biometrics, 71(4), 1081-1089. · Zbl 1419.62330
[14] DEMPSTER, A.P., LAIRD, N.M., and RUBIN, D.B. (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society: Series B, 39(1), 1-38. · Zbl 0364.62022
[15] EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21, 362-375.
[16] FRALEY, C., and RAFTERY, A.E. (1998), “How Many Clusters? Which Clustering Methods? Answers via Model-Based Cluster Analysis”, The Computer Journal, 41(8), 578-588. · Zbl 0920.68038
[17] FRANCZAK, B.C., BROWNE, R.P., and MCNICHOLAS, P.D. (2014), “Mixtures of Shifted Asymmetric Laplace Distributions”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149-1157.
[18] FRANCZAK, B.C., TORTORA, C., BROWNE, R.P., and MCNICHOLAS, P.D. (2015), “Unsupervised Learning via Mixtures of Skewed Distributions with Hypercube Contours”, Pattern Recognition Letters, 58(1), 69-76.
[19] FRIEDMAN, H.P., and RUBIN, J. (1967), “On Some Invariant Criteria for Grouping Data”, Journal of the American Statistical Association, 62, 1159-1178.
[20] GORDON, A.D. (1981), Classification, London: Chapman and Hall.
[21] HUBERT, L., and ARABIE, P. (1985), “Comparing Partitions”, Journal of Classification, 2(1), 193-218. · Zbl 0587.62128
[22] HURLEY, C. (2004), “Clustering Visualizations of Multivariate Data”, Journal of Computational and Graphical Statistics, 13(4), 788-806.
[23] INGRASSIA, S., MINOTTI, S.C., PUNZO, A., and VITTADINI, G. (2015), “The Generalized Linear Mixed Cluster-Weighted Model”, Journal of Classification, 32(1), 85-113. · Zbl 1331.62310
[24] INGRASSIA, S., MINOTTI, S.C., PUNZO, A., and VITTADINI, G. (2012), “Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions”, Journal of Classification, 29(3), 363-401. · Zbl 1360.62335
[25] LEE, S., and MCLACHLAN, G.J. (2014), “Finite Mixtures of Multivariate Skew t-Distributions: Some Recent and New Results”, Statistics and Computing, 24, 181-202. · Zbl 1325.62107
[26] LEE, S.X., and MCLACHLAN, G.J. (2013), “On Mixtures of Skew Normal and Skew t-Distributions”, Advances in Data Analysis and Classification, 7(3), 241-266. · Zbl 1273.62115
[27] LIN, T.-I. (2010), “Robust Mixture Modeling Using Multivariate Skew t Distributions”, Statistics and Computing, 20(3), 343-356.
[28] LIN, T.-I., MCLACHLAN, G.J., and LEE, S.X. (2016), “Extending Mixtures of Factor Models Using the Restricted Multivariate Skew-Normal Distribution”, Journal of Multivariate Analysis, 143, 398-413. · Zbl 1328.62378
[29] LIN, T.-I., MCNICHOLAS, P.D., and HSIU, J.H. (2014), “Capturing Patterns via Parsimonious t Mixture Models”, Statistics and Probability Letters, 88, 80-87. · Zbl 1369.62131
[30] MACQUEEN, J. (1967), “Some Methods for Classification and Analysis of Multivariate Observations”, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, Berkeley, University of California Press, pp. 281-297. · Zbl 0214.46201
[31] MCNICHOLAS, P.D. (2016a), Mixture Model-Based Classification, Boca Raton: Chapman and Hall/CRC Press. · Zbl 1454.62005
[32] MCNICHOLAS, P.D. (2016b), “Model-Based Clustering”, Journal of Classification, 33(3), 331-373. · Zbl 1364.62155
[33] MCNICHOLAS, P.D., and MURPHY, T.B. (2008), “Parsimonious Gaussian Mixture Models”, Statistics and Computing, 18(3), 285-296.
[34] MCNICHOLAS, P.D., MURPHY, T.B., MCDAID, A.F., and FROST, D. (2010), “Serial and Parallel Implementations of Model-Based Clustering via Parsimonious Gaussian Mixture Models”, Computational Statistics and Data Analysis, 54(3), 711-723. · Zbl 1464.62131
[35] MURRAY, P.M., BROWNE, R.P., and MCNICHOLAS, P.D. (2014a), “Mixtures of Skew-t Factor Analyzers”, Computational Statistics and Data Analysis, 77, 326-335. · Zbl 06984029
[36] MURRAY, P.M., BROWNE, R.P., and MCNICHOLAS, P.D. (2017a), “Hidden Truncation Hyperbolic Distributions, Finite Mixtures Thereof, and Their Application for Clustering”, Journal of Multivariate Analysis, 161, 141-156. · Zbl 1403.62028
[37] MURRAY, P.M., BROWNE, R.P., and MCNICHOLAS, P.D. (2017b), “A Mixture of SDB Skew-t Factor Analyzers”, Econometrics and Statistics, 3, 160-168.
[38] MURRAY, P.M., MCNICHOLAS, P.D., and BROWNE, R.P. (2014b), “A Mixture of Common Skew-t Factor Analyzers”, Stat, 3(1), 68-82. · Zbl 06984029
[39] ORCHARD, T., and WOODBURY, M.A. (1972), “A Missing Information Principle: Theory and Applications”, eds. L.M. Le Cam, J. Neyman, and E.L. Scott, Berkeley, pp. 697-715.
[40] PEEL, D., and MCLACHLAN, G.J. (2000), “Robust Mixture Modelling Using the t Distribution”, Statistics and Computing, 10(4), 339-348.
[41] PUNZO, A., and MCNICHOLAS, P.D. (2017), “Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model”, Journal of Classification, 34(2), 249-293. · Zbl 1373.62316
[42] R CORE TEAM (2016), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
[43] SCHWARZ, G. (1978), “Estimating the Dimension of a Model”, The Annals of Statistics, 6(2), 461-464. · Zbl 0379.62005
[44] SCOTT, A. J., and SYMONS, M. J. (1971), “Clustering Methods Based on Likelihood Ratio Criteria”, Biometrics, 27, 387-397.
[45] STEANE, M.A., MCNICHOLAS, P.D., and YADA, R. (2012), “Model-Based Classification via Mixtures of Multivariate t-Factor Analyzers”, Communications in Statistics - Simulation and Computation, 41(4), 510-523. · Zbl 1294.62142
[46] SUBEDI, S., PUNZO, A., INGRASSIA, S., and MCNICHOLAS, P.D. (2013), “Clustering and Classification via Cluster-Weighted Factor Analyzers”, Advances in Data Analysis and Classification, 7(1), 5-40. · Zbl 1271.62137
[47] SUBEDI, S., PUNZO, A., INGRASSIA, S., and MCNICHOLAS, P.D. (2015), “Cluster-Weighted t-Factor Analyzers for Robust Model-Based Clustering and Dimension Reduction”, Statistical Methods and Applications, 24(4), 623-649. · Zbl 1416.62362
[48] TIEDEMAN, D.V. (1955), “On the Study of Types”, in Symposium on Pattern Analysis, ed. S.B. Sells, Randolph Field, Texas: Air University, U.S.A.F. School of Aviation Medicine.
[49] TORTORA, C., BROWNE, R.P., FRANCZAK, B.C., and MCNICHOLAS, P.D. (2015), MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions, R Package Version 1.8.
[50] VENABLES, W.N., and RIPLEY, B.D. (2002), Modern Applied Statistics with S (4th ed.), New York: Springer. · Zbl 1006.62003
[51] VRBIK, I., and MCNICHOLAS, P.D. (2012), “Analytic Calculations for the EM Algorithm for Multivariate Skew-t Mixture Models”, Statistics and Probability Letters, 82(6), 1169-1174. · Zbl 1244.65012
[52] VRBIK, I., and MCNICHOLAS, P.D. (2014), “Parsimonious Skew Mixture Models for Model-Based Clustering and Classification”, Computational Statistics and Data Analysis, 71, 196-210. · Zbl 1471.62202
[53] VRBIK, I., and MCNICHOLAS, P.D. (2015), “Fractionally-Supervised Classification”, Journal of Classification, 32(3), 359-381. · Zbl 1331.62319
[54] WOLFE, J. H. (1965), “A Computer Program for the Maximum Likelihood Analysis of Types”, Technical Bulletin 65-15, U.S. Naval Personnel Research Activity.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.