
Toward an efficient computation of log-likelihood functions in statistical inference: overdispersed count data clustering. (English) Zbl 1434.62116

Bouguila, Nizar (ed.) et al., Mixture models and applications. Cham: Springer. Unsuperv. Semi-Superv. Learn., 155-176 (2020).
Summary: This work presents an unsupervised learning algorithm that uses the mesh method to compute the log-likelihood function. The multinomial Dirichlet distribution (MDD) is one of the most widely used models for multicategorical count data with overdispersion. It has recently been shown that the traditional numerical computation of the MDD log-likelihood function is either numerically unstable or so slow that it becomes infeasible for large datasets. We therefore propose using the mesh algorithm, which approximates the MDD log-likelihood function with Bernoulli polynomials. We further extend the mesh approach to the log-likelihood function of a more flexible distribution, the multinomial generalized Dirichlet (MGD). We demonstrate the efficiency of this method for statistical inference, namely maximum likelihood estimation, when fitting finite mixture models based on the MDD and MGD as two efficient distributions for count data. In a set of experiments, the proposed approach shows its merits on two real-world clustering problems: natural scene categorization and facial expression recognition.
For the entire collection see [Zbl 1430.62012].
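The summary contrasts two ways of computing the MDD log-likelihood that the mesh algorithm is meant to improve on. The Python sketch below is not the mesh algorithm (which approximates the log-likelihood with Bernoulli polynomials). Under our own assumptions, it only shows the two baseline computations: log-gamma differences, which cost O(D) per observation but are prone to cancellation, and direct rising-factorial sums, which avoid that cancellation but cost O(N) per observation. It also includes a plain log-gamma evaluation of the MGD log-likelihood in the Connor-Mosimann stick-breaking form, with parameters (alpha_d, beta_d) for d = 1, ..., D over D+1 categories. All function names are hypothetical.

```python
import math
from typing import Sequence


def log_multinomial_coef(x: Sequence[int]) -> float:
    """log(N! / prod_d x_d!) for a count vector x with N = sum(x)."""
    return math.lgamma(sum(x) + 1) - math.fsum(math.lgamma(xd + 1) for xd in x)


def log_rising(a: float, k: int) -> float:
    """log Gamma(a + k) - log Gamma(a), summed directly as sum_{j<k} log(a + j).

    Avoids subtracting two large log-gamma values, at O(k) cost.
    """
    return math.fsum(math.log(a + j) for j in range(k))


def mdd_loglik_lgamma(x: Sequence[int], alpha: Sequence[float]) -> float:
    """MDD (Dirichlet-multinomial) log-likelihood via log-gamma differences: O(D)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    a, n = math.fsum(alpha), sum(x)
    ll = log_multinomial_coef(x) + math.lgamma(a) - math.lgamma(a + n)
    for xd, ad in zip(x, alpha):
        ll += math.lgamma(ad + xd) - math.lgamma(ad)
    return ll


def mdd_loglik_direct(x: Sequence[int], alpha: Sequence[float]) -> float:
    """Same quantity via rising-factorial sums: O(N) per observation."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    a, n = math.fsum(alpha), sum(x)
    ll = log_multinomial_coef(x) - log_rising(a, n)
    for xd, ad in zip(x, alpha):
        ll += log_rising(ad, xd)
    return ll


def mgd_loglik(x: Sequence[int], alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Multinomial generalized Dirichlet log-likelihood (D+1 counts, D parameter pairs).

    Stick-breaking form:
        prod_d B(alpha_d + x_d, beta_d + z_{d+1}) / B(alpha_d, beta_d),
    where z_{d+1} is the total count in the categories after d.
    """
    if len(alpha) != len(beta) or len(x) != len(alpha) + 1:
        raise ValueError("need len(x) == len(alpha) + 1 == len(beta) + 1")
    ll = log_multinomial_coef(x)
    tail = sum(x)
    # zip stops after D terms; the last count only enters through `tail`.
    for xd, ad, bd in zip(x, alpha, beta):
        tail -= xd  # tail is now z_{d+1}
        ll += (math.lgamma(ad + bd) - math.lgamma(ad) - math.lgamma(bd)
               + math.lgamma(ad + xd) + math.lgamma(bd + tail)
               - math.lgamma(ad + bd + xd + tail))
    return ll
```

With very large parameters, for example alpha = (10^15, 10^15) and x = (3, 2), the MDD approaches a multinomial with equal cell probabilities. In that case `mdd_loglik_direct` returns a value close to ln 10 - 5 ln 2 ≈ -1.1632. By contrast, `mdd_loglik_lgamma` subtracts terms of order 10^16, where one unit in the last place of a double already exceeds 1. Setting beta_d = alpha_{d+1} + ... + alpha_{D+1} makes the MGD reduce to the MDD, which gives a convenient consistency check between the two functions.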

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
62G05 Nonparametric estimation
62H35 Image analysis in multivariate analysis
62P30 Applications of statistics in engineering and industry; control charts
