×

A review of reliable maximum likelihood algorithms for semiparametric mixture models. (English) Zbl 0832.62026

Summary: This contribution reviews some of the major developments in the algorithmic area connected with the construction of maximum likelihood estimators in semiparametric mixture models. Mixture models arise in a natural way in that they are modelling unobserved population heterogeneity. It is assumed that the population consists of a possibly unknown number \(k\) of subpopulations with parameters \(\vartheta_1, \ldots, \vartheta_k\) receiving weights \(\alpha_1, \ldots, \alpha_k\). Since it is not possible to observe the subpopulation membership, one can only observe data coming from the marginal or mixture density \(f(x,P) = \sum^k_{j = 1} f(x, \vartheta_j) \alpha_j\) with \(P\) giving mass \(\alpha_j\) to \(\vartheta_j\). The log-likelihood \(\lambda (P)\) becomes \(\sum_i \log f(x_i,P)\) and it is easily seen that \(\lambda (P)\) forms a concave functional on the convex set of all probability distributions and this property in essence produces the basis for the construction principles of reliable maximum likelihood algorithms.
In Section 1, the gradient function is discussed. Any optimization algorithm needs a search direction and the gradient function serves as a basis for finding vertex directions. Section 2 discusses vertex direction methods. The fundamental idea of vertex direction methods is as follows. Consider a vertex \(P_\vartheta\) (the probability measure putting all mass at \(\vartheta)\) and set up the convex combination of some current \(P\) and \(P_\vartheta : (1 - \alpha) P + \alpha P_\vartheta\), where \(\alpha\) is called the step-length. Good choices will be discussed in the next section. The gradient function is used to find the appropriate vertex direction. If \(\vartheta\) is chosen to maximize the gradient function the associated convex combination is called the vertex direction method (VDM). Large improvements of this method such as the vertex exchange method (VEM) or the intra simplex direction method (ISDM) will be discussed. In Section 3, the problem of finding a monotonic step- length given some current value \(P\) and a search direction \(H\) is studied. The idea of estimating the area above the second derivative curve is introduced and related to existing algorithms. The concept of area overestimation then leads to the monotonicity of the associated algorithm. Section 4 discusses the problem of maximizing a concave function of a finite number of probabilities. Three algorithmic approaches are discussed: the projection approach, the transformation approach and the generalized EM approach. Concluding in Section 5, the case of a fixed (known) number of components in the mixture model as well as software packages available for all mentioned algorithms are discussed.

MSC:

62G05 Nonparametric estimation
62F10 Point estimation
65C99 Probabilistic methods, stochastic differential equations

Software:

AS 221
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Agha, M.; Ibrahim, M. T., Maximum likelihood estimation of mixtures of distributions, J. Roy. Statist. Soc. Ser. C, 33, 327-332 (1984)
[2] Aitkin, M.; Wilson, G. Tunnaclife, Mixture models, outliers, and the EM algorithm, Technometrics, 22, 325-332 (1984)
[3] Atwood, C. L., Sequences converging to D-optimal designs of experiments, Ann. Statist., 1, 342-352 (1973) · Zbl 0263.62047
[4] Atwood, C. L., Computational considerations for convergence to an optimal design, (Proc. 1976 Conf. on Information Sciences and Systems (1976), Dept. of Electrical Engineering, Johns Hopkins Univ) · Zbl 0182.51905
[5] Atwood, C. L., Convergent design sequences for sufficiently regular optimality criteria, Ann. Statist., 4, 1124-1138 (1976) · Zbl 0344.62064
[6] Atwood, C. L., Convergent design sequences for sufficiently regular optimality criteria, II: singular case, Ann. Statist., 8, 894-913 (1980) · Zbl 0446.62078
[7] Böhning, D., Convergence of Simar’s algorithm for finding the MLE of a compound Poisson process, Ann. Statist., 10, 1006-1008 (1982) · Zbl 0523.62074
[8] Böhning, D., A duality theorem with applications to statistics, Math. Operationsforsch. Statist. Ser. Statist., 14, 551-557 (1983) · Zbl 0534.62052
[9] Böhning, D., Use of reparameterization in nonlinear optimization with applications to statistics and optimal design, Comput. Statist. Quart., 1, 29-43 (1984) · Zbl 0615.62040
[10] Böhning, D., Numerical estimation of a probability measure, J. Statist. Plann. Inference, 11, 57-69 (1985) · Zbl 0574.62046
[11] Böhning, D., A vertex-exchange method in D-optimal design theory, Metrika, 33, 337-347 (1986) · Zbl 0601.62091
[12] Böhning, D., Likelihood inference for mixtures: geometrical and other constructions of monotone step-length algorithms, Biometrika, 76, 375-383 (1989) · Zbl 0666.62044
[13] Böhning, D.; Dietz, E.; Schaub, R.; Schlattmann, P.; Lindsay, B. G., The distribution of the likelihood ratio of mixtures of densities from the one-parameter exponential family, Ann. Inst. Statist. Math., 46, 373-388 (1994) · Zbl 0802.62017
[14] Böhning, D.; Hoffmann, K.-H., Numerical procedures for estimating probabilities, J. Statist. Sim. Comp., 14, 283-293 (1982) · Zbl 0478.65087
[15] Böhning, D.; Hoffmann, K.-H., A remark on to the numerical estimation of probabilities, Statistics, 17, 231-236 (1986) · Zbl 0589.65095
[16] Böhning, D.; Lindsay, B. G., Monotonicity of quadratic approximation algorithms, Ann. Inst. Statist. Math., 40, 641-663 (1988) · Zbl 0723.65150
[17] Böhning, D.; Schlattmann, P.; Lindsay, B., Computer-assisted analysis of mixtures (C:A:MAN): statistical algorithms, Biometrics, 48, 283-303 (1992)
[18] Coope, I. D.; Watson, G. A., A projected Lagrangian algorithm for semi-infinite programming, Math. Programming, 32, 337-356 (1985) · Zbl 0572.90082
[19] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Roy. Statist. Soc., 39, 1-38 (1977) · Zbl 0364.62022
[20] DerSimonian, R., Algorithm AS 221: Maximum likeihood estimation of a mixing distribution, J. Roy. Statist. Soc. Ser. C, 35, 302-309 (1986) · Zbl 0623.62038
[21] DerSimonian, R., Correction to Algorithm AS 221 maximum likelihood estimation of a mixing distribution, J. Roy. Statist. Soc. Ser. C, 39, 176 (1990)
[22] Dietz, E., Estimation of heterogeneity — a GLM approach, Lecture Notes in Statistics, Vol. 78, 66-71 (1992) · Zbl 0800.62437
[23] Everitt, B. S.; Hand, D. J., Finite Mixture Distributions (1981), Chapman and Hall: Chapman and Hall London · Zbl 0466.62018
[24] Fahrmeir, L.; Hamerle, A., Multivariate Statistiche Verfahren (1984), Walter de Gruyter: Walter de Gruyter New York
[25] Fedorov, V. V., Theory of Optimal Experiments (1972), Academic Press: Academic Press New York
[26] Fellman, J., Some aspects of the iterative search for optimal designs (1987), Swedish School of Economics and Business Administration: Swedish School of Economics and Business Administration Helsinki, Finland, Preprint
[27] Formann, A. K., Neuere Verfahren der Parameterschätzung in der Latent-Class-Analyse, Z. Differentielle Diagnostische Psychologie, 1 and 2, 107-116 (1980)
[28] Formann, A. K., Linear logistic latent class analysis, Biometrical J., 24, 171-190 (1982) · Zbl 0516.62106
[29] Gediga, G.; Holling, H., On the convergence of the EM algorithm including different methods of Aitkin acceleration for finite mixture models, (COMPSTAT 88 (1988), Short Communications and Posters: Short Communications and Posters Physica, Wien)
[30] Gribik, P. R.; Kortanek, K. O., Equivalence theorems and cutting plane algorithms for a class of experimental design problems, (Report No. 22 (1975), Carnegie Mellon Univ) · Zbl 0356.62060
[31] Gribik, P. R.; Kortanek, K. O., Equivalence theorems and cutting plane algorithms for a class of experimental design problems, SIAM J. Appl. Math., 32, 232-259 (1977) · Zbl 0356.62060
[32] Hasselblad, V., Estimation of parameters for a mixture of normal distributions, Technometrics, 8, 431-444 (1966)
[33] Hasselblad, V., Estimation of finite mixtures of distributions from the exponential family, J. Amer. Statist. Assoc., 64, 1459-1471 (1969)
[34] Hathaway, R. J., Constrained maximum-likelihood estimation for a mixture of \(m\) univariate normal distributions, (Statistics Tech. Report, 92, 62F10-2 (1983), Univ. of South Carolina: Univ. of South Carolina Columbia, SC)
[35] Hathaway, R. J., Another interpretation of the EM algorithm for mixture distributions, Statist. Probab. Lett., 4, 53-56 (1986) · Zbl 0585.62052
[36] Jamshidian, M.; Jnnrich, R. I., Conjugate gradient acceleration of the EM algorithm, J. Amer. Statist. Assoc., 88, 221-228 (1993) · Zbl 0775.65025
[37] Jewell, N. P., Mixtures of exponential distributions, Ann. Statist., 10, 479-484 (1982) · Zbl 0495.62042
[38] Kiefer, J.; Wolfowitz, J., The equivalence of two extremum problems, Canad. J. Math., 12, 363-366 (1960) · Zbl 0093.15602
[39] Laird, N. M., Nonparametric maximum likelihood estimation of a mixing distribution, J. Amer. Statist. Assoc., 73, 805-811 (1978) · Zbl 0391.62029
[40] Laird, N. M., Empirical Bayes estimators using the nonparametric maximum likelihood estimate for the prior, J. Statist. Comput. Simulation, 15, 211-220 (1982) · Zbl 0498.62039
[41] Lesperance, M.; Kalbfleisch, J. D., An algorithm for computing the nonparametric MLE of a mixing distribution, J. Amer. Statist. Assoc., 87, 120-126 (1992) · Zbl 0850.62336
[42] Lindsay, B. G., Properties of the maximum likelihood estimator of a mixing distribution, (Taillie, C.; etal., Statistical Distributions in Scientific Work (1981), D. Reidel Publ: D. Reidel Publ Dordrecht), 95-109
[43] Lindsay, B. G., The geometry of mixture likelihoods, part I: a general theory, Ann. Statist., 11, 783-792 (1983) · Zbl 0534.62002
[44] Lindsay, B. G., The geometry of mixture likelihoods, part II: the exponential family, Ann Statist., 11, 783-792 (1983) · Zbl 0534.62002
[45] Lindsay, B. G., Optimal EM algorithms in polynomial problems, (Technical reports and reprints of Dept. of Statistics (1984), The Pennsylvania State Univ) · Zbl 0583.62024
[46] Lindsay, B. G.; Roeder, K., Residual diagnostics for mixture models, J. Amer. Statist. Assoc., 87, 785-794 (1992) · Zbl 0850.62337
[47] Lindsay, B. G.; Roeder, K., Uniqueness of estimation and identifiability in mixture models, Canad. J. Statist., 21, 139-147 (1992) · Zbl 0779.62032
[48] Louis, T. A., Finding the observed information matrix when using the EM algorithm, J. Roy. Statist. Soc. Ser. B, 44, 226-233 (1982) · Zbl 0488.62018
[49] MacDonald, P. D.M., MIX: an interactive program for fitting mixtures of distributions, Amer. Statist., 40, 53 (1986)
[50] McLachlan, G. J.; Basford, K. E., Mixture Models and Applications to Clustering (1988), Marcel Dekker: Marcel Dekker New York · Zbl 0697.62050
[51] Meilijson, I., A fast improvement to the EM algorithm on its own terms, J. Roy. Statist. Soc. Ser. B, 51, 127-138 (1989) · Zbl 0674.65118
[52] Redner, R. A., An iterative procedure for obtaining maximum likelihood estimates in a mixture model, (Report SR-T1-04081, NASA Contract NAS9-14689 (1980), Texas A&M Univ: Texas A&M Univ College Station, TX)
[53] Redner, R. A.; Walker, H. F., Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev., 24, 195-239 (1984) · Zbl 0536.62021
[54] Schlattmann, P., Statistische Methoden zur Darstellung der räumlichen Verteilung von Krankheiten unter besonderer Beücksichtigung von Mischverteilungen (1993), Inaugural-Dissertation am Fachbereich Medizinische Grundlagenfächer der Freien Universität Berlin
[55] Schattmann, P.; Böhning, D., Mixture models and disease mapping, Statist. Med., 12, 1943-1950 (1993)
[56] Silvey, S. O., Optimal design (1980), Chapman and Hall: Chapman and Hall London
[57] Silvey, S. D.; Titterington, D. M., A geometric approach to optimal design theory, Biometrika, 60, 21-32 (1973) · Zbl 0252.62045
[58] Silvey, S. D.; Titterington, D. M.; Torsney, B., An algorithm for optimal designs on finite design space, Comm. Statist., A7, 1379-1389 (1978) · Zbl 0389.62061
[59] Simar, L., Maximum likelihood estimation of a compound Poisson process, Ann. Statist., 4, 1200-1209 (1976) · Zbl 0362.62095
[60] Titterington, D. M., Optimal design: some geometrical aspects of D-optimality, Biometrika, 62, 313-320 (1975) · Zbl 0308.62072
[61] Titterington, D. M., Algorithms for computing D-optimal designs on a finite design space, (Proc. 1976 Conf. on Information Sciences and Systems (1976), Dept. of Electrical Engineering, Johns Hopkins Univ), 213-216
[62] Titterington, D. M., Geometric approaches to design of experiment, Math. Operationsforsch. Statist. Ser. Statist., 11, 151-163 (1980) · Zbl 0444.62084
[63] Titterington, D. M.; Smith, A. F.M.; Makov, U. E., Statistical Analysis of Finite Mixture Distributions (1985), Wiley: Wiley New York · Zbl 0646.62013
[64] Torsney, B., A moment inequality and monotonicity of an algorithm, (Contribution to the 2nd Internat. Symp. on Semi-Infinite Programming and Applications (1981), Univ. of Texas at Austin), 8-10 September
[65] Torsney, B., Algorithms for a constrained optimization problem with applications in statistics and optimum design, (Ph.D. Thesis (1981), Univ. of Glasgow) · Zbl 0603.62018
[66] Tsay, J. Y., The iterative methods for calculating optimal experimental designs, (Ph.D. Thesis (1974), Purdue Univ)
[67] Tasy, J. Y., On the sequential construction of D-optimal designs, J. Amer. Statist. Assoc., 71, 671-674 (1976)
[68] Tsay, J. Y., A convergence theorem in L-optimal design theory, Ann. Statist., 4, 790-794 (1977) · Zbl 0355.62067
[69] Wu, C. F., Some algorithmic aspects of the theory of optimal designs, Ann. Statist., 6, 1286-1301 (1978) · Zbl 0392.62058
[70] Wu, C. F., Some iterative procedures for generating non-singular optimal designs, Comm. Statist., A7, 1399-1412 (1978) · Zbl 0399.62076
[71] Wu, C. F., On the convergence properties of the EM algorithm, Ann. Statist., 11, 95-103 (1983) · Zbl 0517.62035
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.