
Nonparametric estimation of component distributions in a multivariate mixture. (English) Zbl 1018.62021
Summary: Suppose \(k\)-variate data are drawn from a mixture of two distributions, each having independent components. The goal is to estimate the univariate marginal distributions in each of the two product distributions, as well as the mixing proportion. This is the setting of two-class, fully parametrized latent models, which has been proposed for estimating the distributions of medical test results when disease status is unavailable. The problem is one of inference in a mixture of distributions without training data, and until now it has been tackled only in a fully parametric setting.
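In symbols, the model described above can be written as follows. The names \(\pi\), \(F_{1j}\) and \(F_{2j}\) are notation assumed here for illustration and do not appear in the summary itself:

\[
F(x_1,\dots,x_k) \;=\; \pi \prod_{j=1}^{k} F_{1j}(x_j) \;+\; (1-\pi) \prod_{j=1}^{k} F_{2j}(x_j), \qquad 0 < \pi < 1 .
\]

Here \(F\) is the joint distribution function of an observation, \(\pi\) is the mixing proportion, and \(F_{1j}\), \(F_{2j}\) are the \(j\)th univariate marginals of the two components, so the unknowns are the \(2k\) marginals together with \(\pi\).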
We investigate the possibility of using nonparametric methods. Of course, when \(k=1\) the problem is not identifiable from a nonparametric viewpoint. We show that the problem is “almost” identifiable when \(k=2\); there, the set of all possible representations can be expressed, in terms of any one of those representations, as a two-parameter family. Furthermore, it is proved that when \(k\geq 3\) the problem is nonparametrically identifiable under particularly mild regularity conditions. In this case we introduce root-\(n\) consistent nonparametric estimators of the \(2k\) univariate marginal distributions and the mixing proportion. Finite-sample and asymptotic properties of the estimators are described.
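The data-generating model can be simulated in a few lines. The sketch below is only an illustration of the setting, not the estimator introduced in the paper; the component distributions, the value of the mixing proportion, and all function names are hypothetical choices made for the example. It draws \(k=3\) coordinates independently within each latent class and then checks a standard consequence of conditional independence: for \(j \neq l\), the covariance between coordinates \(j\) and \(l\) equals \(\pi(1-\pi)(\mu_{1j}-\mu_{2j})(\mu_{1l}-\mu_{2l})\), where \(\mu_{cj}\) denotes the mean of coordinate \(j\) in component \(c\).

```python
import random


def sample_mixture(n, pi, comp1, comp2, rng):
    """Draw n points from pi * prod_j F_1j + (1 - pi) * prod_j F_2j.

    comp1 and comp2 are lists of k functions; each takes rng and returns
    one draw from the corresponding univariate marginal.
    """
    data = []
    for _ in range(n):
        # Latent class label: used to generate the point, then discarded,
        # mirroring the "no training data" setting.
        comp = comp1 if rng.random() < pi else comp2
        # Coordinates are independent given the class.
        data.append([draw(rng) for draw in comp])
    return data


def covariance(data, j, l):
    """Unbiased sample covariance between coordinates j and l."""
    n = len(data)
    mean_j = sum(row[j] for row in data) / n
    mean_l = sum(row[l] for row in data) / n
    return sum((row[j] - mean_j) * (row[l] - mean_l) for row in data) / (n - 1)


# Hypothetical example components (assumptions, not from the paper).
rng = random.Random(0)
pi = 0.3
comp1 = [
    lambda r: r.gauss(0.0, 1.0),          # mean 0
    lambda r: r.expovariate(1.0),         # mean 1
    lambda r: r.uniform(0.0, 1.0),        # mean 0.5
]
comp2 = [
    lambda r: r.gauss(2.0, 1.0),          # mean 2
    lambda r: r.expovariate(1.0 / 3.0),   # mean 3
    lambda r: r.uniform(1.0, 3.0),        # mean 2
]
mean1 = [0.0, 1.0, 0.5]
mean2 = [2.0, 3.0, 2.0]

data = sample_mixture(100_000, pi, comp1, comp2, rng)

pairs = [(0, 1), (0, 2), (1, 2)]
empirical = {p: covariance(data, *p) for p in pairs}
theory = {
    (j, l): pi * (1 - pi) * (mean1[j] - mean2[j]) * (mean1[l] - mean2[l])
    for j, l in pairs
}

for p in pairs:
    print(p, round(empirical[p], 3), round(theory[p], 3))
```

In this model all dependence between coordinates comes from the latent class, which is why the cross-coordinate structure of the observed data carries information about the components even though the class labels are never seen.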

62G05 Nonparametric estimation
62P10 Applications of statistics to biology and medical sciences; meta analysis
62G07 Density estimation
62H12 Estimation in multivariate analysis