
A survey of nonparametric mixing density estimation via the predictive recursion algorithm. (English) Zbl 1469.62228

Summary: Nonparametric estimation of a mixing density based on observations from the corresponding mixture is a challenging statistical problem. This paper surveys the literature on a fast, recursive estimator based on the predictive recursion algorithm. After introducing the algorithm and giving a few examples, I summarize the available asymptotic convergence theory, describe an important semiparametric extension, and highlight two interesting applications. I conclude with a discussion of several recent developments in this area and some open problems.
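To make the estimator concrete, here is a minimal grid-based sketch of the predictive recursion update as it is usually stated: starting from an initial guess f_0, each observation X_i updates the mixing density by f_i(u) = (1 - w_i) f_{i-1}(u) + w_i k(X_i | u) f_{i-1}(u) / m_{i-1}(X_i), where m_{i-1}(x) = ∫ k(x | u) f_{i-1}(u) du. This is an illustration under stated assumptions, not code from the surveyed paper. The normal kernel k(x | u) = N(x; u, sigma^2), the weight sequence w_i = (i + 1)^(-gamma) with gamma = 0.67, the uniform initial guess, the Riemann-sum approximation of the integral, and the function name `predictive_recursion` are all choices made here for illustration.

```python
import numpy as np


def predictive_recursion(x, grid, sigma=1.0, gamma=0.67, f0=None):
    """Grid-based sketch of predictive recursion (PR) for a mixing density.

    Assumptions (illustrative only): normal kernel k(x | u) = N(x; u, sigma^2),
    weights w_i = (i + 1)^(-gamma) with gamma in (0.5, 1], a uniform grid,
    and Riemann sums in place of the integrals.
    """
    grid = np.asarray(grid, dtype=float)
    du = grid[1] - grid[0]  # uniform grid spacing
    # Initial guess f_0: uniform unless supplied, normalized to integrate to 1.
    f = np.ones_like(grid) if f0 is None else np.asarray(f0, dtype=float).copy()
    f /= f.sum() * du
    for i, xi in enumerate(x, start=1):
        # Kernel k(x_i | u) evaluated at every grid point u.
        k = np.exp(-0.5 * ((xi - grid) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        # Predictive density m_{i-1}(x_i) = integral of k(x_i | u) f_{i-1}(u) du.
        m = np.sum(k * f) * du
        w = (i + 1) ** (-gamma)
        # PR update: convex combination of the old density and its "posterior".
        f = (1 - w) * f + w * k * f / m
    return f
```

Because m is computed with the same Riemann sum used for normalization, each update keeps the discretized density integrating to one. A single pass over the data costs O(n × grid size), which is the sense in which the estimator is fast compared with iterative NPMLE or MCMC-based alternatives; note that, unlike those, the result depends on the order in which the observations are processed.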

MSC:

62G07 Density estimation
62C12 Empirical decision procedures; empirical Bayes procedures

Software:

mixfdr

References:

[1] Bogdan, M.; Ghosh, JK; Tokdar, ST, A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing, in Balakrishnan, N.; Peña, E.; Silvapulle, M. (eds.), 211-230 (2008), Beachwood: IMS
[2] Bogdan, M.; Chakrabarti, A.; Frommlet, F.; Ghosh, JK, Asymptotic Bayes-optimality under sparsity of some multiple testing procedures, Ann. Statist., 39, 1551-1579 (2011) · Zbl 1221.62012
[3] Böhning, D., Computer-assisted Analysis of Mixtures and Applications: Meta-analysis, Disease Mapping and Others (2000), Boca Raton: Chapman and Hall/CRC · Zbl 0951.62088
[4] Brown, L., In-season prediction of batting averages: a field test of empirical Bayes and Bayes methodologies, Ann. Appl. Stat., 2, 113-152 (2008) · Zbl 1137.62419
[5] Chae, M.; Martin, R.; Walker, SG, Convergence of an iterative algorithm to the nonparametric MLE of a mixing distribution, Statist. Probab. Lett., 140, 142-146 (2018) · Zbl 1392.62089
[6] Chae, M.; Martin, R.; Walker, SG, On an algorithm for solving Fredholm integrals of the first kind, Stat. Comput., 29, 645-654 (2019) · Zbl 1430.62127
[7] Datta, J.; Ghosh, JK, Asymptotic properties of Bayes risk for the horseshoe prior, Bayesian Anal., 8, 111-131 (2013) · Zbl 1329.62122
[8] Dempster, A.; Laird, N.; Rubin, D., Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Roy. Statist. Soc. Ser. B, 39, 1-38 (1977) · Zbl 0364.62022
[9] Dixit, V.; Martin, R., Permutation-based uncertainty quantification about a mixing distribution, unpublished manuscript, arXiv:1906.05349 (2019)
[10] Dutta, R.; Bogdan, M.; Ghosh, JK, Model selection and multiple testing—a Bayes and empirical Bayes overview and some new results, J. Indian Statist. Assoc., 50, 105-142 (2012) · Zbl 1462.62176
[11] Efron, B., Robbins, empirical Bayes and microarrays, Ann. Statist., 31, 366-378 (2003) · Zbl 1038.62099
[12] Efron, B., Large-scale simultaneous hypothesis testing: the choice of a null hypothesis, J. Amer. Statist. Assoc., 99, 96-104 (2004) · Zbl 1089.62502
[13] Efron, B., Microarrays, empirical Bayes and the two-groups model, Statist. Sci., 23, 1-22 (2008) · Zbl 1327.62046
[14] Efron, B., Large-Scale Inference, Institute of Mathematical Statistics Monographs 1 (2010), Cambridge: Cambridge University Press · Zbl 1277.62016
[15] Eggermont, PPB; LaRiccia, VN, Maximum smoothed likelihood density estimation for inverse problems, Ann. Statist., 23, 199-220 (1995) · Zbl 0822.62025
[16] Escobar, MD, Estimating normal means with a Dirichlet process prior, J. Amer. Statist. Assoc., 89, 268-277 (1994) · Zbl 0791.62039
[17] Escobar, MD; West, M., Bayesian density estimation and inference using mixtures, J. Amer. Statist. Assoc., 90, 577-588 (1995) · Zbl 0826.62021
[18] Fan, J., On the optimal rates of convergence for nonparametric deconvolution problems, Ann. Statist., 19, 1257-1272 (1991) · Zbl 0729.62033
[19] Ghosal, S., The Dirichlet process, related priors and posterior asymptotics, 35-79 (2010), Cambridge: Cambridge University Press
[20] Ghosal, S.; Roy, A., Bayesian nonparametric approach to multiple testing, in Sastry, NSN; Rao, TSSRK; Delampady, M.; Rajeev, B. (eds.), 139-164 (2009), Singapore: World Scientific
[21] Ghosal, S.; van der Vaart, AW, Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities, Ann. Statist., 29, 1233-1263 (2001) · Zbl 1043.62025
[22] Ghosal, S.; van der Vaart, A., Fundamentals of Nonparametric Bayesian Inference, Cambridge Series in Statistical and Probabilistic Mathematics 44 (2017), Cambridge: Cambridge University Press · Zbl 1376.62004
[23] Ghosh, JK; Ramamoorthi, RV, Bayesian Nonparametrics (2003), New York: Springer · Zbl 1029.62004
[24] Ghosh, JK; Tokdar, ST, Convergence and consistency of Newton’s algorithm for estimating mixing distribution, in Fan, J.; Koul, H. (eds.), 429-443 (2006), London: Imperial College Press · Zbl 1119.62020
[25] Ghosal, S.; Ghosh, JK; Ramamoorthi, RV, Posterior consistency of Dirichlet mixtures in density estimation, Ann. Statist., 27, 143-158 (1999) · Zbl 0932.62043
[26] Ghosh, JK; Delampady, M.; Samanta, T., An Introduction to Bayesian Analysis (2006), New York: Springer · Zbl 1135.62002
[27] Hahn, PR; Martin, R.; Walker, SG, On recursive Bayesian predictive distributions, J. Amer. Statist. Assoc., 113, 1085-1093 (2018) · Zbl 1402.62062
[28] Jeng, XJ; Zhang, T.; Tzeng, J-Y, Efficient signal inclusion with genomic applications, J. Amer. Statist. Assoc., to appear (2018); arXiv:1805.10570 · Zbl 1428.62249
[29] Jiang, W.; Zhang, C-H, General maximum likelihood empirical Bayes estimation of normal means, Ann. Statist., 37, 1647-1684 (2009) · Zbl 1168.62005
[30] Jin, J.; Cai, TT, Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons, J. Amer. Statist. Assoc., 102, 495-506 (2007) · Zbl 1172.62319
[31] Jin, J.; Peng, J.; Wang, P., A generalized Fourier approach to estimating the null parameters and proportion of nonnull effects in large-scale multiple testing, J. Statist. Res., 44, 103-127 (2010)
[32] Kleijn, BJK; van der Vaart, AW, Misspecification in infinite-dimensional Bayesian statistics, Ann. Statist., 34, 837-877 (2006) · Zbl 1095.62031
[33] Kutner, MH; Nachtsheim, CJ; Neter, J.; Li, W., Applied Linear Statistical Models, 5th edn. (2005), McGraw-Hill/Irwin
[34] Laird, N., Nonparametric maximum likelihood estimation of a mixed distribution, J. Amer. Statist. Assoc., 73, 805-811 (1978) · Zbl 0391.62029
[35] Leroux, BG, Consistent estimation of a mixing distribution, Ann. Statist., 20, 1350-1360 (1992) · Zbl 0763.62015
[36] Lindsay, BG, Mixture Models: Theory, Geometry and Applications (1995), Hayward: IMS · Zbl 1163.62326
[37] Liu, L.; Levine, M.; Zhu, Y., A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization, J. Comput. Graph. Statist., 18, 481-504 (2009)
[38] Lo, AY, On a class of Bayesian nonparametric estimates. I. Density estimates, Ann. Statist., 12, 351-357 (1984) · Zbl 0557.62036
[39] MacEachern, SN, Estimating normal means with a conjugate style Dirichlet process prior, Comm. Statist. Simulation Comput., 23, 727-741 (1994) · Zbl 0825.62053
[40] MacEachern, SN, Computational methods for mixture of Dirichlet process models, in Dey, D.; Müller, P.; Sinha, D. (eds.), 23-43 (1998), New York: Springer · Zbl 0918.62064
[41] MacEachern, S.; Müller, P., Estimating mixture of Dirichlet process models, J. Comput. Graph. Statist., 7, 223-238 (1998)
[42] Madrid-Padilla, O-H; Polson, NG; Scott, J., A deconvolution path for mixtures, Electron. J. Stat., 12, 1717-1751 (2018) · Zbl 1404.62033
[43] Martin, R., Fast Nonparametric Estimation of a Mixing Distribution with Application to High-Dimensional Inference, PhD thesis, Department of Statistics, Purdue University, West Lafayette, IN (2009)
[44] Martin, R., Convergence rate for predictive recursion estimation of finite mixtures, Statist. Probab. Lett., 82, 378-384 (2012) · Zbl 1237.62044
[45] Martin, R.; Ghosh, JK, Stochastic approximation and Newton’s estimate of a mixing distribution, Statist. Sci., 23, 365-382 (2008) · Zbl 1329.62361
[46] Martin, R.; Han, Z., A semiparametric scale-mixture regression model and predictive recursion maximum likelihood, Comput. Statist. Data Anal., 94, 75-85 (2016) · Zbl 1468.62136
[47] Martin, R.; Tokdar, ST, Asymptotic properties of predictive recursion: robustness and rate of convergence, Electron. J. Stat., 3, 1455-1472 (2009) · Zbl 1326.62107
[48] Martin, R.; Tokdar, ST, Semiparametric inference in mixture models with predictive recursion marginal likelihood, Biometrika, 98, 567-582 (2011) · Zbl 1231.62056
[49] Martin, R.; Tokdar, ST, A nonparametric empirical Bayes framework for large-scale multiple testing, Biostatistics, 13, 427-439 (2012) · Zbl 1244.62066
[50] Müller, P.; Quintana, FA, Nonparametric Bayesian data analysis, Statist. Sci., 19, 95-110 (2004) · Zbl 1057.62032
[51] Muralidharan, O., An empirical Bayes mixture method for effect size and false discovery rate estimation, Ann. Appl. Stat., 4, 422-438 (2010) · Zbl 1189.62004
[52] Neal, RM, Markov chain sampling methods for Dirichlet process mixture models, J. Comput. Graph. Statist., 9, 249-265 (2000)
[53] Newton, MA, On a nonparametric recursive estimator of the mixing distribution, Sankhyā Ser. A, 64, 306-322 (2002) · Zbl 1192.62110
[54] Newton, MA; Zhang, Y., A recursive algorithm for nonparametric analysis with missing data, Biometrika, 86, 15-26 (1999) · Zbl 0917.62045
[55] Newton, MA; Quintana, FA; Zhang, Y., Nonparametric Bayes methods using predictive updating, in Dey, D.; Müller, P.; Sinha, D. (eds.), 45-61 (1998), New York: Springer · Zbl 0918.62030
[56] Newton, M.; Kendziorski, C.; Richmond, C.; Blattner, F.; Tsui, K., On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data, J. Comput. Biology, 8, 37-52 (2001)
[57] Pastpipatkul, P.; Yamaka, W.; Sriboonchitta, S., Predictive recursion maximum likelihood of threshold autoregressive model, in Kreinovich, V.; Sriboonchitta, S.; Huynh, V-N (eds.), 349-362 (2017), Springer
[58] Richardson, S.; Green, PJ, On Bayesian analysis of mixtures with an unknown number of components, J. Roy. Statist. Soc. Ser. B, 59, 731-792 (1997) · Zbl 0891.62020
[59] Robbins, H., An empirical Bayes approach to statistics, I, 157-163 (1956), Berkeley: University of California Press · Zbl 0074.35302
[60] Robbins, H., The empirical Bayes approach to statistical decision problems, Ann. Math. Statist., 35, 1-20 (1964) · Zbl 0138.12304
[61] Robbins, H., Some thoughts on empirical Bayes estimation, Ann. Statist., 11, 713-723 (1983) · Zbl 0522.62024
[62] Robbins, H.; Monro, S., A stochastic approximation method, Ann. Math. Statist., 22, 400-407 (1951) · Zbl 0054.05901
[63] Roeder, K., Density estimation with confidence sets exemplified by superclusters and voids in the galaxies, J. Amer. Statist. Assoc., 85, 411, 617-624 (1990) · Zbl 0704.62103
[64] San Martin, E.; Quintana, F., Consistency and identifiability revisited, Braz. J. Probab. Stat., 16, 99-106 (2002) · Zbl 1049.62003
[65] Scott, JG; Kelly, RC; Smith, MA; Zhou, P.; Kass, RE, False discovery rate regression: An application to neural synchrony detection in primary visual cortex, J. Amer. Statist. Assoc., 110, 459-471 (2015)
[66] Stefanski, L.; Carroll, RJ, Deconvoluting kernel density estimators, Statistics, 21, 169-184 (1990) · Zbl 0697.62035
[67] Tansey, W.; Koyejo, O.; Poldrack, RA; Scott, JG, False discovery rate smoothing, J. Amer. Statist. Assoc., 113, 1156-1171 (2018) · Zbl 1402.62011
[68] Tao, H.; Palta, M.; Yandell, BS; Newton, MA, An estimation method for the semiparametric mixed effects model, Biometrics, 55, 102-110 (1999) · Zbl 1059.62572
[69] Teicher, H., Identifiability of mixtures, Ann. Math. Statist., 32, 244-248 (1961) · Zbl 0146.39302
[70] Teicher, H., Identifiability of finite mixtures, Ann. Math. Statist., 34, 1265-1269 (1963) · Zbl 0137.12704
[71] Todem, D.; Williams, KP, A hierarchical model for binary data with dependence between the design and outcome success probabilities, Stat. Med., 28, 2967-2988 (2009)
[72] Tokdar, ST; Martin, R.; Ghosh, JK, Consistency of a recursive estimate of mixing distributions, Ann. Statist., 37, 2502-2522 (2009) · Zbl 1173.62020
[73] van Dyk, DA; Meng, X-L, The art of data augmentation, J. Comput. Graph. Statist., 10, 1, 1-111 (2001)
[74] van’t Wout, A.; Lehrman, G.; Mikheeva, S.; O’Keefe, G.; Katze, M.; Bumgarner, R.; Geiss, G.; Mullins, J., Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+ T-cell lines, J. Virol., 77, 1392-1402 (2003)
[75] Wang, Y., On fast computation of the non-parametric maximum likelihood estimate of a mixing distribution, J. R. Stat. Soc. Ser. B, 69, 185-198 (2007) · Zbl 1120.62022
[76] Woody, S.; Scott, JG, Optimal post-selection inference for sparse signals: a nonparametric empirical-Bayes approach, unpublished manuscript, arXiv:1810.11042 (2018)
[77] Zhang, C-H, Fourier methods for estimating mixing densities and distributions, Ann. Statist., 18, 806-831 (1990) · Zbl 0778.62037
[78] Zhang, C-H, On estimating mixing densities in discrete exponential family models, Ann. Statist., 23, 929-945 (1995) · Zbl 0841.62027