Bayesian nonparametric estimators derived from conditional Gibbs structures. (English) Zbl 1142.62333
Summary: We consider discrete nonparametric priors which induce Gibbs-type exchangeable random partitions and investigate their posterior behavior in detail. In particular, we deduce conditional distributions and the corresponding Bayesian nonparametric estimators, which can be readily exploited for predicting various features of additional samples. The results provide useful tools for genomic applications where prediction of future outcomes is required.

MSC:
62G05 Nonparametric estimation
62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
60G57 Random measures
References:
[1] Adams, M., Kelley, J., Gocayne, J., Mark, D., Polymeropoulos, M., Xiao, H., Merril, C., Wu, A., Olde, B., Moreno, R., Kerlavage, A., McCombe, W. and Venter, J. (1991). Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252 1651-1656.
[2] Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 1152-1174. · Zbl 0335.60034
[3] Arratia, R., Barbour, A. D. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS, Zürich. · Zbl 1040.60001
[4] Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Adv. in Appl. Probab. 31 929-953. · Zbl 0957.60055
[5] Charalambides, C. A. (2005). Combinatorial Methods in Discrete Distributions. Wiley, Hoboken, NJ. · Zbl 1087.60001
[6] Charalambides, C. A. and Singh, J. (1988). A review of the Stirling numbers, their generalizations and statistical applications. Commun. Statist. Theory Methods 17 2533-2595. · Zbl 0696.62025
[7] Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3 87-112. · Zbl 0245.92009
[8] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037
[9] Gnedin, A. and Pitman, J. (2005). Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. POMI 325 83-102, 244-245. · Zbl 1293.60010
[10] Griffiths, R. C. and Lessard, S. (2005). Ewens’ sampling formula and related formulae: combinatorial proofs, extensions to variable population size and applications to ages of alleles. Theor. Popul. Biol. 68 167-177. · Zbl 1085.92027
[11] Griffiths, R. C. and Spanò, D. (2007). Record indices and age-ordered frequencies in exchangeable Gibbs partitions. Electron. J. Probab. 12 1101-1130. · Zbl 1148.60002
[12] Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161-173. · Zbl 1014.62006
[13] Ishwaran, H. and James, L. F. (2003). Generalized weighted Chinese restaurant processes for species sampling mixture models. Statist. Sinica 13 1211-1235. · Zbl 1086.62036
[14] James, L. F. (2002). Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. Manuscript. Available at http://arxiv.org/pdf/math.PR/0205093.
[15] James, L. F., Lijoi, A. and Prünster, I. (2006). Conjugacy as a distinctive feature of the Dirichlet process. Scand. J. Statist. 33 105-120. · Zbl 1121.62028
[16] James, L. F., Lijoi, A. and Prünster, I. (2008). Posterior analysis for normalized random measures with independent increments. Scand. J. Statist. · Zbl 1190.62052
[17] Kingman, J. F. C. (1975). Random discrete distributions (with discussion). J. Roy. Statist. Soc. Ser. B 37 1-22. · Zbl 0331.62019
[18] Kerov, S. (1995). Coherent random allocations and the Ewens-Pitman sampling formula. PDMI Preprint, Steklov Math. Institute, St. Petersburg. · Zbl 0856.05008
[19] Lijoi, A., Mena, R. H. and Prünster, I. (2006). Controlling the reinforcement in Bayesian mixture models. J. Roy. Statist. Soc. Ser. B 69 715-740.
[20] Lijoi, A., Mena, R. H. and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769-786. · Zbl 1156.62374
[21] Lo, A. Y. (1984). On a class of Bayesian nonparametric estimates. I. Density estimates. Ann. Statist. 12 351-357. · Zbl 0557.62036
[22] Lo, A. Y. and Weng, C.-S. (1989). On a class of Bayesian nonparametric estimates. II. Hazard rate estimates. Ann. Inst. Statist. Math. 41 227-245. · Zbl 0716.62043
[23] Mao, C. X. (2004). Prediction of the conditional probability of discovering a new class. J. Amer. Statist. Assoc. 99 1108-1118. · Zbl 1055.62007
[24] Mao, C. X. and Lindsay, B. G. (2002). A Poisson model for the coverage problem with a genomic application. Biometrika 89 669-682. · Zbl 1037.62115
[25] Mao, C. X. (2007). Estimating species accumulation curves and diversity indices. Statist. Sinica 17 761-775. · Zbl 1144.62108
[26] Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047
[27] Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. Statistics, Probability and Game Theory. Papers in Honor of David Blackwell (T. S. Ferguson et al., eds.). Lecture Notes Monograph Series 30 245-267. IMS, Hayward, CA.
[28] Pitman, J. (2003). Poisson-Kingman partitions. Science and Statistics: A Festschrift for Terry Speed (D. R. Goldstein, ed.). Lecture Notes Monograph Series 40 1-34. IMS, Beachwood, OH.
[29] Pitman, J. (2006). Combinatorial Stochastic Processes. Springer, Berlin. · Zbl 1103.60004
[30] Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R. and White, J. (2000). The TIGR gene indices: Analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29 159-164.
[31] Regazzini, E., Lijoi, A. and Prünster, I. (2003). Distributional results for means of random measures with independent increments. Ann. Statist. 31 560-585. · Zbl 1068.62034
[32] Susko, E. and Roger, A. J. (2004). Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys. Bioinformatics 20 2279-2287.
[33] Tavaré, S. and Ewens, W. J. (1998). The Ewens sampling formula. In Encyclopedia of Statistical Science (S. Kotz, C. B. Read and D. L. Banks, eds.) 2 update 230-234. Wiley, New York.
[34] Teh, Y. W. (2006). A hierarchical Bayesian language model based on Pitman-Yor processes. Coling ACL Proceedings 44 985-992.
[35] Teh, Y. W., Jordan, M. I., Beal, M. J. and Blei, D. M. (2006). Hierarchical Dirichlet processes. J. Amer. Statist. Assoc. 101 1566-1581. · Zbl 1171.62349
[36] Zabell, S. L. (1982). W. E. Johnson’s “sufficientness” postulate. Ann. Statist. 10 1090-1099. · Zbl 0512.62007
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.