## Conditional formulae for Gibbs-type exchangeable random partitions. (English) Zbl 1287.60046

Let $$(X_{n})_{n\geq 1}$$ be an $${\mathcal X}$$-valued exchangeable sequence and let $$\operatorname{P}$$ be the random probability measure on $${\mathcal X}$$ given by the de Finetti representation. $$\operatorname{P}$$ is assumed to be almost surely discrete, $$\operatorname{P}=\sum_{i\in I}p_{i}\varepsilon_{Y_{i}}$$, where $$(p_{i})$$ and $$(Y_{i})$$ are independent. For every $$n$$, consider the random partition $$\Pi_{n}$$ of $$\{1,\dots,n\}$$ defined by the exchangeable equivalence relation $$i\sim j$$ if $$X_{i}=X_{j}$$. Its law is characterized by the probabilities $p_{k}^{(n)}(n_{1},\dots,n_{k}),\quad n_{1}+\dots+n_{k}=n,$ that $$\Pi_{n}$$ equals a given partition with $$k$$ blocks of sizes $$n_{1},\dots,n_{k}$$. The number of blocks of $$\Pi_{n}$$ is denoted $$K_{n}$$, and $$M_{i,n}$$ denotes the number of blocks of size $$i$$. If $p_{k}^{(n)}(n_{1},\dots,n_{k})=V_{n,k}\prod_{i=1}^{k}(1-\sigma )_{n_{i}-1},\quad \sigma \in (-\infty ,1),$ where $$a_{n}= a(a+1)\cdots (a+n-1)$$ denotes the rising factorial and the weights satisfy $V_{n,k}=V_{n+1,k+1}+(n-\sigma k) V_{n+1,k},\quad k\leq n,\qquad V_{1,1}=1,$ then $$\Pi_{n}$$ is said to be of Gibbs type. Let $$O_{i,m}^{(n)}$$ be the number of blocks of size $$i$$ in $$\Pi_{n+m}$$ that intersect $$\{1,\dots,n\}$$, $$N_{i,m}^{(n)}$$ the number of blocks of size $$i$$ in $$\Pi_{n+m}$$ that do not intersect $$\{1,\dots,n\}$$, and $$M_{i,m}^{(n)}=O_{i,m}^{(n)}+N_{i,m}^{(n)}$$.
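As a concrete illustration of these definitions, the following minimal Python sketch computes the block counts from an observed sequence of labels: $$K_{n}$$ and $$M_{i,n}$$ for the first $$n$$ labels, and $$O_{i,m}^{(n)}$$, $$N_{i,m}^{(n)}$$, $$M_{i,m}^{(n)}$$ once $$m$$ further labels are added. The function names `partition_counts` and `conditional_counts` and the toy label sequence are hypothetical, chosen only for illustration; counts are returned as dictionaries mapping a block size $$i$$ to the number of blocks of that size.

```python
from collections import Counter


def partition_counts(labels):
    """Return (K_n, {i: M_{i,n}}) for the partition of {1, ..., n} induced by ties.

    Positions j and l lie in the same block iff labels[j] == labels[l].
    """
    block_sizes = Counter(labels).values()      # one entry per block
    return len(block_sizes), dict(Counter(block_sizes))


def conditional_counts(labels, n):
    """Split labels into the first n and the remaining m = len(labels) - n.

    Returns dicts O, N, M with O[i] = O_{i,m}^{(n)}, N[i] = N_{i,m}^{(n)} and
    M[i] = M_{i,m}^{(n)}; block sizes are taken in the full partition Pi_{n+m}.
    """
    old = set(labels[:n])                       # labels of blocks meeting {1, ..., n}
    sizes = Counter(labels)                     # block sizes in Pi_{n+m}
    O = Counter(s for lab, s in sizes.items() if lab in old)
    N = Counter(s for lab, s in sizes.items() if lab not in old)
    return dict(O), dict(N), dict(O + N)


sample = list("aabcaddbe")                      # toy data: n = 5, m = 4
print(partition_counts(sample[:5]))             # (3, {3: 1, 1: 2})
print(conditional_counts(sample, 5))            # ({3: 1, 2: 1, 1: 1}, {2: 1, 1: 1}, {3: 1, 2: 2, 1: 2})
```

Only the tie pattern of the labels matters, so any hashable values can stand in for the observations $$X_{j}$$.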
The authors establish formulas for the factorial moments $$\operatorname{E}((M_{i,n})_{[q]})$$, where $$a_{[q]} =a(a-1)\cdots (a-q+1)$$, and for the conditional factorial moments $\operatorname{E}\big((O_{i,m}^{(n)})_{[q]}\big),\quad \operatorname{E}\big((N_{i,m}^{(n)})_{[q]}\big),\quad \operatorname{E}\big((M_{i,m}^{(n)})_{[q]}\big)$ given $$(K_{n},M_{1,n},\dots,M_{K_{n},n})$$.

The results are applied to three examples: the Dirichlet process (D), with $$\sigma =0$$ and $$V_{n,k}=\theta^{k}/\theta_{n}$$, $$\theta >0$$; the two-parameter Poisson-Dirichlet process (PD), with $\sigma \in (0,1),\quad V_{n,k}=\frac{\prod_{i=0}^{k-1}(\theta +i\sigma )}{\theta_{n}},\quad \theta > -\sigma;$ and Gnedin's model, with $\sigma =-1,\quad V_{n,k}=\gamma_{n-k}\,\frac{\prod_{i=1}^{k-1}(i^{2}-\gamma i)}{\prod_{i=1}^{n-1}(i^{2}+\gamma i)},\quad \gamma \in [0,1).$ Explicit formulas are obtained for the distributions of $$O_{i,m}^{(n)}$$, $$N_{i,m}^{(n)}$$ and $$M_{i,m}^{(n)}$$ and for their means.

The following convergence in distribution results are obtained. For D, $$M_{i,n}\rightarrow \pi_{\theta /i}$$ as $$n\rightarrow \infty$$, where $$\pi_{\lambda}$$ denotes a Poisson random variable with mean $$\lambda$$, and $M_{i,m}^{(n)},\ N_{i,m}^{(n)}\rightarrow \pi_{(\theta +n)/i}\quad \text{as } m\rightarrow \infty.$ For PD, as $$m\rightarrow \infty$$, $\frac{N_{i,m}^{(n)}}{m^{\sigma }},\ \frac{M_{i,m}^{(n)}}{m^{\sigma }}\rightarrow \frac{\sigma (1-\sigma )_{i-1}}{i!}\,BY,\qquad \frac{K_{m}^{(n)}}{m^{\sigma }}\rightarrow BY,$ where $$K_{m}^{(n)}$$ is the number of new blocks generated by the additional $$m$$ observations, $$B$$ and $$Y$$ are independent, $$B$$ has the beta distribution $$\beta(j+\theta /\sigma ,n/\sigma -j)$$ with $$j=K_{n}$$, and $$Y$$ has density $\frac{\Gamma (q\sigma +1)}{\sigma \Gamma (q+1)}\,y^{q-1/\sigma -1}f_{\sigma }(y^{-1/\sigma }),\qquad q=(\theta +n)/\sigma,$ $$f_{\sigma }$$ being the density of a positive $$\sigma$$-stable random variable. For Gnedin's model, $$M_{i,m}^{(n)}, N_{i,m}^{(n)}\rightarrow 0$$ as $$m\rightarrow \infty$$.

In the section on genomic applications, the authors analyse a dataset of 2586 observations under the PD model, estimating $$(\sigma ,\theta )$$ by maximizing the corresponding $$p_{k}^{(n)}(n_{1},\dots,n_{k})$$. They study $$O_{\tau }^{(n)}= O_{1,m}^{(n)}+\dots+O_{\tau ,m}^{(n)}$$, the number of genes already observed among the first $$n$$ whose frequency in the enlarged sample of size $$n+m$$ is at most $$\tau$$, for $$\tau =3,4,5$$, and the analogous sums for $$N$$ and $$M$$. They split the data into a first sample of size $$n=1000$$ and $$m=1586$$ subsequent observations, compare the observed values of $$O$$, $$N$$, $$M$$ with the predicted ones (the conditional expectations $$\operatorname{E}$$), and then give predictions for $$n=2586$$ and $$m=250,500,750,1000$$.
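The three weight sequences for D, PD and Gnedin's model can be checked against the Gibbs-type definition with exact rational arithmetic. The sketch below assumes illustrative parameter values $$\theta=3/2$$, $$\sigma=1/2$$ for PD and $$\gamma=1/2$$ for Gnedin's model, and all helper names are hypothetical. It verifies the recursion $$V_{n,k}=V_{n+1,k+1}+(n-\sigma k)V_{n+1,k}$$ with $$V_{1,1}=1$$, and that $$p_{k}^{(n)}$$, summed over all set partitions of $$\{1,\dots,n\}$$, equals 1 for small $$n$$. It is a numerical sanity check, not a proof.

```python
from fractions import Fraction
from math import prod


def rising(a, n):
    """Rising factorial a_n = a(a+1)...(a+n-1), with a_0 = 1."""
    return prod((a + j for j in range(n)), start=Fraction(1))


def v_dirichlet(n, k, theta):
    return theta ** k / rising(theta, n)


def v_pd(n, k, sigma, theta):
    return prod((theta + i * sigma for i in range(k)), start=Fraction(1)) / rising(theta, n)


def v_gnedin(n, k, gamma):
    num = rising(gamma, n - k) * prod((i * i - gamma * i for i in range(1, k)), start=Fraction(1))
    den = prod((i * i + gamma * i for i in range(1, n)), start=Fraction(1))
    return num / den


def satisfies_recursion(v, sigma, n_max):
    """Check V_{1,1} = 1 and V_{n,k} = V_{n+1,k+1} + (n - sigma k) V_{n+1,k} for n < n_max."""
    return v(1, 1) == 1 and all(
        v(n, k) == v(n + 1, k + 1) + (n - sigma * k) * v(n + 1, k)
        for n in range(1, n_max) for k in range(1, n + 1)
    )


def block_sizes(n):
    """Yield the list of block sizes of every set partition of {1, ..., n}."""
    def grow(sizes, m):
        if m == n:
            yield list(sizes)
            return
        for j in range(len(sizes)):             # element m+1 joins block j ...
            sizes[j] += 1
            yield from grow(sizes, m + 1)
            sizes[j] -= 1
        sizes.append(1)                         # ... or starts a new block
        yield from grow(sizes, m + 1)
        sizes.pop()
    yield from grow([], 0)


def total_eppf(n, v, sigma):
    """Sum of V_{n,k} prod_i (1 - sigma)_{n_i - 1} over all set partitions of {1, ..., n}."""
    return sum(
        v(n, len(s)) * prod((rising(1 - sigma, size - 1) for size in s), start=Fraction(1))
        for s in block_sizes(n)
    )


theta, sigma_pd, gamma = Fraction(3, 2), Fraction(1, 2), Fraction(1, 2)
MODELS = {
    "D": (lambda n, k: v_dirichlet(n, k, theta), Fraction(0)),
    "PD": (lambda n, k: v_pd(n, k, sigma_pd, theta), sigma_pd),
    "Gnedin": (lambda n, k: v_gnedin(n, k, gamma), Fraction(-1)),
}
for name, (v, sigma) in MODELS.items():
    ok = satisfies_recursion(v, sigma, 6) and all(total_eppf(n, v, sigma) == 1 for n in range(1, 6))
    print(name, ok)                             # expected: True for each model
```

Exact `Fraction` arithmetic is used so that the identities are tested without a floating-point tolerance. For Gnedin's model $$\sigma=-1$$, so each block weight $$(1-\sigma)_{n_{i}-1}=2_{n_{i}-1}$$ reduces to $$n_{i}!$$.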

### MSC:

60G09 Exchangeability for stochastic processes

60G57 Random measures

62G05 Nonparametric estimation

62F15 Bayesian inference

### References:

Arratia, R., Barbour, A. D. and Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2 519-535. · Zbl 0756.60006

Arratia, R., Barbour, A. D. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society, Zürich. · Zbl 1040.60001

Barbour, A. D. (1992). Refined approximations for the Ewens sampling formula. Random Structures Algorithms 3 267-276. · Zbl 0798.60010

Charalambides, C. A. (2005). Combinatorial Methods in Discrete Distributions. Wiley-Interscience, Hoboken, NJ. · Zbl 1087.60001

Durden, C. and Dong, Q. (2009). RICHEST-A web server for richness estimation in biological data. Bioinformation 3 296-298.

Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoret. Population Biology 3 87-112. · Zbl 0245.92009

Ewens, W. J. and Tavaré, S. (1998). The Ewens sampling formula, Update Vol. 2. In Encyclopedia of Statistical Science (S. Kotz, C. B. Read and D. L. Banks, eds.) 230-234. Wiley, New York.

Favaro, S., Lijoi, A., Mena, R. H. and Prünster, I. (2009). Bayesian non-parametric inference for species variety with a two-parameter Poisson-Dirichlet process prior. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 993-1008.

Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037

Gnedin, A. (2010). A species sampling model with finitely many types. Electron. Commun. Probab. 15 79-88. · Zbl 1202.60056

Gnedin, A. and Pitman, J. (2005). Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 83-102. · Zbl 1293.60010

Griffiths, R. C. and Spanò, D. (2007). Record indices and age-ordered frequencies in exchangeable Gibbs partitions. Electron. J. Probab. 12 1101-1130. · Zbl 1148.60002

Ho, M. W., James, L. F. and Lau, J. W. (2007). Gibbs partitions (EPPF's) derived from a stable subordinator are Fox H and Meijer G transforms. Math arXiv preprint arXiv:0708.0619v2.

James, L. F. (2010). Lamperti-type laws. Ann. Appl. Probab. 20 1303-1340. · Zbl 1204.60024

Kingman, J. F. C. (1978). The representation of partition structures. J. Lond. Math. Soc. (2) 18 374-380. · Zbl 0415.92009

Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 235-248. · Zbl 0491.60076

Lijoi, A., Mena, R. H. and Prünster, I. (2007). A Bayesian nonparametric method for prediction in EST analysis. BMC Bioinformatics 8 339. · Zbl 1156.62374

Lijoi, A., Mena, R. H. and Prünster, I. (2007). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769-786. · Zbl 1156.62374

Lijoi, A., Mena, R. H. and Prünster, I. (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 715-740.

Lijoi, A., Prünster, I. and Walker, S. G. (2008). Bayesian nonparametric estimators derived from conditional Gibbs structures. Ann. Appl. Probab. 18 1519-1547. · Zbl 1142.62333

Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047

Pitman, J. (2003). Poisson-Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed (D. R. Goldstein, ed.). Institute of Mathematical Statistics Lecture Notes-Monograph Series 40 1-34. IMS, Beachwood, OH.

Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875. Springer, Berlin. · Zbl 1103.60004

Quackenbush, J., Cho, J., Lee, D., Liang, F., Holt, I., Karamycheva, S., Parvizi, B., Pertea, G., Sultana, R. and White, J. (2001). The TIGR gene indices: Analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 29 159-164.

Schweinsberg, J. (2010). The number of small blocks in exchangeable random partitions. ALEA Lat. Am. J. Probab. Math. Stat. 7 217-242. · Zbl 1276.60011

Valen, E. (2009). Deciphering transcriptional regulation-Computational approaches. Ph.D. thesis, Bioinformatics Centre, Univ. Copenhagen.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.