## Conditional formulae for Gibbs-type exchangeable random partitions. (English) Zbl 1287.60046

Let $$(X_{n})_{n\geq 1}$$ be an $${\mathcal X}$$-valued exchangeable sequence and let $$\operatorname{P}$$ be the random probability measure on $${\mathcal X}$$ in its de Finetti representation. $$\operatorname{P}$$ is assumed to be concentrated on the set of discrete probability measures, with representation $$\operatorname{P}=\sum_{i\in I}p_{i}\varepsilon_{Y_{i}}$$, where $$(p_{i})$$ and $$(Y_{i})$$ are independent. For every $$n$$, consider the random partition $$\Pi_{n}$$ of $$\{1,\dots,n\}$$ defined by the exchangeable equivalence relation $$i\sim j$$ if $$X_{i}=X_{j}$$. Let $$K_{n}$$ denote the number of sets in $$\Pi_{n}$$ and $$M_{i,n}$$ the number of sets of cardinality $$i$$ in $$\Pi_{n}$$. The law of $$\Pi_{n}$$ is characterized by the probabilities $p_{k}^{(n)}(n_{1},\dots,n_{k}), \text{ where }\sum_{i=1}^{k}n_{i}=n,$ that $$\Pi_{n}$$ consists of $$k=K_{n}$$ sets of cardinalities $$n_{1},\dots,n_{k}$$. If $p_{k}^{(n)}(n_{1},\dots,n_{k})=V_{n,k}\prod_{i=1}^{k}(1-\sigma )_{n_{i}-1},\, \sigma \in (-\infty ,1),$ where $$(a)_{n}= a(a+1)\cdots (a+n-1)$$ denotes the rising factorial and the weights satisfy $V_{n,k}=V_{n+1,k+1}+(n-\sigma k) V_{n+1,k},\, k\leq n,\text{ with } V_{1,1}=1,$ then the partition is said to be of Gibbs type. Let $$O_{i,m}^{(n)}$$ be the number of sets of size $$i$$ in $$\Pi_{n+m}$$ intersecting $$\{1,\dots,n\}$$, $$N_{i,m}^{(n)}$$ the number of sets of size $$i$$ in $$\Pi_{n+m}$$ not intersecting $$\{1,\dots,n\}$$, and $$M_{i,m}^{(n)}=O_{i,m}^{(n)}+N_{i,m}^{(n)}$$.
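
The Gibbs-type definition can be checked numerically on a small case. The following is a minimal sketch, assuming the two-parameter weights $$V_{n,k}=\prod_{i=0}^{k-1}(\theta +i\sigma )/(\theta )_{n}$$ of the PD example in this review; the function names and the parameter values are hypothetical and chosen only for illustration. It uses exact rational arithmetic to test the recursion for $$V_{n,k}$$ and to confirm that the partition probabilities sum to one over all partitions of $$\{1,\dots,n\}$$.

```python
from fractions import Fraction


def rising(a, n):
    """Rising factorial (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    out = Fraction(1)
    for j in range(n):
        out *= a + j
    return out


def v_pd(n, k, sigma, theta):
    """Assumed weights V_{n,k} = prod_{i=0}^{k-1} (theta + i*sigma) / (theta)_n."""
    num = Fraction(1)
    for i in range(k):
        num *= theta + i * sigma
    return num / rising(theta, n)


def gibbs_eppf(sizes, sigma, theta):
    """Gibbs-type probability V_{n,k} * prod_i (1 - sigma)_{n_i - 1} for given block sizes."""
    n, k = sum(sizes), len(sizes)
    p = v_pd(n, k, sigma, theta)
    for s in sizes:
        p *= rising(1 - sigma, s - 1)
    return p


def block_sizes_of_all_partitions(n):
    """Yield the block sizes of every set partition of {1, ..., n}.

    Items are added one at a time, either to an existing block or to a
    new block, so each set partition is produced exactly once.
    """
    def extend(sizes, remaining):
        if remaining == 0:
            yield tuple(sizes)
            return
        for b in range(len(sizes)):
            sizes[b] += 1
            yield from extend(sizes, remaining - 1)
            sizes[b] -= 1
        sizes.append(1)
        yield from extend(sizes, remaining - 1)
        sizes.pop()

    yield from extend([], n)


if __name__ == "__main__":
    sigma, theta = Fraction(1, 2), Fraction(3, 2)  # illustrative values only
    recursion_ok = all(
        v_pd(n, k, sigma, theta)
        == v_pd(n + 1, k + 1, sigma, theta) + (n - sigma * k) * v_pd(n + 1, k, sigma, theta)
        for n in range(1, 7)
        for k in range(1, n + 1)
    )
    total = sum(gibbs_eppf(s, sigma, theta) for s in block_sizes_of_all_partitions(5))
    print("recursion holds:", recursion_ok)
    print("total probability for n = 5:", total)
```

Exact fractions are used so that the equalities are tested without floating-point tolerance; the enumeration grows like the Bell numbers, so it is only practical for small $$n$$.
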
The authors establish formulas for $$\operatorname{E}((M_{i,n})_{[q]})$$, where $$a_{[q]} =a(a-1)\cdots (a-q+1)$$ denotes the falling factorial, and for $\operatorname{E}((O_{i,m}^{(n)})_{[q]}),\ \operatorname{E}((N_{i,m}^{(n)})_{[q]}), \text{ and }\operatorname{E}((M_{i,m}^{(n)})_{[q]}),$ where each of $$O_{i,m}^{(n)}$$, $$N_{i,m}^{(n)}$$, $$M_{i,m}^{(n)}$$ is taken conditionally on $$(K_{n},M_{1,n},\dots,M_{K_{n},n})$$. The results are applied to three examples: the Dirichlet process (D) with $$\sigma =0$$ and $$V_{n,k}=\theta^{k}/(\theta )_{n}$$, $$\theta >0$$; the two-parameter Poisson-Dirichlet process (PD) with $\sigma \in (0,1),\,V_{n,k}=\prod_{i=0}^{k-1}(\theta +i\sigma )/(\theta )_{n},\, \theta > -\sigma;$ and the Gnedin model with $\sigma =-1,\, V_{n,k}=(\gamma )_{n-k}\prod_{i=1}^{k-1}(i^{2}-\gamma i)\prod_{i=1}^{n-1}(i^{2}+\gamma i)^{-1},\, \gamma \in [0,1).$ Explicit formulas for the distributions of $$O_{i,m}^{(n)}$$, $$N_{i,m}^{(n)}$$, $$M_{i,m}^{(n)}$$ and for their means are obtained.

Convergence in distribution results are also given. For D, $$M_{i,n}\rightarrow \pi_{\theta /i}$$ as $$n\rightarrow \infty$$, where $$\pi_{\lambda }$$ denotes a random variable with Poisson distribution of parameter $$\lambda$$, and $M_{i,m}^{(n)}, N_{i,m}^{(n)}\rightarrow \pi_{(\theta +n)/i}$ as $$m\rightarrow \infty$$. For PD, $N_{i,m}^{(n)}/ m^{\sigma },\ M_{i,m}^{(n)}/ m^{\sigma }\rightarrow \sigma (1-\sigma )_{i-1}(i!)^{-1}B Y$ and $K_{m}^{(n)}/ m^{\sigma }\rightarrow BY,$ where $$B$$ and $$Y$$ are independent, $$B$$ has the beta distribution $$\beta (j+\theta /\sigma ,n/\sigma -j)$$ with $$j=K_{n}$$, and $$Y$$ has density $(\Gamma (q\sigma +1) y^{q-1/\sigma -1}f_{\sigma }(y^{-1/\sigma }))/(\sigma \Gamma (q+1)),$ where $$q=(\theta +n)/\sigma$$ and $$f_{\sigma }$$ is the density of a nonnegative $$\sigma$$-stable random variable. In the Gnedin model, $$M_{i,m}^{(n)}, N_{i,m}^{(n)}\rightarrow 0$$.

In the section “genomic applications”, the authors study a data set of size 2586 under the PD model, estimating the parameters by maximizing the corresponding $$p_{k}^{(n)}(n_{1},\dots,n_{k})$$. They study $$O_{\tau }^{(n)}= O_{1,m}^{(n)}+\dots+O_{\tau ,m}^{(n)}$$ (the number of new genes appearing at most $$\tau$$ times in the $$m$$ experiments that follow the first $$n$$) for $$\tau =3,4,5$$, and similarly for $$N$$ and $$M$$. They split the data into $$n=1000$$ and $$m=1586$$, compare the observed $$O$$, $$N$$, $$M$$ with the predicted values (obtained from $$\operatorname{E}$$), and then determine the predictions for $$n=2586$$ and $$m= 250,500,750,1000$$.
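
The split of a sample into an initial part of size $$n$$ and an additional part of size $$m$$ can be illustrated on simulated data. The sketch below is not the authors' procedure and uses no real data: it assumes the PD weights given above, from which the sequential rule follows that, after $$t$$ items in $$k$$ sets, the next item starts a new set with probability $$(\theta +k\sigma )/(\theta +t)$$ and joins an existing set of size $$n_{j}$$ with probability $$(n_{j}-\sigma )/(\theta +t)$$. The function names, the seed, and the parameter values are hypothetical.

```python
import random
from collections import Counter


def sample_pd_labels(total, sigma, theta, rng):
    """Assign items 1..total to sets sequentially under the assumed PD rule.

    Returns a list whose t-th entry is the index of the set containing item t+1.
    """
    labels = []
    sizes = []  # current sizes of the sets, in order of appearance
    for _ in range(total):
        k = len(sizes)
        if k == 0:
            j = 0  # the first item always opens the first set
        else:
            # Unnormalised weights: existing sets first, then a new set.
            weights = [s - sigma for s in sizes] + [theta + k * sigma]
            j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
        labels.append(j)
    return labels


def old_new_counts(labels, n):
    """Count sets of each size in the full sample, split by contact with the first n items.

    O[i]: sets of size i that contain at least one of the first n items.
    N[i]: sets of size i that contain none of the first n items.
    M[i]: O[i] + N[i].
    """
    sizes = Counter(labels)
    old_sets = set(labels[:n])
    O = Counter(size for b, size in sizes.items() if b in old_sets)
    N = Counter(size for b, size in sizes.items() if b not in old_sets)
    return O, N, O + N


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed, illustrative only
    n, m = 80, 120                    # hypothetical sample sizes
    labels = sample_pd_labels(n + m, sigma=0.5, theta=1.0, rng=rng)
    O, N, M = old_new_counts(labels, n)
    tau = 3
    print("O_tau:", sum(O[i] for i in range(1, tau + 1)))
    print("N_tau:", sum(N[i] for i in range(1, tau + 1)))
    print("M_tau:", sum(M[i] for i in range(1, tau + 1)))
```

Averaging such counts over many simulated continuations of a fixed initial sample would give Monte Carlo approximations to the conditional means; the paper instead obtains these quantities in closed form.
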

### MSC:

- 60G09 Exchangeability for stochastic processes
- 60G57 Random measures
- 62G05 Nonparametric estimation
- 62F15 Bayesian inference
