A semiparametric Bayesian model for repeatedly repeated binary outcomes.

*(English)* Zbl 1409.62227

Summary: We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model. The mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnostic. The model successfully identifies the desired regions and performs well compared with other available alternatives.
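The link between partial exchangeability and a mixture of Markov chains can be illustrated with a minimal sketch. It is not the authors' model or code: the finite two-component mixture below merely stands in for the non-parametric random-mixing measure, and the transition matrices `P_low` and `P_high`, the weights, and the example sequences are hypothetical values chosen for illustration. The point is that, conditional on the first state, the likelihood of a binary sequence depends only on its transition counts, so two sequences with the same initial state and the same counts receive the same probability.

```python
import math


def transition_counts(seq):
    """Return a 2x2 table n[i][j] = number of i -> j transitions in a 0/1 sequence."""
    n = [[0, 0], [0, 0]]
    for a, b in zip(seq, seq[1:]):
        n[a][b] += 1
    return n


def chain_likelihood(seq, P):
    """Likelihood of the transitions in seq, given its first state, under matrix P."""
    n = transition_counts(seq)
    return math.prod(P[i][j] ** n[i][j] for i in (0, 1) for j in (0, 1))


def mixture_likelihood(seq, weights, matrices):
    """Finite mixture of Markov chains: sum over k of w_k * L(seq | P_k)."""
    return sum(w * chain_likelihood(seq, P) for w, P in zip(weights, matrices))


# Hypothetical mixture components (assumptions, not estimates from the paper).
P_low = [[0.95, 0.05], [0.30, 0.70]]   # LOH (state 1) is rare
P_high = [[0.60, 0.40], [0.10, 0.90]]  # LOH (state 1) is persistent
weights = [0.7, 0.3]

# Two different sequences with the same first state and the same transition counts.
x = [0, 0, 1, 1, 0, 1]
y = [0, 1, 1, 0, 0, 1]

lik_x = mixture_likelihood(x, weights, [P_low, P_high])
lik_y = mixture_likelihood(y, weights, [P_low, P_high])
```

Here `lik_x` and `lik_y` coincide because `x` and `y` share the same initial state and transition-count table. In the model described in the summary, the discrete weights would be replaced by a random mixing measure with a non-parametric prior.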

##### MSC:

62P10 | Applications of statistics to biology and medical sciences; meta analysis |

##### Keywords:

Dirichlet process; loss of heterozygosity; partial exchangeability; semiparametric random effects

##### Software:

dChipSNP

\textit{F. A. Quintana} et al., J. R. Stat. Soc., Ser. C, Appl. Stat. 57, No. 4, 419--431 (2008; Zbl 1409.62227)

