Multiple testing under dependence via graphical models. (English) Zbl 1391.62231

Summary: Large-scale multiple testing tasks often exhibit dependence. Leveraging the dependence between individual tests remains a challenging and important problem in statistics. Recent advances in graphical models make it feasible to use them to capture the dependence among multiple hypotheses. We propose a multiple testing procedure based on a Markov-random-field-coupled mixture model. The underlying true states of the hypotheses are represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The model can be learned by a novel EM algorithm. The next step is to infer the posterior probability that each hypothesis is null (termed the local index of significance), and the false discovery rate can then be controlled accordingly. We also provide a semiparametric variation of the graphical model, which is useful when \(f_{1}\) (the density function of the test statistic under the alternative hypothesis) is heterogeneous among multiple hypotheses. This semiparametric approach exactly generalizes the local FDR procedure [B. Efron et al., J. Am. Stat. Assoc. 96, No. 456, 1151–1160 (2001; Zbl 1073.62511)] and connects with the BH procedure [Y. Benjamini and Y. Hochberg, J. R. Stat. Soc., Ser. B 57, No. 1, 289–300 (1995; Zbl 0809.62014)]. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer and identify several SNPs with strong association evidence.
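
The summary does not spell out how the false discovery rate is controlled once the local indices of significance (LIS) are available. The following is a minimal sketch of one common thresholding rule from the local-FDR literature, assumed here for illustration and not confirmed by the summary as the authors' exact rule: sort the LIS values in increasing order and reject the largest initial set whose running mean stays at or below the target level. The function name `lis_rejections`, the parameter `alpha`, and the example values are hypothetical.

```python
def lis_rejections(lis, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at nominal FDR level alpha.

    lis   : list of posterior null probabilities, one per hypothesis
            (assumed to have been computed already, e.g. by model inference).
    alpha : target false discovery rate (hypothetical parameter name).

    Assumed rule: order hypotheses by increasing LIS and reject the first k,
    where k is the largest rank whose running mean of ordered LIS values
    is at most alpha.
    """
    # Indices of the hypotheses, ordered from smallest to largest LIS.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    running_sum = 0.0
    k = 0  # number of rejections found so far
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # The running mean estimates the FDR among the first `rank` rejections.
        if running_sum / rank <= alpha:
            k = rank

    return sorted(order[:k])


# Illustrative values only, not data from the paper.
example_lis = [0.01, 0.9, 0.02, 0.2, 0.6]
print(lis_rejections(example_lis, alpha=0.1))
```

Because the values are sorted, the running mean is non-decreasing, so the last rank that satisfies the bound is also the largest one. Under this rule a hypothesis with a moderately large LIS can still be rejected when enough strongly significant hypotheses precede it.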


62P10 Applications of statistics to biology and medical sciences; meta analysis
62J15 Paired and multiple comparisons; multiple testing
62M40 Random fields; image analysis
92D20 Protein sequences, DNA sequences