Robustness of multiple testing procedures against dependence.

*(English)* Zbl 1155.62031

Summary: An important aspect of multiple hypothesis testing is controlling the significance level, or the level of Type I error. When the test statistics are not independent, it can be particularly challenging to deal with this problem without resorting to very conservative procedures.

We show that, in the context of contemporary multiple testing problems, where the number of tests is often very large, the difficulties caused by dependence are less serious than in classical cases. This is particularly true when the null distributions of test statistics are relatively light-tailed, for example, when they can be based on normal or Student’s \(t\) approximations. There, if the test statistics can fairly be viewed as being generated by a linear process, an analysis founded on the incorrect assumption of independence is asymptotically correct as the number of hypotheses diverges.

In particular, the point process representing the null distribution of the indices at which statistically significant test results occur is approximately Poisson, just as in the case of independence. The Poisson process also has the same mean as in the independence case, and of course exhibits no clustering of false discoveries. However, this result can fail if the null distributions are particularly heavy-tailed. In that case, clusters of statistically significant results can occur even when the null hypothesis is correct. We give an intuitive explanation for these disparate properties in light- and heavy-tailed cases, and provide rigorous theory underpinning the intuition.
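
The contrast between the light- and heavy-tailed cases can be explored with a small simulation. The following is only a rough sketch under stated assumptions, not the authors' analysis: the test statistics are taken to be a moving average of i.i.d. innovations with illustrative weights (1, 0.5), the critical value is an empirical quantile rather than a theoretical one, and the helper name `exceedance_clustering` is hypothetical. It reports the fraction of rejections that are immediately followed by another rejection, a crude measure of clustering among false discoveries.

```python
import numpy as np


def exceedance_clustering(innovations, weights, level):
    """Threshold a moving-average series and measure how often rejections are adjacent.

    Returns (number of rejections,
             fraction of rejections whose next index is also rejected).
    """
    # Linear-process test statistics: x[i] = sum_j weights[j] * innovations[i - j]
    x = np.convolve(innovations, weights, mode="valid")
    # Empirical one-sided critical value, so that a fraction `level` of the
    # null statistics is rejected (an assumption made for simplicity)
    threshold = np.quantile(x, 1.0 - level)
    rejected = x > threshold
    n_rejected = int(rejected.sum())
    # Count rejections that are immediately followed by another rejection
    n_adjacent = int(np.sum(rejected[:-1] & rejected[1:]))
    return n_rejected, n_adjacent / max(n_rejected, 1)


rng = np.random.default_rng(0)
n_tests = 200_000
weights = np.array([1.0, 0.5])  # illustrative MA(1) weights, not from the paper

# Light-tailed case: Gaussian innovations
light = exceedance_clustering(rng.standard_normal(n_tests), weights, level=0.005)
# Heavy-tailed case: Cauchy innovations
heavy = exceedance_clustering(rng.standard_cauchy(n_tests), weights, level=0.005)

print("Gaussian innovations (rejections, adjacent fraction):", light)
print("Cauchy innovations   (rejections, adjacent fraction):", heavy)
```

In this setup one would expect the adjacent fraction to be small for Gaussian innovations and to shrink further as `level` decreases, while for Cauchy innovations it should stay substantial, because a single very large innovation pushes two consecutive statistics over the threshold.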

##### MSC:

| Code | Description |
|---|---|
| 62G10 | Nonparametric hypothesis testing |
| 62G35 | Nonparametric robustness |
| 62F03 | Parametric hypothesis testing |
| 62J15 | Paired and multiple comparisons; multiple testing |

##### Keywords:

false-discovery rate; family-wise error rate; linear process; moving average; multiplicity; significance level; simultaneous hypothesis testing; time-series

*S. Clarke* and *P. Hall*, Ann. Stat. 37, No. 1, 332-358 (2009; Zbl 1155.62031)
