## Statistical significance for genomewide studies. (English) Zbl 1130.62385

Summary: With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance, called the $q$ value, is associated with each tested feature. The $q$ value is similar to the well known $p$ value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
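To make the idea of a per-feature $q$ value concrete, here is a minimal sketch in Python. It is not the authors' reference implementation. It assumes that null $p$ values are Uniform(0, 1), estimates the proportion of true nulls $\pi_0$ with a single fixed tuning parameter $\lambda = 0.5$ (a simplification; smoother estimates of $\pi_0$ are possible), and uses the FDR-based definition $q(p_i) = \min_{t \ge p_i} \pi_0 m t / \#\{p_j \le t\}$, where $m$ is the number of tested features. The function names `estimate_pi0` and `q_values` and the parameter `lam` are hypothetical.

```python
from typing import List, Optional


def estimate_pi0(pvalues: List[float], lam: float = 0.5) -> float:
    """Estimate pi0, the proportion of features for which the null is true.

    Assumes null p-values are Uniform(0, 1), so p-values above `lam`
    come mostly from nulls: pi0 ~= #{p > lam} / (m * (1 - lam)).
    Using one fixed `lam` is a simplifying assumption of this sketch.
    """
    m = len(pvalues)
    count = sum(1 for p in pvalues if p > lam)
    return min(1.0, count / (m * (1.0 - lam)))


def q_values(pvalues: List[float], pi0: Optional[float] = None) -> List[float]:
    """Return one q-value per input p-value, in the original input order.

    q(p_i) = min over t >= p_i of pi0 * m * t / #{p_j <= t}.
    Passing pi0=1.0 gives Benjamini-Hochberg adjusted p-values.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if pi0 is None:
        pi0 = estimate_pi0(pvalues)

    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])

    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum implements
    # the "min over t >= p_i" and keeps q monotone in p (ties included).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        estimated_fdr = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, estimated_fdr)
        q[i] = running_min
    return q


if __name__ == "__main__":
    pvals = [0.01, 0.02, 0.03, 0.04, 0.8, 0.9]
    print("pi0 estimate:", estimate_pi0(pvals))
    print("q values:", q_values(pvals))
```

Calling a feature significant when its $q$ value is at most 0.05, say, means that among all features called at least as significant, an estimated 5% or fewer are false positives. Because $\pi_0 \le 1$, the $q$ values are never larger than the corresponding Benjamini-Hochberg adjusted $p$ values, which is one way the criterion ends up more liberal than a family-wise error control.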

### MSC:

- 62P10 Applications of statistics to biology and medical sciences; meta analysis
- 92C40 Biochemistry, molecular biology
- 62F03 Parametric hypothesis testing
- 92D10 Genetics and epigenetics
