
Sequential prediction bounds for identifying differentially expressed genes in replicated microarray experiments. (English) Zbl 1058.62099

Summary: Microarrays are new biotechnological devices that permit the simultaneous evaluation of expression levels of thousands of genes in one or more tissue samples. We develop a new method for identifying differentially expressed genes in replicated cDNA and oligonucleotide microarray experiments. The method is based on a nonparametric prediction interval, computed as an order statistic of \(n\) control measurements and applied sequentially to a series of \(p\) replicate sets of experimental measurements, the \(i\)-th of size \(n_i\). We illustrate how reasonable experiment-wise false positive and false negative rates can be attained for any practical number of genes by choosing the order statistic, \(n\), \(p\) and \(n_i\). The method identifies genes whose expression levels are associated with a pathological condition beyond chance expectation, given the large number of genes tested. We illustrate the method on replicated gene expression data from tumor and normal colon tissues and compare it with an alternative approach based on permutation tests.
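The summary describes the decision rule only in outline. The following is a minimal sketch, assuming a simple hypothetical rule that is not necessarily the one used in the paper: a gene is flagged as up-regulated only if every one of the \(m = n_1 + \dots + n_p\) experimental measurements exceeds the \(u\)-th smallest of the \(n\) control measurements. Under the null hypothesis all \(n + m\) values are exchangeable draws from one continuous distribution, so \(U = F(X_{(u)}) \sim \mathrm{Beta}(u, n-u+1)\) and the per-gene false positive rate is \(E[(1-U)^m] = B(u, n-u+1+m)/B(u, n-u+1)\). The experiment-wise rate over \(G\) genes is bounded here by the Bonferroni (union) bound \(G\alpha\), which needs no independence assumption. All function names are hypothetical.

```python
import math
import random


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def per_gene_false_positive(n: int, u: int, replicate_sizes: list[int]) -> float:
    """Null probability that all experimental measurements exceed the
    u-th smallest of n control measurements (hypothetical rule).

    U = F(X_(u)) ~ Beta(u, n - u + 1), so the probability is
    E[(1 - U)^m] = B(u, n - u + 1 + m) / B(u, n - u + 1).
    """
    if not 1 <= u <= n:
        raise ValueError("u must satisfy 1 <= u <= n")
    m = sum(replicate_sizes)  # total experimental measurements over p sets
    return math.exp(log_beta(u, n - u + 1 + m) - log_beta(u, n - u + 1))


def experimentwise_bound(alpha: float, genes: int) -> float:
    """Bonferroni bound on P(at least one false positive) across genes."""
    return min(1.0, genes * alpha)


def smallest_total_replicates(n: int, u: int, genes: int, target: float,
                              max_m: int = 50):
    """Smallest total m whose experiment-wise bound is <= target, else None."""
    for m in range(1, max_m + 1):
        alpha = per_gene_false_positive(n, u, [m])
        if experimentwise_bound(alpha, genes) <= target:
            return m
    return None


def monte_carlo_check(n: int, u: int, replicate_sizes: list[int],
                      reps: int = 100_000, seed: int = 1) -> float:
    """Simulate the null rate with uniform data to check the exact formula."""
    rng = random.Random(seed)
    m = sum(replicate_sizes)
    hits = 0
    for _ in range(reps):
        controls = sorted(rng.random() for _ in range(n))
        bound = controls[u - 1]  # u-th smallest control value
        # all() stops at the first failure; draws are independent, so the
        # estimated probability is unaffected.
        if all(rng.random() > bound for _ in range(m)):
            hits += 1
    return hits / reps
```

For example, with \(n = 10\) controls and the maximum as the bound (\(u = 10\)), two replicate sets of size one give `per_gene_false_positive(10, 10, [1, 1])` \(= 1/66 \approx 0.0152\), and `smallest_total_replicates(10, 10, 2000, 0.05)` returns 8, since \(2000/\binom{18}{8} \approx 0.046\) while \(2000/\binom{17}{7} \approx 0.103\). Because this hypothetical rule requires every set to pass, evaluating the \(p\) sets sequentially and stopping at the first failing set gives the same decision with fewer comparisons.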

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
62L10 Sequential statistical analysis
62G30 Order statistics; empirical distribution functions
62G10 Nonparametric hypothesis testing

Software:

sma; S-PLUS; R
