
Statistical calibration of qRT-PCR, microarray and RNA-Seq gene expression data with measurement error models. (English) Zbl 1454.62406

Summary: Accurate quantification of gene expression levels is crucial for transcriptome studies. Over the past decade, microarray platforms have been commonly used to interrogate thousands of genes simultaneously, and RNA-Seq has recently emerged as a promising alternative. The gene expression measurements obtained by microarray and RNA-Seq are, however, subject to various measurement errors. A third platform, qRT-PCR, is acknowledged to quantify gene expression levels more accurately than microarray and RNA-Seq, but it has limited throughput. In this article, we propose to use a system of functional measurement error models to model gene expression measurements and to calibrate the microarray and RNA-Seq platforms against qRT-PCR. Based on this system, we develop a two-step approach that estimates the biases and error variance components of the three platforms and then computes calibrated estimates of gene expression levels. The estimated biases and variance components shed light on the relative strengths and weaknesses of the three platforms, and the calibrated estimates provide a more accurate and consistent quantification of gene expression levels. Theoretical and simulation studies establish the properties of these estimates. The system is applied to two gene expression data sets from the Microarray Quality Control (MAQC) and Sequencing Quality Control (SEQC) projects.
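The summary does not state the model equations, so the following is only a minimal sketch of what a two-step calibration of this kind can look like, under assumptions that are not taken from the paper: a linear functional errors-in-variables system in which qRT-PCR measures the true log-expression level mu[i] without bias (intercept 0, slope 1), while microarray and RNA-Seq measure it with a platform-specific intercept a_k and slope b_k, all with independent zero-mean errors. Step 1 uses simple method-of-moments estimators for three instruments; step 2 combines the rescaled measurements by inverse-variance weighting. The function names `estimate_parameters` and `calibrate`, the estimators, and the simulated data are all hypothetical illustrations, not the authors' estimators.

```python
# Hypothetical sketch: a simple linear errors-in-variables calibration of three
# platforms, assumed for illustration only (not the estimators of the paper):
#   y1[i] = mu[i] + e1[i]               qRT-PCR, taken as the reference scale
#   yk[i] = a_k + b_k * mu[i] + ek[i]   k = 2 (microarray), 3 (RNA-Seq)
# with independent zero-mean errors whose variances depend on the platform.
import random
from math import fsum


def _mean(x):
    return fsum(x) / len(x)


def _cov(x, y):
    # Sample covariance with the n - 1 denominator.
    mx, my = _mean(x), _mean(y)
    return fsum((u - mx) * (v - my) for u, v in zip(x, y)) / (len(x) - 1)


def estimate_parameters(y1, y2, y3, floor=1e-12):
    """Step 1: moment estimates of biases (a_k, b_k) and error variances."""
    c12, c13, c23 = _cov(y1, y2), _cov(y1, y3), _cov(y2, y3)
    # Under the model c12 ~ b2*S, c13 ~ b3*S, c23 ~ b2*b3*S, with S the spread of mu.
    b2, b3 = c23 / c13, c23 / c12
    s_mu = c12 * c13 / c23
    m1 = _mean(y1)
    a2, a3 = _mean(y2) - b2 * m1, _mean(y3) - b3 * m1
    # Error variance = total variance minus signal variance, floored at a tiny
    # positive value because sampling noise can push the difference below zero.
    v1 = max(_cov(y1, y1) - s_mu, floor)
    v2 = max(_cov(y2, y2) - b2 ** 2 * s_mu, floor)
    v3 = max(_cov(y3, y3) - b3 ** 2 * s_mu, floor)
    return {"a": (0.0, a2, a3), "b": (1.0, b2, b3), "var": (v1, v2, v3)}


def calibrate(params, y1, y2, y3):
    """Step 2: inverse-variance weighted average of the rescaled measurements."""
    calibrated = []
    for obs in zip(y1, y2, y3):
        num = den = 0.0
        for y, a, b, v in zip(obs, params["a"], params["b"], params["var"]):
            w = b * b / v  # (y - a) / b estimates mu with variance v / b**2
            num += w * (y - a) / b
            den += w
        calibrated.append(num / den)
    return calibrated


# Usage on simulated log-scale data (all values are made up for illustration).
random.seed(1)
mu = [random.uniform(2, 12) for _ in range(2000)]
y1 = [m + random.gauss(0, 0.2) for m in mu]
y2 = [1.0 + 0.8 * m + random.gauss(0, 0.5) for m in mu]
y3 = [-0.5 + 1.2 * m + random.gauss(0, 0.7) for m in mu]
params = estimate_parameters(y1, y2, y3)
est = calibrate(params, y1, y2, y3)
print("slopes:", params["b"], "error variances:", params["var"])
```

In this sketch, fixing qRT-PCR at intercept 0 and slope 1 is the assumption that makes the other slopes identifiable: the three pairwise covariances determine b2, b3 and the spread of the true levels. The weights b_k^2 / sigma_k^2 follow from (y_k - a_k) / b_k having variance sigma_k^2 / b_k^2 under the assumed model, so the calibrated value leans most on the platform with the smallest rescaled error.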

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

GPSeq; DESeq

References:

[1] Affymetrix Inc. (2005). Technical note: Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Affymetrix White Paper.
[2] Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11 R106.
[3] Applied Biosystems (2006). TaqMan® Gene Expression Assays for Validating Hits From Fluorescent Microarrays. White Paper.
[4] Barnett, V. D. (1969). Simultaneous pairwise linear structural relationships. Biometrics 25 129-142.
[5] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[6] Bullard, J. H., Purdom, E., Hansen, K. D. and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11 94.
[7] Bustin, S. A. (2002). Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): Trends and problems. J. Mol. Endocrinol. 29 23-39.
[8] Bustin, S. A. and Nolan, T. (2004). Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. J. Biomol. Tech. 15 155-166.
[9] Carter, R. L. and Fuller, W. A. (1980). Instrumental variable estimation of the simple errors-in-variables model. J. Amer. Statist. Assoc. 75 687-692. · Zbl 0459.62053 · doi:10.2307/2287670
[10] Cheng, C.-L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. Kendall’s Library of Statistics 6. Arnold, London. · Zbl 0947.62046
[11] Fuller, W. A. (1987). Measurement Error Models. Wiley, New York. · Zbl 0800.62413
[12] Gleser, L. J. (1983). Functional, structural and ultrastructural errors-in-variables models. In Proceedings of the Business and Economic Statistics Section 57-66. Amer. Statist. Assoc., Alexandria, VA.
[13] Griebel, T., Zacher, B., Ribeca, P., Raineri, E., Lacroix, V., Guigó, R. and Sammeth, M. (2012). Modelling and simulating generic RNA-Seq experiments with the flux simulator. Nucleic Acids Res. 40 10073-10083.
[14] Hu, M., Zhu, Y., Taylor, J. M. G., Liu, J. S. and Qin, Z. S. (2012). Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq. Bioinformatics 28 63-68.
[15] Kendall, M. G. and Stuart, A. (1973). The Advanced Theory of Statistics: Inference and Relationship, Vol. 2, 4th ed. Griffin, London. · Zbl 0249.62003
[16] Li, J., Jiang, H. and Wong, W. (2010). Modeling nonuniformity in short-read rates in RNA-Seq data. Genome Biol. 11 R50.
[17] Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. and Brown, E. L. (1996). Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14 1675-1680.
[18] Mak, H. C. (2011). John Storey provides his take on the importance of new statistical methods for high-throughput sequencing. Nat. Biotechnol. 29 331-333.
[19] Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5 621-628.
[20] Osborne, C. (1991). Statistical calibration: A review. International Statistical Review/Revue Internationale de Statistique 59 309-336. · Zbl 0743.62066 · doi:10.2307/1403690
[21] Pfaffl, M. W. (2004). Quantification strategies in real-time PCR. In A-Z of Quantitative PCR (S. A. Bustin, ed.). International University Line (IUL), La Jolla, CA.
[22] Reiersøl, O. (1950). Identifiability of a linear relation between variables which are subject to error. Econometrica 18 375-389. · Zbl 0040.22502 · doi:10.2307/1907835
[23] Robinson, M. D. and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11 R25.
[24] Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270 467-470.
[25] Schwartz, S., Oren, R. and Ast, G. (2011). Detection and removal of biases in the analysis of next-generation sequencing reads. PLoS ONE 6 e16685.
[26] Solari, M. E. (1969). The “Maximum Likelihood Solution” of the problem of estimating a linear functional relationship. J. Roy. Statist. Soc. Ser. B 31 372-375.
[27] Srivastava, S. and Chen, L. (2010). A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res. 38 e170.
[28] Sun, Z., Kuczek, T. and Zhu, Y. (2014). Supplement to “Statistical calibration of qRT-PCR, microarray and RNA-Seq gene expression data with measurement error models.” · Zbl 1454.62406
[29] Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. Nat. Rev. Genet. 10 57-63.