A model-based background adjustment for oligonucleotide expression arrays. (English) Zbl 1055.62129

Summary: High-density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further preprocessing and data reduction occurs after the image-processing step. Statistical procedures developed by academic groups have been successful in improving the default algorithms provided by the Affymetrix system.
We present a solution to one of the preprocessing steps – background adjustment – based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications. These arrays use short oligonucleotides to probe for genes in an RNA sample. Typically, each gene is represented by 11–20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (i.e., specific hybridization). However, hybridization by other sequences (i.e., nonspecific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization.
We have found that the default ad hoc adjustment, provided as part of the Affymetrix system, can be improved through the use of estimators derived from a statistical model that uses probe sequence information. A final step in preprocessing is to summarize the probe-level data for each gene to define a measure of expression that represents the amount of the corresponding mRNA species. We illustrate the practical consequences of not adjusting appropriately for the presence of nonspecific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor Project (http://www.bioconductor.org).


62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology


gcrma; Bioconductor; vsn
Full Text: DOI Link