Replicated microarray data.

*(English)* Zbl 1004.62086

Summary: cDNA microarrays permit us to study the expression of thousands of genes simultaneously. They are now used in many different contexts to compare mRNA levels between two or more samples of cells. Microarray experiments typically give us expression measurements on a large number of genes, say 10,000 to 20,000, but with few, if any, replicates for each gene. Traditional methods using means and standard deviations to detect differential expression are not completely satisfactory in this context, and so a different approach seems desirable.

We present an empirical Bayes method for analysing replicated microarray data. Data from all the genes in a replicate set of experiments are combined into estimates of parameters of a prior distribution. These parameter estimates are then combined at the gene level with means and standard deviations to form a statistic \(B\) which can be used to decide whether differential expression has occurred. The statistic \(B\) avoids the problems of using averages or \(t\)-statistics. The method is illustrated using data from an experiment comparing the expression of genes in the livers of SR-BI transgenic mice with that of the corresponding wild-type mice.

In addition we present the results of a simulation study estimating the ROC curve of \(B\) and three other statistics for determining differential expression: the average and two simple modifications of the usual \(t\)-statistic. \(B\) was found to be the most powerful of the four, though the margin was not great. The data were simulated to resemble the SR-BI data.
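The kind of comparison described in this simulation study can be sketched as follows. This is only an illustrative sketch, not the authors' simulation: the data-generating model, the proportion of differentially expressed genes, the variance distribution, and the penalty constant `a` are all assumptions made here, and the statistic \(B\) itself is omitted because its formula is not given in this summary. The sketch compares the average, the ordinary \(t\)-statistic, and one hypothetical "penalized" \(t\) (a constant added to the denominator) by their empirical ROC curves.

```python
import numpy as np

def simulate(n_genes=2000, n_reps=4, frac_de=0.05, seed=0):
    """Simulate replicated log-ratios M (genes x replicates).

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    is_de = rng.random(n_genes) < frac_de            # truly differentially expressed
    # Gene-specific standard deviations (inverse-gamma-like variances).
    sigma = 0.5 * np.sqrt(1.0 / rng.gamma(shape=3.0, scale=1.0, size=n_genes))
    mu = np.where(is_de, rng.normal(0.0, 1.0, n_genes), 0.0)
    M = rng.normal(mu[:, None], sigma[:, None], size=(n_genes, n_reps))
    return M, is_de

def statistics(M):
    """Per-gene ranking statistics (absolute values, larger = more evidence)."""
    n = M.shape[1]
    mean = M.mean(axis=1)
    se = M.std(axis=1, ddof=1) / np.sqrt(n)
    a = np.median(se)                                # hypothetical penalty constant
    return {
        "average": np.abs(mean),
        "t": np.abs(mean) / se,
        "penalized_t": np.abs(mean) / (a + se),
    }

def roc_points(score, is_de):
    """Empirical ROC curve obtained by ranking genes from high to low score."""
    order = np.argsort(-score)
    hits = is_de[order]
    tpr = np.cumsum(hits) / hits.sum()
    fpr = np.cumsum(~hits) / (~hits).sum()
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    f = np.concatenate([[0.0], fpr])
    t = np.concatenate([[0.0], tpr])
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))

M, is_de = simulate()
aucs = {name: auc(*roc_points(s, is_de)) for name, s in statistics(M).items()}
for name, value in aucs.items():
    print(f"{name:12s} AUC = {value:.3f}")
```

With few replicates the ordinary \(t\)-statistic is unstable for genes whose sample standard deviation happens to be very small, which is the motivation for adding a constant to the denominator; how the resulting curves compare depends entirely on the assumed simulation settings.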

##### MSC:

62P10 | Applications of statistics to biology and medical sciences; meta analysis |

92D10 | Genetics and epigenetics |

62C12 | Empirical decision procedures; empirical Bayes procedures |

92C40 | Biochemistry, molecular biology |