Exploration of distributional models for a novel intensity-dependent normalization procedure in censored gene expression data. (English) Zbl 1453.62128

Summary: Current gene intensity-dependent normalization methods, based on regression smoothing techniques, usually address the two problems of reducing location bias and rescaling the data without accounting for the censoring that characterizes certain gene expression measurements, which arises from experimental measurement constraints or from earlier normalization steps. Moreover, tuning normalization procedures to balance bias against variance is often left to the user’s experience. An approximate maximum likelihood procedure is presented for fitting a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Both the central tendency and the scaling factor are modeled with B-spline smoothing. As an alternative to outlier theory and robust methods, the approach looks for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, while accounting for different types of censoring. The Bayesian information criterion is adopted for model selection. Distributional assumptions are tested using goodness-of-fit statistics and Monte Carlo evaluation. Randomized quantiles are proposed to produce normally distributed adjusted data. Three publicly available data sets are analyzed for demonstration purposes. Student’s \(t\) error models showed the best performance in all the data sets considered. Further validation is needed to evaluate the asymmetric Laplace distribution, which showed interesting results in one data set.
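The summary describes the model only in words. The following minimal sketch shows one way to make it concrete under stated assumptions. The log-ratio M is written as mu(A) + sigma(A) * eps, with mu(A) and log sigma(A) both expanded in a cubic B-spline basis of the average intensity A, and eps following a standardized Gaussian or Student t law. Unlike the paper's approximate maximum likelihood procedure, this sketch maximizes the exact censored likelihood numerically with SciPy's L-BFGS-B optimizer. It then compares the two error models by BIC and computes randomized quantile residuals in the sense of Dunn and Smyth. Several choices are hypothetical and made only for illustration: the knot placement, the log link for the scale, the censoring threshold `limit`, the helper names and the simulated data. The Laplacian and asymmetric Laplace models are omitted.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline
from scipy.optimize import minimize


def bspline_basis(x, n_interior=4, k=3):
    """Cubic B-spline design matrix; interior knots at quantiles of x (hypothetical choice)."""
    inner = np.quantile(x, np.linspace(0, 1, n_interior + 2)[1:-1])
    t = np.r_[[x.min()] * (k + 1), inner, [x.max()] * (k + 1)]
    n_basis = len(t) - k - 1
    # Identity coefficients: evaluating the spline returns one column per basis function.
    return BSpline(t, np.eye(n_basis), k)(x)


def error_dist(theta, family):
    """Standardized error law: Student t (log df stored last in theta) or Gaussian."""
    return stats.t(df=np.exp(theta[-1])) if family == "t" else stats.norm()


def neg_loglik(theta, y, cens, B, family):
    """Censored location-scale likelihood for M = mu(A) + sigma(A) * eps.

    cens: 0 = observed, 1 = right-censored (true value >= y), -1 = left-censored (<= y).
    """
    p = B.shape[1]
    mu = B @ theta[:p]
    log_sigma = np.clip(B @ theta[p:2 * p], -20, 20)  # guard against overflow in line search
    z = (y - mu) / np.exp(log_sigma)
    dist = error_dist(theta, family)
    ll = np.where(cens == 0, dist.logpdf(z) - log_sigma,
                  np.where(cens > 0, dist.logsf(z), dist.logcdf(z)))
    total = ll.sum()
    return -total if np.isfinite(total) else 1e10


def fit(y, cens, a, family):
    """Maximize the censored likelihood numerically; return parameters, basis and BIC."""
    B = bspline_basis(a)
    p = B.shape[1]
    beta0 = np.linalg.lstsq(B, y, rcond=None)[0]
    gamma0 = np.full(p, np.log(np.std(y - B @ beta0)))  # constant scale (basis sums to 1)
    theta0 = np.r_[beta0, gamma0, np.log(10.0)]
    bounds = [(None, None)] * (2 * p) + [(np.log(0.5), np.log(200.0))]
    res = minimize(neg_loglik, theta0, args=(y, cens, B, family),
                   method="L-BFGS-B", bounds=bounds)
    n_par = 2 * p + (1 if family == "t" else 0)  # df only counts for the t model
    bic = 2 * res.fun + n_par * np.log(len(y))
    return res.x, B, bic


def randomized_quantile_residuals(theta, y, cens, B, family, rng):
    """Dunn-Smyth residuals: approximately N(0, 1) if the fitted model is correct."""
    p = B.shape[1]
    z = (y - B @ theta[:p]) / np.exp(np.clip(B @ theta[p:2 * p], -20, 20))
    F = error_dist(theta, family).cdf(z)
    # A censored value only fixes an interval of the CDF, so draw u uniformly inside it.
    u = np.where(cens == 0, F,
                 np.where(cens > 0, rng.uniform(F, 1.0), rng.uniform(0.0, F)))
    return stats.norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))


# Hypothetical simulated data: A = average log-intensity, M = log-ratio with t(4) errors.
rng = np.random.default_rng(0)
a = rng.uniform(6, 14, 2000)
m_true = 0.4 * np.sin(a / 2) + 0.3 * np.exp(-0.15 * (a - 6)) * rng.standard_t(4, a.size)
limit = 0.8  # hypothetical right-censoring threshold on M
y = np.minimum(m_true, limit)
cens = (m_true >= limit).astype(int)

fits = {fam: fit(y, cens, a, fam) for fam in ("gaussian", "t")}
for fam, (theta, _, bic) in fits.items():
    print(f"{fam:8s} BIC = {bic:.1f}")
```

On data simulated with t(4) errors, the t model should attain the lower BIC, and its randomized residuals should look approximately standard normal. The log link keeps the scale positive without constrained optimization. Other error laws (for instance `scipy.stats.laplace`) could in principle be plugged into `error_dist`, provided any extra shape parameters are added to `theta` and counted in the BIC penalty.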


62-08 Computational methods for problems pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D20 Protein sequences, DNA sequences