zbMATH — the first resource for mathematics

The theoretic framework of local weighted approximation for microarray missing value estimation. (English) Zbl 1214.68323
Summary: Microarray data are used in many biomedical experiments. They often contain missing values, which significantly affect statistical algorithms. Although a number of imputation algorithms have been proposed, they are limited in how effectively they exploit local and global information for estimation. It is necessary to develop more effective techniques to solve the data imputation problem. In this paper, we propose a theoretic framework of local weighted approximation for missing value estimation, based on the Taylor series approximation. Besides revealing that \(k\)-nearest neighbor imputation (KNNimpute) is a special case of the framework, we focus on the study of its linear case, local weighted linear approximation imputation (LWLAimpute), from theory to experiment. Experimental results show that LWLAimpute and its iterative version can achieve better performance than some existing imputation methods, and that the superiority becomes more significant as the level of missing values increases.
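
As a rough illustration of the special case named in the summary, the sketch below fills each missing entry with a distance-weighted average over the \(k\) nearest complete rows, which can be read as a local constant (zeroth-order) fit. It is not the paper's LWLAimpute algorithm. The function name `knn_impute`, the inverse-distance weights, the restriction of neighbors to complete rows, and the `eps` guard are all assumptions made for this example.

```python
import numpy as np

def knn_impute(X, k=3, eps=1e-12):
    """Fill NaNs in a genes-by-samples matrix with a distance-weighted
    average over the k nearest complete rows (a local constant fit).

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Assumption: only rows without any missing value serve as neighbors.
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one complete row")
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        # Euclidean distance using only the columns observed in row i.
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = np.argsort(d)[:k]
        # Assumption: inverse-distance weights; eps avoids division by zero.
        w = 1.0 / (d[nearest] + eps)
        filled[i, miss] = (w @ complete[nearest][:, miss]) / w.sum()
    return filled
```

Calling `knn_impute(X, k=10)` on a matrix with `np.nan` entries returns a copy with those entries filled and the observed entries unchanged. A local linear variant would replace the weighted average with a weighted least-squares fit over the same neighbors.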

68T10 Pattern recognition, speech recognition
92C50 Medical applications (general)
[1] Lockhart, D.J.; Winzeler, E.A., Genomics, gene expression and DNA arrays, Nature, 405, 827-836, (2000)
[2] Schulze, A.; Downward, J., Navigating gene expression using microarrays—a technology review, Nature cell biology, 3, E190-E195, (2001)
[3] Perou, C.M.; Sørlie, T.; Eisen, M.B., Molecular portraits of human breast tumours, Nature, 406, 747-752, (2000)
[4] Chu, S.; DeRisi, J.; Eisen, M., The transcriptional program of sporulation in budding yeast, Science, 282, 699-705, (1998)
[5] Liew, A.W.C.; Yan, H.; Yang, M.S., Pattern recognition techniques for the emerging field of bioinformatics: a review, Pattern recognition, 38, 11, 2055-2073, (2005)
[6] Lin, T.C.; Liu, R.S.; Chen, C.Y.; Chao, Y.T.; Chen, S.Y., Pattern classification in DNA microarray data of multiple tumor types, Pattern recognition, 39, 12, 2426-2438, (2006) · Zbl 1103.68771
[7] Pugazhenthi, D.; Rajagopalan, S.P., Machine learning technique approaches in drug discovery, design and development, Information technology journal, 6, 5, 718-724, (2007)
[8] Joung, J.G.; O, S.J.; Zhang, B.T., Protein sequence-based risk classification for human papillomaviruses, Computers in biology and medicine, 36, 656-667, (2006)
[9] Tuikkala, J.; Elo, L.L.; Nevalainen, O.S.; Aittokallio, T., Missing value imputation improves clustering and interpretation of gene expression microarray data, BMC bioinformatics, 9, 202, (2008)
[10] Farhangfar, A.; Kurgan, L.; Dy, J., Impact of imputation of missing values on classification error for discrete data, Pattern recognition, 41, 12, 3692-3705, (2008) · Zbl 1173.68479
[11] Ouyang, M.; Welsh, W.J.; Georgopoulos, P., Gaussian mixture clustering and imputation of microarray data, Bioinformatics, 20, 917-923, (2004)
[12] Y.H. Yang, M.J. Buckley, S. Dudoit, T.P. Speed, Comparison of methods for image analysis in cDNA microarray data, Technical Report 584, Department of Statistics, UC, Berkeley, 2000.
[13] Troyanskaya, O.; Cantor, M.; Sherlock, G., Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 6, 520-525, (2001)
[14] Tuikkala, J.; Elo, L.; Nevalainen, O.S.; Aittokallio, T., Improving missing value estimation in microarray data with gene ontology, Bioinformatics, 22, 566-572, (2006)
[15] Xiang, Q.; Dai, X.H.; Deng, Y.Y., Missing value imputation for microarray gene expression data using histone acetylation information, BMC bioinformatics, 9, 252, (2008)
[16] Oba, S.; Sato, M.A.; Takemasa, I., A Bayesian missing value estimation method for gene expression profile data, Bioinformatics, 19, 2088-2096, (2003)
[17] Kim, H.; Golub, G.H.; Park, H., Missing value estimation for DNA microarray gene expression data: local least squares imputation, Bioinformatics, 21, 187-198, (2005)
[18] Bø, T.H.; Dysvik, B.; Jonassen, I., LSimpute: accurate estimation of missing values in microarray data with least squares methods, Nucleic acids research, 32, 3, (2004)
[19] Yoon, D.; Lee, E.K.; Park, T., Robust imputation method for missing values in microarray data, BMC bioinformatics, 8, 2, (2007)
[20] Sehgal, M.S.B.; Gondal, I.; Dooley, L.S., Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data, Bioinformatics, 21, 2417-2423, (2005)
[21] Gan, X.C.; Liew, A.W.C.; Yan, H., Microarray missing data imputation based on a set theoretic framework and biological knowledge, Nucleic acids research, 34, 5, (2006)
[22] Jornsten, R.; Wang, H.Y.; Welsh, W.J.; Ouyang, M., DNA microarray data imputation and significance analysis of differential expression, Bioinformatics, 21, 4155-4161, (2005)
[23] Choong, M.K.; Charbit, M.; Yan, H., Autoregressive-model-based missing value estimation for DNA microarray time series data, IEEE transactions on information technology in biomedicine, 13, 131-137, (2009)
[24] Fan, J.Q., Design-adaptive nonparametric regression, Journal of the American statistical association, 87, 420, 998-1004, (1992) · Zbl 0850.62354
[25] Fan, J.Q., Local linear regression smoothers and their minimax efficiencies, Annals of statistics, 21, 1, 196-216, (1993) · Zbl 0773.62029
[26] Spellman, P.T.; Sherlock, G.; Zhang, M.Q., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Molecular biology of the cell, 9, 3273-3297, (1998)
[27] DeRisi, J.L.; Iyer, V.R.; Brown, P.O., Exploring the metabolic and genetic control of gene expression on a genomic scale, Science, 278, 680-686, (1997)
[28] Gasch, A.P.; Spellman, P.T.; Kao, C.M., Genomic expression programs in the response of yeast cells to environmental changes, Molecular biology of the cell, 11, 4241-4257, (2000)
[29] Hirao, M.; Posakony, J.; Nelson, M.; Hruby, H.; Jung, M.; Simon, J.A.; Bedalov, A., Identification of selective inhibitors of NAD\(^+\)-dependent deacetylases using phenotypic screens in yeast, Journal of biological chemistry, 278, 52773-52782, (2003)
[30] Alizadeh, A.A.; Eisen, M.B.; Davis, R.E.; Ma, C., Distinct types of diffuse large B-cell lymphoma identified by gene-expression profiling, Nature, 403, 503-511, (2000)
[31] Ronen, M.; Botstein, D., Transcriptional response of steady-state yeast cultures to transient perturbations in carbon source, PNAS, 103, 389-394, (2005)
[32] Brock, G.N.; Shaffer, J.R.; Blakesley, R.E.; Lotz, M.J.; Tseng, G.C., Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes, BMC bioinformatics, 9, 12, (2008)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.