zbMATH — the first resource for mathematics

A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining. (English) Zbl 1260.68351
Summary: The problem of identifying key genes is of fundamental importance in biology and medicine. The GeneRank model explores connectivity data to produce a prioritization of the genes in a microarray experiment that is less susceptible to variation caused by experimental noise than the one based on expression levels alone. The GeneRank algorithm amounts to solving an unsymmetric linear system. However, when the matrix in question is very large, the GeneRank algorithm is inefficient and even can be infeasible. On the other hand, the adjacency matrix is symmetric in the GeneRank model, while the original GeneRank algorithm fails to exploit the symmetric structure of the problem in question. In this paper, we discover that the GeneRank problem can be rewritten as a symmetric positive definite linear system, and propose a preconditioned conjugate gradient algorithm to solve it. Numerical experiments support our theoretical results, and show superiority of the novel algorithm.

68T05 Learning and adaptive systems in artificial intelligence
92D10 Genetics and epigenetics
Full Text: DOI
[1] Aerts S et al (2006) Gene prioritization through genomic data fusion. Nat Biotechnol 24: 537–544 · doi:10.1038/nbt1203
[2] Agarwal S, Sengupta S (2009) Ranking genes by relevance to a disease. Proc LSS Comput Syst Bioinform Conf 8: 37–46
[3] Bai Z, Demmel J, Dongarra J, Ruhe A, van der Vorst H (2000) Templates for the solution of algebraic eigenvalue problems: a practical guide. SIAM, Philadelphia · Zbl 0965.65058
[4] Benzi M (2002) Preconditioning techniques for large linear systems: a survey. J Comput Phys 182: 418–477 · Zbl 1015.65018 · doi:10.1006/jcph.2002.7176
[5] Cipra B (2000) The best of the 20th Century: editors name top 10 algorithms. SIAM News 33(4)
[6] Demmel JW (1997) Applied numerical linear algebra. SIAM, Philadelphia · Zbl 0879.65017
[7] Franke L et al (2006) Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet 78: 1011–1025 · doi:10.1086/504300
[8] Freschi V (2007) Protein function prediction from interaction networks using a random walk ranking algorithm. IEEE international conference on bioinformatics and bioengineering, pp 42–48
[9] Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. The Johns Hopkins University Press, Baltimore, London · Zbl 0865.65009
[10] Hai D, Lee W, Thuy H (2008) A PageRanking based method for identifying chracteristic genes of a disease. IEEE Int Conf Netw Sens Control 6–8: 1496–1499
[11] Hestenes M, Stiefel E (1952) Methods of conjugate gradients for linear systems. J Res Natl Bur Stand 49: 409–436 · Zbl 0048.09901 · doi:10.6028/jres.049.044
[12] Jia Z (1997) Refined iterative algorithms based on Arnoldi’s process for large unsymmetric eigenproblems. Linear Algebra Appl 259: 1–23 · Zbl 0877.65018 · doi:10.1016/S0024-3795(96)00238-8
[13] Ma X, Lee H, Wang L, Sun F (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics 23(2): 215–221 · doi:10.1093/bioinformatics/btl569
[14] Morrison J, Breitling R, Higham D, Gilbert D (2005) GeneRank: using search engine for the analysis of microarray experiments. BMC Bioinform 6: 233–246 · doi:10.1186/1471-2105-6-233
[15] Page L, Brin S, Motwami R, Winograd T (1998) The PageRank citation ranking: bring order to the web, Technical report. Computer Science Department, Stanford University, Palo Alto
[16] Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia · Zbl 1031.65046
[17] Sharan R, Ulitsky I, Shamir R (2007) Network-based prediction of protein function. Mol Syst Biol 3: 1–13
[18] Taylor A, Higham D (2008) CONTEST: a controllable test matrix toolbox for MATLAB. ACM Trans Math Softw 35, Article 26
[19] The MATHWORKS, Inc. (2004) MATLAB 7. September
[20] Wu G, Zhang Y, Wei Y (2010) Krylov subspace algorithms for computing GeneRank for the analysis of microarray data mining. J Comput Biol 17: 631–646 · doi:10.1089/cmb.2009.0004
[21] Wu G, Zhang Y, Wei Y (under review) Accelerating the Arnoldi-type algorithm for computing Google’s PageRank
[22] Xenarios I, Salwinski L, Duan X, Higney P, Kim S (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30: 303–305 · Zbl 05437358 · doi:10.1093/nar/30.1.303
[23] Yue B, Liang H, Bai F (2007) Understanding the GeneRank model. IEEE 1st Int Conf Bioinform Biomed Eng 6–8: 248–251
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.