Hung, Chun-Min; Huang, Yueh-Min; Chang, Ming-Shi
Alignment using genetic programming with causal trees for identification of protein functions. (English) Zbl 1095.92030
Nonlinear Anal., Theory Methods Appl., Ser. A, Theory Methods 65, No. 5, 1070-1093 (2006).

Summary: A hybrid evolutionary model is used to construct a hierarchical homology of protein sequences so that protein functions can be identified systematically. The proposed model offers considerable potential, given the inconsistency of existing methods for predicting novel proteins. Because some novel proteins may align without meaningful conserved domains, maximizing the sequence alignment score is not the best criterion for predicting protein functions. This work presents a decision model that minimizes the cost of deciding on a predicted protein function using the hierarchical homologies. The model has three characteristics: (i) it is a hybrid evolutionary model with multiple fitness functions that uses genetic programming to predict protein functions in a distantly related protein family; (ii) it incorporates modified robust point matching to compare all feature points accurately, using the moment invariant and thin-plate spline theorems; and (iii) the hierarchical homologies supporting a novel protein sequence, arranged as a causal tree, effectively demonstrate the relationships between proteins. This work uses the model to compare nucleocapsid proteins from the putative polyprotein of the SARS virus and from other coronaviruses in other hosts.

MSC:
92C40 Biochemistry, molecular biology
90C59 Approximation methods and heuristics in mathematical programming
92D20 Protein sequences, DNA sequences

Keywords: bioinformatics protein databases; evolutionary computing and genetic algorithms; splines; invariants; moments

Software: COMPASS; ClustalW; PSI-BLAST; BLAST

Cite: C.-M. Hung et al., Nonlinear Anal., Theory Methods Appl., Ser. A, Theory Methods 65, No.
5, 1070-1093 (2006; Zbl 1095.92030)
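As an illustration of the moment invariants named in the summary, the sketch below computes the first two of Hu's invariants for a small 2-D point set and checks that they do not change when the points are rotated and translated. This is only a minimal sketch: the record does not say how feature points are derived from protein sequences or which invariants the authors use, so the point coordinates and the helper names (`central_moments`, `hu_phi1_phi2`, `rigid_transform`) are hypothetical. Moments are normalized by the number of points, so only rotation and translation invariance is demonstrated, not scale invariance.

```python
import math


def central_moments(points):
    """Second-order central moments of a 2-D point set, averaged over the points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return mu20, mu02, mu11


def hu_phi1_phi2(points):
    """First two Hu-style invariants built from second-order central moments."""
    mu20, mu02, mu11 = central_moments(points)
    phi1 = mu20 + mu02
    phi2 = (mu20 - mu02) ** 2 + 4.0 * mu11 ** 2
    return phi1, phi2


def rigid_transform(points, angle, dx, dy):
    """Rotate the points by `angle` radians about the origin, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


# Hypothetical feature points; not taken from the paper.
points = [(0.0, 0.0), (2.0, 0.5), (3.0, 2.0), (1.0, 3.0), (-1.0, 1.5)]
moved = rigid_transform(points, angle=0.7, dx=4.0, dy=-2.0)

original_invariants = hu_phi1_phi2(points)
moved_invariants = hu_phi1_phi2(moved)
```

Comparing `original_invariants` with `moved_invariants` using a floating-point tolerance (for example `math.isclose`) shows the two pairs agree, which is the property that makes such quantities usable as pose-independent descriptors when matching point sets.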