A composite approach to protein tertiary structure prediction: hidden Markov model based on lattice. (English) Zbl 1415.92139

Summary: The biological function of a protein depends mainly on its tertiary structure, which is determined by its amino acid sequence through the process of protein folding. Predicting protein structure from the amino acid sequence is one of the most prominent problems in computational biology. This work combines two basic methodologies for protein structure prediction: the ab initio method (3-D space lattice) and the fold recognition method (hidden Markov model). The primary structure of proteins and the 3-D coordinates of amino acid residues are put together in one hidden Markov model, which learns the path of the amino acid residues in 3-D space from the first atom to the last atom of each protein of each fold. Each model therefore holds the information on the 3-D path of the amino acids of its fold. The proposed method is compared with HMM-based fold recognition methods that use only the amino acid sequence or the secondary structure. To validate the proposed method, the models are assessed on three datasets. The results show that the proposed models outperform 7-HMM and 3-HMM on the same dataset. The face-centered cubic lattice, the most compact 3-D lattice, reached the maximum classification accuracy in all experiments when compared with the most effective version of the optimized 3-HMM and with the latest version of SAM (3.5). The results show that the 3-D coordinates of the atoms of amino acids play an important role in prediction and carry more hidden information for fold classification than the secondary structure of proteins.
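As a rough illustration of how a 3-D residue path can be turned into discrete symbols that a hidden Markov model could consume, the sketch below snaps each step between consecutive residue coordinates to the closest of the 12 nearest-neighbour directions of the face-centered cubic lattice. It is not the authors' implementation: the names `FCC_MOVES`, `nearest_fcc_move` and `encode_path`, the cosine-similarity rule, and the use of one coordinate per residue are all assumptions made for the example. Snapping each step independently also does not guarantee a self-avoiding lattice walk.

```python
import itertools
import math

# The 12 nearest-neighbour directions of the face-centered cubic (FCC)
# lattice: every distinct permutation of (+-1, +-1, 0).
FCC_MOVES = sorted(
    {
        perm
        for s1 in (1, -1)
        for s2 in (1, -1)
        for perm in itertools.permutations((s1, s2, 0))
    }
)


def nearest_fcc_move(delta):
    """Return the index in FCC_MOVES of the direction best aligned with delta.

    Alignment is measured by cosine similarity (an assumption of this sketch).
    """
    norm = math.sqrt(sum(c * c for c in delta))
    if norm == 0:
        raise ValueError("consecutive residues share the same coordinates")

    def cosine(move):
        dot = sum(d * m for d, m in zip(delta, move))
        return dot / (norm * math.sqrt(2))  # every FCC move has length sqrt(2)

    return max(range(len(FCC_MOVES)), key=lambda i: cosine(FCC_MOVES[i]))


def encode_path(residues, coords):
    """Pair each residue after the first with the lattice move that reaches it.

    residues: sequence of one-letter amino acid codes.
    coords:   one (x, y, z) tuple per residue (hypothetical input format).
    """
    if len(residues) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    encoded = []
    for i in range(1, len(coords)):
        delta = tuple(b - a for a, b in zip(coords[i - 1], coords[i]))
        encoded.append((residues[i], nearest_fcc_move(delta)))
    return encoded
```

Each `(residue, move index)` pair could then serve as one observation, or as a residue emission tied to a direction state, when training one model per fold. That mapping is a design choice of this sketch, not something the summary specifies.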

MSC:

92D20 Protein sequences, DNA sequences
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

HMMER

References:

[1] Bahamish HAA, Abdullah R, Salam RA (2009) Protein tertiary structure prediction using artificial bee colony algorithm. In: Third Asia international conference on modelling & simulation, pp 258-263
[2] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H et al (2000) The Protein Data Bank. Nucl Acids Res 28:235-242
[3] Bidargaddi NP, Chetty M, Kamruzzaman J (2009) Combining segmental semi-Markov models with neural networks for protein secondary structure prediction. Neurocomputing 72:3943-3950
[4] Camproux AC, Tufféry P (2005) Hidden Markov Model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity. Biochim Biophys Acta 1724:394-403
[5] Cao H, Ihm Y, Wang C-Z, Morris JR, Su M, Dobbs D et al (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697
[6] Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M et al (2004) The ASTRAL Compendium in 2004. Nucleic Acids Res 32:D189-D192
[7] Chmielnicki W, Stapor K (2012) A hybrid discriminative/generative approach to protein fold recognition. Neurocomputing 75:194-198
[8] Deschavanne P, Tufféry P (2009) Enhanced protein fold recognition using a structural alphabet. Proteins 76:129-137
[9] Dorn M, Silva MB, Buriol LS, Lamb LC (2014) Three-dimensional protein structure prediction: methods and computational strategies. Comput Biol Chem 53:251-276
[10] Dotu I, Cebrian M, Van Hentenryck P, Clote P (2011) On lattice protein structure prediction revisited. IEEE/ACM Trans Comput Biol Bioinform 8:1620-1632
[11] Elofsson A, Hargbo J (1999) Hidden Markov models that use predicted secondary structures for fold recognition. Proteins 36:68-76
[12] Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucl Acids Res 39:W29-W37
[13] Fox NK, Brenner SE, Chandonia JM (2015) The value of protein structure classification information-Surveying the scientific literature. Proteins Struct Funct Bioinform 83:2025-2038
[14] Gheraibia Y, Moussaoui A (2012) Prediction of 3D protein structure using a genetic algorithm and a K nearest neighbour classifier. In: Biomedical engineering international conference BIOMEIC’12, Algeria
[15] Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K (2003) Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins 51:504-514
[16] Karplus K, Sjölander K, Barrett C, Cline M, Haussler D, Hughey R et al (1997) Predicting protein structure using hidden Markov models. Proteins Struct Funct Bioinform 29:134-139
[17] Karplus K, Karchin R, Shackelford G, Hughey R (2005) Calibrating E-values for hidden Markov models using reverse-sequence null models. Bioinformatics 21:4107-4115
[18] Kong L, Zhang L (2014) Novel structure-driven features for accurate prediction of protein structural class. Genomics 103:292-297
[19] Lampros C, Papaloukas C, Exarchos TP, Goletsis Y, Fotiadis DI (2007a) Sequence-based protein structure prediction using a reduced state-space hidden Markov model. Comput Biol Med 37:1211-1224
[20] Lampros C, Papaloukas C, Exarchos K (2007b) Improvement in fold recognition accuracy of a reduced-state-space hidden Markov model by using secondary structure information in scoring. In: 29th annual international conference of the IEEE EMBS, France
[21] Lampros C, Papaloukas C, Exarchos K, Fotiadis DI, Tsalikakis D (2009) Improving the protein fold recognition accuracy of a reduced state-space hidden Markov model. Comput Biol Med 39:907-914
[22] Lampros C, Simos T, Exarchos TP, Exarchos KP, Papaloukas C, Fotiadis DI (2014) Assessment of optimized Markov models in protein fold classification. J Bioinform Comput Biol 12(4):1450016. https://doi.org/10.1142/S0219720014500164
[23] Lampros C, Papaloukas C, Exarchos T, Fotiadis DI (2017) HMMs in Protein Fold Classification. Hidden Markov Models Methods Mol Biol 1552:13-27
[24] Lee J, Kim S-Y, Joo K, Kim I, Lee J (2004) Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins Struct Funct Bioinform 56:704-714
[25] Lee SY, Lee JY, Jung KS, Ryu KH (2009) A 9-state hidden Markov model using protein secondary structure information for protein fold recognition. Comput Biol Med 39:527-534
[26] Lin C-J, Su S-C (2011) Protein 3D HP model folding simulation using a hybrid of genetic algorithm and particle swarm optimization. Int J Fuzzy Syst 13:140-147
[27] Márquez-Chamorro AE, Divina F, Aguilar-Ruiz JS, Bacardit J, Asencio-Cortés G, Santiesteban-Toca CE (2012) A NSGA-II algorithm for the residue-residue contact prediction. Springer, Berlin, pp 234-244
[28] Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536-540
[29] Nanni L, Brahnam S, Lumini A (2014) Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition. J Theor Biol 360:109-116 · Zbl 1343.92387
[30] Pitteri M, Zanzotto G (1996) On the definition and classification of Bravais lattices. Acta Cryst A52:830-838 · Zbl 1188.52012
[31] Rashid MA, Newton MAH, Hoque MT, Sattar A (2013a) Mixing energy models in genetic algorithms for on-lattice protein structure prediction. BioMed Res Int 27:37-52
[32] Rashid MA, Newton MAH, Hoque MT, Sattar A (2013b) A local search embedded genetic algorithm for simplified protein structure prediction. In: 2013 IEEE congress on evolutionary computation. https://doi.org/10.1109/CEC.2013.6557688
[33] Regad L, Guyon F, Maupetit J, Tufféry P, Camproux AC (2008) A Hidden Markov Model applied to the protein 3D structure analysis. Comput Stat Data Anal 52:3198-3207 · Zbl 1452.62842
[34] Shi J-Y, Zhang Y-N (2010) Using hierarchical hidden Markov models to perform sequence-based classification of protein structure. In: IEEE 10th international conference on signal processing, Beijing, pp 1789-1792
[35] Song NY, Yan H (2013) Autoregressive and iterative hidden Markov models for periodicity detection and solenoid structure recognition in protein sequences. IEEE J Biomed Health Inform 17:436-441
[36] Stanfel LE (1996) A new approach to clustering the amino acids. J Theor Biol 183:195-205
[37] Tan C-W, Jones DT (2008) Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction. BMC Bioinform 94:19-42
[38] Valavanis I, Spyrou G, Nikita K (2010) A similarity network approach for the analysis and comparison of protein sequence/structure sets. J Biomed Inform 43:257-267
[39] Yang Y, Faraggi E, Zhao H, Zhou Y (2011) Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Struct Bioinform 27:2076-2082
[40] Yoon B-J (2009) Hidden Markov models and their applications in biological sequence analysis. Curr Genom 10:402-415
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.