Using persistent homology and dynamical distances to analyze protein binding.

*(English)* Zbl 1343.92380

Summary: Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Implementing a permutation test, we detect conformational changes between closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be applied to machine learning methods. A hyperplane from a support vector machine shows the clear separation between the closed and open protein conformations. Moreover, because our approach captures dynamical properties of the protein, our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris-Rips complex. This finding was not observed in the classical anisotropic network model.
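The two ingredients named in the summary, a persistence landscape and a permutation test on landscapes, can be illustrated with a minimal sketch. It assumes each sample is already given as a list of (birth, death) pairs from a barcode, and it uses the sup-norm distance between group mean landscapes as the test statistic. That statistic, the evaluation grid, and all function names are illustrative assumptions, not necessarily the choices made in the paper.

```python
import random


def landscape(pairs, k, grid):
    """Sample the k-th landscape function (k = 1 is the outermost) on grid.

    Each (birth, death) pair contributes a tent function
    max(0, min(t - birth, death - t)); the k-th landscape at t is the
    k-th largest tent value at t, or 0 if fewer than k pairs exist.
    """
    values = []
    for t in grid:
        tents = sorted(
            (max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True
        )
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values


def mean_vector(vectors):
    """Pointwise mean of equally long sampled landscapes."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sup_distance(u, v):
    """Sup-norm distance between two sampled functions (assumed statistic)."""
    return max(abs(a - b) for a, b in zip(u, v))


def permutation_test(group_a, group_b, grid, k=1, n_perm=999, seed=0):
    """Approximate p-value for equality of the two groups' mean landscapes."""
    rng = random.Random(seed)
    vecs_a = [landscape(p, k, grid) for p in group_a]
    vecs_b = [landscape(p, k, grid) for p in group_b]
    observed = sup_distance(mean_vector(vecs_a), mean_vector(vecs_b))
    pooled = vecs_a + vecs_b
    n_a = len(vecs_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the samples
        stat = sup_distance(mean_vector(pooled[:n_a]), mean_vector(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because a landscape sampled on a fixed grid is an ordinary vector, the same vectors could also serve as input features for a classifier such as a support vector machine.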

##### MSC:

92D20 | Protein sequences, DNA sequences |

92B15 | General biostatistics |

62P10 | Applications of statistics to biology and medical sciences; meta analysis |

##### Keywords:

dynamical distance; persistence landscape; persistent homology; simplicial complex; support vector machine

##### References:

[1] | Ahmad, S., M. Gromiha, H. Fawareh and A. Sarai (2004): “Asa-view: solvent accessibility graphics for proteins,” Available at . Accessed on December 3, 2011. |

[2] | Amitai, G., A. Shemesh, E. Sitbon, M. Shklar, D. Netanely, I. Venger and S. Pietrokovski (2004): “Network analysis of protein structures identifies functional residues,” J. Mol. Biol., 344, 1135-1146. |

[3] | Atilgan, A. R., S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin and I. Bahar (2001): “Anisotropy of fluctuation dynamics of proteins with an elastic network model,” Biophys. J., 80, 505-515. |

[4] | Bandulasiri, A., R. N. Bhattacharya and V. Patrangenaru (2009): “Nonparametric inference for extrinsic means on size-and-(reflection)-shape manifolds with applications in medical imaging,” J. Multivariate Anal., 100, 1867-1882. · Zbl 1171.62028 |

[5] | Bendich, P., T. Galkovskyi and J. Harer (2011): “Improving homology estimates with random walks,” Inverse Probl., 27, 124002. · Zbl 1247.68303 |

[6] | Bernstein, F. C., T. F. Koetzle, G. J. Williams, E. E. Meyer Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi and M. Tasumi (1977): “The protein data bank: a computer-based archival file for macromolecular structures,” J. Mol. Biol., 112, 535. |

[7] | Bhattacharya, A. (2008): “Statistical analysis on manifolds: a nonparametric approach for inference on shape spaces,” Sankhya Ser. A., 70-A, 223-266. · Zbl 1193.62079 |

[8] | Bhattacharya, R. and V. Patrangenaru (2003): “Large sample theory of intrinsic and extrinsic sample means on manifolds I,” Ann. Stat., 31, 1-29. · Zbl 1020.62026 |

[9] | Boos, W. and H. Shuman (1998): “Maltose/maltodextrin system of Escherichia coli: transport, metabolism, and regulation,” Microbiol. Mol. Biol. Rev., 62, 204-229. |

[10] | Bradley, M. J., P. T. Chivers and N. A. Baker (2008): “Molecular dynamics simulation of the Escherichia coli NikR protein: equilibrium conformational fluctuations reveal interdomain allosteric communication pathways,” J. Mol. Biol., 378, 1155-1173. |

[11] | Bubenik, P. (2015): “Statistical topological data analysis using persistence landscapes,” J. Mach. Learn. Res., 16, 77-102. · Zbl 1337.68221 |

[12] | Bubenik, P., G. Carlsson, P. T. Kim and Z.-M. Luo (2010): “Statistical topology via Morse theory persistence and nonparametric estimation.” In: Algebraic methods in statistics and probability II, Contemp. Math., volume 516, Providence, RI: Amer. Math. Soc., 75-92. · Zbl 1196.62041 |

[13] | Bubenik, P. and J. Scott (2014): “Categorification of persistent homology,” Discrete Comput. Geom., 51, 600-627. · Zbl 1295.55005 |

[14] | Cavasotto, C. N., J. A. Kovacs and R. A. Abagyan (2005): “Representing receptor flexibility in ligand docking through relevant normal modes,” J. Am. Chem. Soc., 127, 9632-9640. |

[15] | Chazal, F., B. Fasy, F. Lecci, A. Rinaldo, A. Singh and L. Wasserman (2014a): “On the bootstrap for persistence diagrams and landscapes,” Model. Anal. Inform. Syst., 20, 96-105. |

[16] | Chazal, F., B. T. Fasy, F. Lecci, A. Rinaldo and L. Wasserman (2014b): “Stochastic convergence of persistence landscapes and silhouettes,” In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, SOCG’14, New York, NY, USA: ACM, 474-483. · Zbl 1395.62187 |

[17] | Collins, A., A. Zomorodian, G. Carlsson and L. J. Guibas (2004): “A barcode shape descriptor for curve point cloud data,” Comput. Graph., 28, 881-894. |

[18] | de Silva, V. and P. Perry (2005): “Plex: A MATLAB library for studying simplicial homology,” Available at . Accessed on January 19, 2012. |

[19] | Dijkstra, E. (1959): “A note on two problems in connexion with graphs,” Numer. Math., 1, 269-271. · Zbl 0092.16002 |

[20] | Dryden, I. L. and K. V. Mardia (1998): Statistical shape analysis, New York: John Wiley and Sons. · Zbl 0901.62072 |

[21] | Duan, X., J. A. Hall, H. Nikaido and F. A. Quiocho (2001): “Crystal structures of the maltodextrin/maltose-binding protein complexed with reduced oligosaccharides: flexibility of tertiary structure and ligand binding,” J. Mol. Biol., 306, 1115-1126. |

[22] | Duan, X. and F. A. Quiocho (2002): “Structural evidence for a dominant role of nonpolar interactions in the binding of a transport/chemosensory receptor to its highly polar ligands,” Biochemistry, 41, 706-712. |

[23] | Edelsbrunner, H. and J. Harer (2010): Computational topology: an introduction, Providence, Rhode Island: American Mathematical Society. · Zbl 1193.55001 |

[24] | Edelsbrunner, H., D. Letscher and A. Zomorodian (2002): “Topological persistence and simplification,” Discrete Comput. Geom., 28, 511-533. · Zbl 1011.68152 |

[25] | Eyal, E., L.-W. Yang, I. Bahar and A. Tramontano (2006): “Anisotropic network model: systematic evaluation and a new web interface,” Bioinformatics, 22, 2619-2627. |

[26] | Fasy, B. T., F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan and A. Singh (2014): “Confidence sets for persistence diagrams,” Ann. Statist., 42, 2301-2339. · Zbl 1310.62059 |

[27] | Gameiro, M., Y. Hiraoka, S. Izumi, M. Kramar, K. Mischaikow and V. Nanda (2012): “Topological measurement of protein compressibility via persistence diagrams,” In: The Global COE Program, MI Preprint Series, volume 6, Math for Industry Education & Research Hub, Fukuoka, Japan: Kyushu University, 1-10. · Zbl 1320.55004 |

[28] | Gekko, K. and Y. Hasegawa (1986): “Compressibility-structure relationship of globular proteins,” Biochemistry, 25, 6563-6571. |

[29] | Gould, A. D. and B. H. Shilton (2010): “Studies of the maltose transport system reveal a mechanism for coupling ATP hydrolysis to substrate translocation without direct recognition of substrate,” J. Biol. Chem., 285, 11290-11296. |

[30] | Hatcher, A. (2002): Algebraic topology, Cambridge: Cambridge University Press. · Zbl 1044.55001 |

[31] | Heo, G., J. Gamble and P. T. Kim (2012): “Topological analysis of variance and the maxillary complex,” J. Am. Stat. Assoc., 107, 477-492. · Zbl 1261.62096 |

[32] | HKF (2013): “How to plot a hyper plane in 3D for the SVM results?” . Accessed on November 14, 2013. |

[33] | Hudault, S., J. Guignot and A. L. Servin (2001): “Escherichia coli strains colonising the gastrointestinal tract protect germfree mice against Salmonella typhimurium infection,” Gut, 49, 47-55. |

[34] | Inkscape (2010): “Inkscape: open source vector graphics editor,” Free Software Foundation, Inc., Available at . |

[35] | Kasahara, K., I. Fukuda and H. Nakamura (2014): “A novel approach of dynamic cross correlation analysis on molecular dynamics simulations and its application to Ets1 dimer-DNA complex,” PLoS ONE, 9, e112419. |

[36] | Kobryn, A. E., D. Nikolić, O. Lyubimova, S. Gusarov and A. Kovalenko (2014): “Dissipative particle dynamics with an effective pair potential from integral equation theory of molecular liquids,” J. Phys. Chem. B, 118, 12034-12049. |

[37] | Kovacev-Nikolic, V. (2012): Persistent homology in analysis of point-cloud data, Master’s thesis, Department of Mathematical and Statistical Sciences, University of Alberta. |

[38] | Ledoux, M. and M. Talagrand (2002): Probability in Banach spaces: isoperimetry and processes, A Series of Modern Surveys in Mathematics Series, Springer, first reprint 2002 edition. |

[39] | Lockless, S. W. and R. Ranganathan (1999): “Evolutionarily conserved pathways of energetic connectivity in protein families,” Science, 286, 295-299. |

[40] | Marvin, J. S., E. E. Corcoran, N. A. Hattangadi, J. V. Zhang, S. A. Gere and H. W. Hellinga (1997): “The rational design of allosteric interactions in a monomeric protein and its applications to the construction of biosensors,” P. Natl. Acad. Sci., 94, 4366-4371. |

[41] | MATLAB (2005): “Matlab release 14,” The MathWorks Inc., Natick, Massachusetts, USA. |

[42] | MATLAB (2011): “Matlab and statistics toolbox release 2011a,” The MathWorks Inc., Natick, Massachusetts, USA. |

[43] | McNaught, A. D. and A. Wilkinson (1997): IUPAC compendium of chemical terminology, 2nd ed., Oxford: Blackwell Scientific Publications. |

[44] | Mileyko, Y., S. Mukherjee and J. Harer (2011): “Probability measures on the space of persistence diagrams,” Inverse Probl., 27, 124007. · Zbl 1247.68310 |

[45] | Morris, G. M., R. Huey, W. Lindstrom, M. F. Sanner, R. K. Belew, D. S. Goodsell and A. J. Olson (2009): “Autodock4 and autodocktools4: automated docking with selective receptor flexibility,” J. Comp. Chem., 15, 2785-2791. |

[46] | Nikolić, D., N. Blinov, D. Wishart and A. Kovalenko (2012): “3d-rism-dock: a new fragment-based drug design protocol,” J. Chem. Theory Comput., 8, 3356-3372. |

[47] | Nikolić, D. and V. Kovacev-Nikolic (2013): “Dynamical model of the maltose-binding protein,” unpublished, 11 pages, Research Gate. DOI: 10.13140/2.1.3269.8883. |

[48] | Quiocho, F. A., J. C. Spurlino and L. E. Rodseth (1997): “Extensive features of tight oligosaccharide binding revealed in high-resolution structures of the maltodextrin transport/chemosensory receptor,” Structure, 5, 997-1015. |

[49] | R Development Core Team (2008): R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, URL , ISBN 3-900051-07-0. |

[50] | Reininghaus, J., S. Huber, U. Bauer and R. Kwitt (2015): “A stable multi-scale kernel for topological machine learning,” In: Proc. 2015 IEEE Conf. Comp. Vision & Pat. Rec. (CVPR ’15), Boston, MA, USA, 4741-4748. |

[51] | Rizk, S. S., M. Paduch, J. H. Heithaus, E. M. Duguid, A. Sandstrom and A. A. Kossiakoff (2011): “Allosteric control of ligand-binding affinity using engineered conformation-specific effector proteins,” Nat. Struct. Mol. Biol., 18, 437-442. |

[52] | Rubin, S. M., S.-Y. Lee, E. J. Ruiz, A. Pines and D. E. Wemmer (2002): “Detection and characterization of xenon-binding sites in proteins by 129Xe NMR spectroscopy,” J. Mol. Biol., 322, 425-440. |

[53] | Seeliger, D. and B. L. de Groot (2010): “Conformational transitions upon ligand binding: holo-structure prediction from apo conformations,” PLoS Comput. Biol., 6, e1000634. |

[54] | Sharff, A. J., L. E. Rodseth, J. C. Spurlino and F. A. Quiocho (1992): “Crystallographic evidence of a large ligand-induced hinge-twist motion between the two domains of the maltodextrin binding protein involved in active transport and chemotaxis,” Biochemistry, 31, 10657-10663. |

[55] | Shilton, B. H., H. A. Shuman and S. L. Mowbray (1996): “Crystal structures and solution conformations of a dominant-negative mutant of Escherichia coli maltose-binding protein,” J. Mol. Biol., 264, 364-376. |

[56] | Szmelcman, S., M. Schwartz, T. J. Silhavy and W. Boos (1976): “Maltose transport in Escherichia coli K12,” Eur. J. Biochem., 65, 13-19. |

[57] | Tamal, K. D., S. Jian and W. Yusu (2011): “Approximating cycles in a shortest basis of the first homology group from point data,” Inverse Probl., 27, 124004. · Zbl 1247.68308 |

[58] | Tang, S., J.-C. Liao, A. R. Dunn, R. B. Altman, J. A. Spudich and J. P. Schmidt (2007): “Predicting allosteric communication in myosin via a pathway of conserved residues,” J. Mol. Biol., 373, 1361-1373. |

[59] | Tausz, A., M. Vejdemo-Johansson and H. Adams (2011): “JavaPlex: a research software package for persistent (co)homology,” Available at . · Zbl 1402.65186 |

[60] | Tenenbaum, J. B., V. de Silva and J. C. Langford (2000): “Isomap: a global geometric framework for nonlinear dimensionality reduction,” Science, 290, 2319-2323. |

[61] | Tobi, D. and I. Bahar (2005): “Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state,” P. Natl. Acad. Sci. USA, 102, 18908-18913. |

[62] | Turner, K., Y. Mileyko, S. Mukherjee and J. Harer (2014): “Fréchet means for distributions of persistence diagrams,” Discrete Comput. Geom., 52, 44-70. · Zbl 1296.68182 |

[63] | Van Houdt, R. and C. W. Michiels (2005): “Role of bacterial cell surface structures in Escherichia coli biofilm formation,” Res. Microbiol., 156, 626-633. |

[64] | Xia, K. and G.-W. Wei (2014): “Persistent homology analysis of protein structure, flexibility, and folding,” Int. J. Numer. Meth. Biomed. Eng., 30, 814-844. |

[65] | Xia, K. and G.-W. Wei (2015a): “Multidimensional persistence in biomolecular data,” J. Comput. Chem., 36, 1502-1520. |

[66] | Xia, K. and G.-W. Wei (2015b): “Persistent topology for cryo-EM data analysis,” Int. J. Numer. Meth. Biomed. Eng., 31. DOI: 10.1002/cnm.2719. |

[67] | Zomorodian, A. and G. Carlsson (2005): “Computing persistent homology,” Discrete Comput. Geom., 33, 249-274. · Zbl 1069.55003 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.