A likelihood-based scoring method for peptide identification using mass spectrometry. (English) Zbl 1257.62106

Summary: Mass spectrometry provides a high-throughput approach to identify proteins in biological samples. A key step in the analysis of mass spectrometry data is to identify the peptide sequence that, most probably, gave rise to each observed spectrum. This is often tackled using a database search: each observed spectrum is compared against a large number of theoretical “expected” spectra predicted from candidate peptide sequences in a database, and the best match is identified using some heuristic scoring criterion.
We provide a more principled, likelihood-based, scoring criterion for this problem. Specifically, we introduce a probabilistic model that allows one to assess, for each theoretical spectrum, the probability that it would produce the observed spectrum. This probabilistic model takes account of peak locations and intensities, in both observed and theoretical spectra, which enables incorporation of detailed knowledge of chemical plausibility in peptide identification. Besides placing peptide scoring on a sounder theoretical footing, the likelihood-based score also has important practical benefits: it provides natural measures for assessing the uncertainty of each identification, and in comparisons on benchmark data it produced more accurate peptide identifications than other methods, including SEQUEST. Although we focus here on peptide identification, our scoring rule could easily be integrated into any downstream analyses that require peptide-spectrum match scores.
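The idea of scoring candidates by likelihood can be illustrated with a small sketch. It is not the authors' model: the detection probabilities, the Gaussian m/z error, the uniform noise term, the greedy peak matching, the uniform prior over candidates, and all names and numbers (`log_likelihood`, `posterior`, `PEPTIDEA`, `PEPTIDEB`, the m/z values) are assumptions made only for illustration. Observed intensities are ignored here for brevity, whereas the summary states that the actual model uses them.

```python
import math

def log_likelihood(observed_mz, theoretical, sigma=0.5, mz_range=2000.0,
                   p_min=0.1, p_max=0.9):
    """Toy log-probability that a theoretical spectrum produced the observed peaks.

    observed_mz: list of observed peak locations (m/z).
    theoretical: list of (m/z, predicted relative intensity in [0, 1]).
    All parameter values are illustrative assumptions.
    """
    unexplained = list(observed_mz)
    ll = 0.0
    for mz_t, rel_int in theoretical:
        # Assumption: peaks predicted to be intense are more likely to be detected.
        p_detect = p_min + (p_max - p_min) * rel_int
        near = [mz for mz in unexplained if abs(mz - mz_t) <= 3 * sigma]
        if near:
            # Greedy match to the closest unexplained observed peak.
            best = min(near, key=lambda mz: abs(mz - mz_t))
            unexplained.remove(best)
            z = (best - mz_t) / sigma
            ll += (math.log(p_detect) - 0.5 * z * z
                   - math.log(sigma * math.sqrt(2 * math.pi)))
        else:
            ll += math.log(1.0 - p_detect)
    # Assumption: leftover observed peaks are noise, uniform over the m/z range.
    ll += len(unexplained) * math.log(1.0 / mz_range)
    return ll

def posterior(observed_mz, candidates):
    """Posterior over candidate peptides under a uniform prior (an assumption)."""
    lls = {name: log_likelihood(observed_mz, spec)
           for name, spec in candidates.items()}
    top = max(lls.values())
    weights = {name: math.exp(v - top) for name, v in lls.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical two-entry "database" of theoretical spectra and one observed spectrum.
candidates = {
    "PEPTIDEA": [(100.0, 1.0), (250.0, 0.5), (400.0, 0.8)],
    "PEPTIDEB": [(120.0, 1.0), (250.0, 0.5), (600.0, 0.8)],
}
observed = [100.2, 249.9, 400.1, 733.0]

post = posterior(observed, candidates)
best_match = max(post, key=post.get)
print(best_match, post)
```

Normalizing the likelihoods over the candidates yields a probability for each peptide-spectrum match, which is one simple way a likelihood-based score can express the uncertainty of an identification, something a raw heuristic score does not directly provide.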


62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
92C05 Biophysics

