A new method of peak detection for analysis of comprehensive two-dimensional gas chromatography mass spectrometry data. (English) Zbl 1454.62531

Summary: We develop a novel peak detection algorithm for the analysis of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC\(\times\)GC-TOF MS) data using normal-exponential-Bernoulli (NEB) and mixture probability models. The algorithm first performs baseline correction and denoising simultaneously using the NEB model, which also defines peak regions. Peaks are then picked using a mixture of probability distributions to handle co-eluting peaks. Peak merging is then carried out based on the mass spectral similarities among the peaks within the same peak group. The algorithm is evaluated on experimental data to study the effect of different cutoffs of the conditional Bayes factors and the effect of different mixture models, including Poisson, truncated Gaussian, Gaussian, Gamma and exponentially modified Gaussian (EMG) distributions; the optimal version is selected by trial and error. We then compare the new algorithm with two existing algorithms in terms of compound identification. Data analysis shows that the developed algorithm detects peaks with lower false discovery rates than the existing algorithms, and that a less complicated peak picking model is a promising alternative to the more complicated and widely used EMG mixture models.
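The peak merging step described in the summary can be illustrated with a minimal Python sketch. The summary does not state which similarity measure, threshold, or merging rule the authors use, so everything here is an assumption made for illustration: cosine similarity between mass spectra, a cutoff of 0.95, a greedy merge that keeps the spectrum of the largest peak, and the hypothetical names `cosine_similarity` and `merge_peaks`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero spectrum is treated as dissimilar
    return dot / (norm_a * norm_b)


def merge_peaks(peaks, threshold=0.95):
    """Greedily merge peaks in one peak group whose spectra are similar.

    peaks: list of (area, spectrum) tuples (a hypothetical representation).
    threshold: assumed similarity cutoff; not taken from the paper.
    Returns a list of merged (area, spectrum) tuples.
    """
    merged = []
    # Visit peaks from largest to smallest so the largest peak's spectrum
    # becomes the representative of each merged peak.
    for area, spectrum in sorted(peaks, key=lambda p: p[0], reverse=True):
        for i, (m_area, m_spec) in enumerate(merged):
            if cosine_similarity(spectrum, m_spec) >= threshold:
                merged[i] = (m_area + area, m_spec)  # add areas, keep spectrum
                break
        else:
            merged.append((area, list(spectrum)))
    return merged


# Toy peak group: the first two spectra differ only by a scale factor.
group = [
    (100.0, [10, 0, 5, 1]),
    (40.0, [20, 0, 10, 2]),
    (30.0, [0, 8, 0, 3]),
]
result = merge_peaks(group)
```

In this toy group the first two peaks share the same spectral shape and are merged into one peak, while the third stays separate. A real implementation would also need to respect retention-time constraints within a peak group, which this sketch omits.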


62P35 Applications of statistics to physics


MSeasy; PyMS

