Prediction of S-sulfenylation sites using mRMR feature selection and fuzzy support vector machine algorithm. (English) Zbl 1406.92190

Summary: Cysteine S-sulfenylation is an important protein post-translational modification that plays a crucial role in transcriptional regulation, cell signaling, and protein function. To better elucidate the molecular mechanism of S-sulfenylation, it is important to identify S-sulfenylated substrates and their corresponding S-sulfenylation sites accurately. In this study, a novel bioinformatics tool named Sulf_FSVM is proposed to predict S-sulfenylation sites by using multiple feature extraction and a fuzzy support vector machine algorithm. On the one hand, amino acid factors, binary encoding, and composition of k-spaced amino acid pairs features are incorporated to encode S-sulfenylation sites, and the maximum relevance minimum redundancy (mRMR) method is adopted to remove redundant features. On the other hand, a fuzzy support vector machine algorithm is used to handle the class imbalance and noise problems in the S-sulfenylation site training dataset. In 10-fold cross-validation, Sulf_FSVM achieves satisfactory performance, with a sensitivity of 73.26%, a specificity of 70.78%, an accuracy of 71.07%, and a Matthews correlation coefficient of 0.2971. Independent tests also show that Sulf_FSVM significantly outperforms existing S-sulfenylation site predictors. Therefore, Sulf_FSVM can be a useful tool for accurate prediction of protein S-sulfenylation sites.
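
To make two of the encodings and the reported metric concrete, the following is a minimal sketch and not the authors' implementation. It shows binary (one-hot) encoding, composition of k-spaced amino acid pairs, and the Matthews correlation coefficient computed from confusion-matrix counts. The function names, the 20-letter alphabet, the handling of unknown symbols, and the default `k_max` are assumptions made for illustration; the summary does not state the window size or the gap range actually used.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def binary_encoding(peptide):
    """One-hot encode each residue; symbols outside the alphabet become all zeros."""
    vec = []
    for res in peptide:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count residue pairs separated by k positions and
    normalise by the number of such pairs in the peptide.
    k_max is a hypothetical default, not a value taken from the paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(peptide) - k - 1  # number of k-spaced pairs
        for i in range(total):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip pairs containing unknown symbols
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a pipeline of the kind the summary describes, vectors like these would be concatenated with the amino acid factor features, passed through mRMR feature selection, and then given to the classifier; those steps are not sketched here.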

MSC:

 92C40 Biochemistry, molecular biology
 68T05 Learning and adaptive systems in artificial intelligence