Protein subcellular localization in human and hamster cell lines: employing local ternary patterns of fluorescence microscopy images. (English) Zbl 1411.92113

Summary: Discriminative feature extraction techniques are required for developing accurate and efficient prediction systems for protein subcellular localization, which in turn support the development of effective drugs. In this work, we show that local ternary patterns (LTPs) effectively exploit the small variations in pixel intensities present in fluorescence microscopy images of proteins in human and hamster cell lines. Further, the synthetic minority oversampling technique (SMOTE) is applied to balance the feature space for the classification stage. We observed that LTPs coupled with this data balancing technique enable a classifier, in this case a support vector machine, to yield good performance. The proposed ensemble-based prediction system, evaluated using 10-fold cross-validation, yielded better performance than existing techniques in predicting various subcellular compartments for both the 2D HeLa and CHO datasets. The proposed predictor is freely accessible online.
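
To illustrate the kind of feature the summary refers to, the following is a minimal sketch of a local ternary pattern descriptor in its common formulation, where each neighbour is coded as +1, 0 or -1 relative to the centre pixel and the ternary code is split into an "upper" and a "lower" binary pattern. The summary does not give the neighbourhood, threshold or encoding used in the paper, so the 3x3 neighbourhood, the threshold of 5 and the function names `ltp_codes` and `ltp_histogram` are assumptions made for illustration only. The balancing and classification stages are not shown.

```python
def ltp_codes(image, threshold=5):
    """Return (upper, lower) 8-bit LTP codes for every interior pixel.

    `image` is a 2D list of grey levels. The threshold value is an
    assumption; the summary does not state the one used in the paper.
    """
    # Offsets of the 8 neighbours, clockwise starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    upper_codes, lower_codes = [], []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            upper = lower = 0
            for bit, (dr, dc) in enumerate(offsets):
                diff = image[r + dr][c + dc] - center
                if diff >= threshold:       # ternary value +1
                    upper |= 1 << bit
                elif diff <= -threshold:    # ternary value -1
                    lower |= 1 << bit
                # otherwise ternary value 0: neither bit is set
            upper_codes.append(upper)
            lower_codes.append(lower)
    return upper_codes, lower_codes


def ltp_histogram(image, threshold=5):
    """Concatenate the 256-bin histograms of the upper and lower codes."""
    upper_codes, lower_codes = ltp_codes(image, threshold)
    hist = [0] * 512
    for code in upper_codes:
        hist[code] += 1
    for code in lower_codes:
        hist[256 + code] += 1
    return hist


if __name__ == "__main__":
    patch = [[10, 20, 10],
             [10, 12, 30],
             [0, 12, 13]]
    print(ltp_codes(patch, threshold=5))
```

Because neighbours within the threshold of the centre value map to 0, small intensity fluctuations do not flip bits. This tolerance is the property usually cited as the advantage of ternary over binary patterns. The resulting 512-bin histogram would serve as one feature vector per image.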


92C40 Biochemistry, molecular biology
92-08 Computational methods for problems pertaining to biology

