zbMATH — the first resource for mathematics

Comparing two \(K\)-category assignments by a \(K\)-category correlation coefficient. (English) Zbl 1088.92017
Summary: Predicted assignments of biological sequences are often evaluated by Matthews’ correlation coefficient [B. W. Matthews, Biochim. Biophys. Acta 405, 442–451 (1975)]. However, Matthews’ correlation coefficient applies only when the assignments fall into two categories. Problems with more than two categories are often artificially forced into two by considering only what does and does not belong to one chosen category, which loses information.
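For reference, the two-category coefficient is usually written in terms of a confusion table. The notation below (true positives \(TP\), true negatives \(TN\), false positives \(FP\), false negatives \(FN\), and the symbol \(C\)) is assumed here for illustration and is not taken from the summary:

\[
C = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\]

Collapsing a problem with more than two categories into "belongs to one category" versus "does not" means that every confusion among the remaining categories ends up inside \(TN\), which is one way to see where the information is lost.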
Here, an extended correlation coefficient that applies to \(K\) categories is proposed. The measure is shown to be highly applicable for evaluating predictions of RNA secondary structure in cases where some predicted pairs are assigned to the category “unknown” because the predicted pairs or unpaired residues are not reliable. Hence, predicting base pairs of RNA secondary structure can be a three-category problem. The measure is further shown to agree well with existing performance measures used for ranking protein secondary structure predictions. A server and software are available at http://rk.kvl.dk/
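The summary does not state the formula itself. As a minimal sketch, the code below assumes the confusion-matrix form in which the multi-category generalisation of Matthews’ coefficient is commonly given. The function name `rk_coefficient`, the row/column convention (rows observed, columns predicted), the example counts, and the choice to return 0.0 when the denominator vanishes are all assumptions made for illustration, not details from the paper or its software.

```python
import math


def rk_coefficient(confusion):
    """Correlation between observed and predicted K-category assignments.

    `confusion` is a K x K list of lists of counts; entry [i][j] counts items
    observed in category i and predicted as category j (assumed convention).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    # Row sums: how often each category was observed.
    observed = [sum(confusion[i]) for i in range(k)]
    # Column sums: how often each category was predicted.
    predicted = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(o * p for o, p in zip(observed, predicted))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in predicted))
        * (total ** 2 - sum(o * o for o in observed))
    )
    if denominator == 0:
        # Assumed convention: undefined cases (e.g. a single predicted
        # category) are reported as zero correlation.
        return 0.0
    return numerator / denominator


# Hypothetical three-category example: paired, unpaired, unknown.
example = [
    [30, 5, 5],
    [4, 40, 6],
    [0, 0, 10],
]
print(rk_coefficient(example))
```

For K = 2 this expression reduces to the two-category Matthews coefficient, so a three-category evaluation of this kind needs no collapsing step.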

MSC:
92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
Software:
MatrixPlot
References:
[1] Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C.A.F.; Nielsen, H., Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics, 16, 412-424, (2000)
[2] Benson, D.A.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Wheeler, D.L., Genbank, Nucleic acids res., 31, 23-27, (2003)
[3] Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M., The protein data bank: a computer based archival file for macromolecular structures, J. mol. biol., 112, 535-542, (1977), (http://www.rcsb.org/pdb/)
[4] Burge, C.; Karlin, S., Finding the genes in genomic DNA, Curr. opin. struct. biol., 8, 346-354, (1998)
[5] Cuff, J.A.; Barton, G.J., Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, Proteins, 34, 508-519, (1999), (http://jura.ebi.ac.uk:8888/)
[6] Damgaard, C.K.; Andersen, E.S.; Knudsen, B.; Gorodkin, J.; Kjems, J., RNA interactions in the 5′ region of the HIV-1 genome, J. mol. biol., 336, 369-379, (2004)
[7] Ding, C.H.; Dubchak, I., Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17, 349-358, (2001)
[8] Dowell, R.D.; Eddy, S.R., Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction, BMC bioinform., 5, 71, (2004)
[9] Escoufier, Y., Le traitement des variables vectorielles, Biometrics, 29, 751-760, (1973)
[10] Gorodkin, J.; Stærfeldt, H.H.; Lund, O.; Brunak, S., Matrixplot: visualizing sequence constraints, Bioinformatics, 15, 769-770, (1999), (http://www.cbs.dtu.dk/services/MatrixPlot/)
[11] Gorodkin, J.; Knudsen, B.; Zwieb, C., Semi-automated update and cleanup of structural RNA databases, Bioinformatics, 17, 642-645, (2001)
[12] Gorodkin, J.; Stricklin, S.L.; Stormo, G.D., Discovering common stem-loop motifs in unaligned RNA sequences, Nucleic acids res., 29, 2135-2144, (2001)
[13] Hansen, J.E.; Lund, O.; Tolstrup, N.; Gooley, A.A.; Williams, K.L.; Brunak, S., Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility, Glycoconjugate J., 15, 115-130, (1998)
[14] Jones, D., Protein secondary structure prediction based on position-specific scoring matrices, J. mol. biol., 292, 195-202, (1999), (http://insulin.brunel.ac.uk/psiform.html)
[15] Kabsch, W.; Sander, C., Dictionary of protein secondary structure: pattern recognition and hydrogen-bonded and geometrical features, Biopolymers, 22, 2577-2637, (1983)
[16] Karplus, K.; Barrett, C.; Hughey, R., Hidden Markov models for detecting remote protein homologies, Bioinformatics, 14, 846-856, (1998), (http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html)
[17] Kazi-Aoual, F.; Hitier, S.; Sabatier, R.; Lebreton, J.-D., Refined approximations to permutation tests for multivariate inference, Comput. stat. data anal., 20, 643-656, (1995) · Zbl 0875.62183
[18] Knudsen, B.; Andersen, E.S.; Damgaard, C.; Kjems, J.; Gorodkin, J., Evolutionary rate variation and RNA secondary structure prediction, Comput. biol. chem., 28, 219-226, (2004) · Zbl 1088.92019
[19] Knudsen, B.; Hein, J.J., A method to combine a set of alignments in one better alignment, Bioinformatics, 15, 122-130, (1999)
[20] Koh, I.Y.; Eyrich, V.; Marti-Renom, M.A.; Przybylski, D.; Madhusudhan, M.S.; Eswar, N.; Grana, O.; Pazos, F.; Valencia, A.; Sali, A.; Rost, B., Eva: evaluation of protein structure prediction servers, Nucleic acids res., 31, 3311-3315, (2003)
[21] Kuiken, C.L., Foley, B., Hahn, B., Korber, B., McCutchan, F., Marx, P.A., Mellors, J.W., Mullins, J.I., Sodroski, J., Wolinsky, S., 2002. Human retroviruses and AIDS 2000 (http://hiv-web.lanl.gov/seq-db.html).
[22] Mathews, D.; Turner, D., Dynalign: an algorithm for finding the secondary structure common to two RNA sequences, J. mol. biol., 317, 191-203, (2002)
[23] Matthews, B.W., Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim. biophys. acta, 405, 442-451, (1975)
[24] Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G., Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, Protein eng., 10, 1-6, (1997)
[25] Ouali, M.; King, R., Cascaded multiple classifiers for secondary structure prediction, Protein sci., 9, 1162-1176, (1999), (http://www.aber.ac.uk/phiwww/prof)
[26] Pollastri, G.; Przybylski, D.; Rost, B.; Baldi, P., Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles, Proteins, 47, 228-235, (2002), (http://promoter.ics.uci.edu/BRNN-PRED/)
[27] Przybylski, D., Rost, B., 2001. Alignments grow, secondary structure prediction improves (http://cubic.bioc.columbia.edu/predictprotein).
[28] Raghava, G.P.S., 2000. Protein secondary structure prediction using nearest neighbor and neural network approach (http://www.imtech.res.in/raghava/apssp2/).
[29] Rost, B., Predicting one-dimensional protein structure by profile based neural networks, Meth. enzymol., 266, 525-539, (1996), (http://cubic.bioc.columbia.edu/predictprotein)
[30] Rost, B., 2003. Profsec, unpublished (http://cubic.bioc.columbia.edu/predictprotein).
[31] Rost, B.; Sander, C., Prediction of protein secondary structure at better than 70% accuracy, J. mol. biol., 232, 584-599, (1993)
[32] Rost, B.; Sander, C.; Schneider, R., Redefining the goals of protein secondary structure prediction, J. mol. biol., 235, 13-26, (1994)
[33] Sonnhammer, E.L., von Heijne, G., Krogh, A., 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. In: Glasgow, J., Littlejohn, T., Major, F., Lathrop, R., Sankoff, D., Sensen, C. (Eds.), Proceedings of the Sixth International Conference on Intelligent Systems in Molecular Biology. AAAI/MIT Press, Menlo Park, California, pp. 175-182.
[34] Weisstein, E.W., 2004a. Correlation coefficient (http://mathworld.wolfram.com/CorrelationCoefficient.html).
[35] Weisstein, E.W., 2004b. Least squares fitting (http://mathworld.wolfram.com/LeastSquaresFitting.html).
[36] Xu, Y.; Xu, D., Protein threading using prospect: design and evaluation, Proteins, 40, 343-354, (2000), (http://compbio.ornl.gov/PROSPECT/PROSPECT-Pipeline/cgi-bin/proteinpipeline)
[37] Zemla, A.; Venclovas, C.; Fidelis, K.; Rost, B., A modified definition of SOV, a segment-based measure for protein secondary structure prediction assessment, Proteins, 34, 220-223, (1999)
[38] Zuker, M., Prediction of RNA secondary structure by energy minimization, Meth. mol. biol., 25, 267-294, (1994)
[39] Zuker, M.; Jacobson, A.B., Using reliability information to annotate RNA secondary structure, RNA, 4, 669-679, (1998)