
A comparison of classification models to identify the fragile X syndrome. (English) Zbl 1147.62388

Summary: The main models of machine learning are briefly reviewed and considered for building a classifier to identify the Fragile X Syndrome (FXS). We analyzed 172 patients potentially affected by FXS in Andalusia (Spain); by means of a DNA test, each member of the data set is known to belong to one of two classes: affected or not affected. Both the full predictor set, formed by 40 variables, and a reduced set with only nine predictors significantly associated with the response are considered. Four alternative base classification models have been investigated: logistic regression, classification trees, multilayer perceptrons and support vector machines. For both predictor sets, the best accuracy, considering both the mean and the standard deviation of the test error rate, is achieved by the support vector machines, confirming the increasing importance of this learning algorithm. Three ensemble methods (bagging, random forests and boosting) were also considered, among which the bagged versions of support vector machines stand out, especially when they are constructed with the reduced set of predictor variables. The analysis of the sensitivity, the specificity and the area under the ROC curve agrees with the main conclusions drawn from the accuracy results. All of these models can be fitted with freely available R packages.
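The comparison the summary describes (fitting several base classifiers plus a bagged SVM and scoring them by test accuracy and area under the ROC curve) can be sketched as follows. The original study used free R packages (e1071, rpart); as an illustrative stand-in, this sketch uses scikit-learn on synthetic data of the same shape as the reduced predictor set, since the FXS patient data are not public.

```python
# Hedged sketch: mirrors the paper's model comparison with scikit-learn
# on synthetic data; the actual analysis was done in R (e1071, rpart).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic stand-in for the 172-patient, 9-predictor reduced set.
X, y = make_classification(n_samples=172, n_features=9,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "classification tree": DecisionTreeClassifier(random_state=0),
    "multilayer perceptron": MLPClassifier(max_iter=2000, random_state=0),
    "support vector machine": SVC(probability=True, random_state=0),
    # Bagged SVM: the ensemble the summary singles out as best.
    "bagged SVM": BaggingClassifier(SVC(probability=True),
                                    n_estimators=25, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:24s} accuracy={acc:.3f}  AUC={auc:.3f}")
```

In the paper the test error rate was summarized over repeated splits (mean and standard deviation); a single split, as above, only illustrates the scoring step.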

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

S-PLUS; C4.5; e1071; rpart; R; DAAG

References:

[1] Bishop, C. M. 1995. ”Neural Networks for Pattern Recognition”. New York: Oxford University Press. · Zbl 0868.68096
[2] Boser, B. E., Guyon, I. M. and Vapnik, V. N. 1992. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, Pittsburgh, pp. 144–152. ACM Press.
[3] Breiman, L. 1996. Bagging predictors. Mach. Learn., 24(2): 123–140.
[4] Breiman, L. 2001. Random forests. Mach. Learn., 45(1): 5–32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[5] Breiman, L. 2004. ”Consistency for a simple model of random forests”. Statistical Department, University of California at Berkeley. Technical Report No 670
[6] DOI: 10.1214/aos/1079120126 · Zbl 1105.62308 · doi:10.1214/aos/1079120126
[7] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. 1984. ”Classification and Regression Trees”. Belmont: Wadsworth and Brooks.
[8] DOI: 10.1214/aos/1031689014 · Zbl 1029.62037 · doi:10.1214/aos/1031689014
[9] DOI: 10.1214/ss/1177010638 · Zbl 0955.62589 · doi:10.1214/ss/1177010638
[10] Cristianini, N. and Shawe-Taylor, J. 2000. ”An Introduction to Support Vector Machines”. Cambridge: Cambridge University Press. · Zbl 0994.68074
[11] Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55(1): 119–139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[12] Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D. W., Schummer, M. and Haussler, D. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10): 906–914. · doi:10.1093/bioinformatics/16.10.906
[13] Hastie, T., Tibshirani, R. and Friedman, J. 2001. ”The Elements of Statistical Learning”. New York: Springer. · Zbl 0973.62007
[14] Hertz, J., Krogh, A. and Palmer, R. 1991. ”Introduction to the Theory of Neural Computation”. Reading: Addison Wesley.
[15] Hosmer, D. W. and Lemeshow, S. 1989. ”Applied Logistic Regression”. New York: Wiley. · Zbl 0967.62045
[16] Ihaka, R. and Gentleman, R. 1996. R: A language for data analysis and graphics. J. Comput. Graph. Statist., 5(3): 299–314. · doi:10.2307/1390807
[17] Maindonald, J. and Braun, J. 2003. ”Data Analysis and Graphics Using R”. Cambridge: Cambridge University Press. · Zbl 1033.62002
[18] Morgan, J. N. and Sonquist, J. A. 1963. Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc., 58: 415–434. · doi:10.1080/01621459.1963.10500855
[19] Quinlan, J. R. 1993. ”C4.5: Programs for Machine Learning”. San Mateo: Morgan Kaufmann.
[20] R Development Core Team. 2004. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.r-project.org
[21] Vapnik, V. 1998. ”Statistical Learning Theory”. New York: Wiley. · Zbl 0935.62007
[22] Venables, W. N. and Ripley, B. D. 1999. ”Modern Applied Statistics with S-PLUS”. New York: Springer. · Zbl 0927.62002