zbMATH — the first resource for mathematics

Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images. (English) Zbl 1452.62793
Summary: This paper defines multivalued data, or multiple-valued variables. Such data are typical when data production carries some intrinsic uncertainty, for example from imprecise measuring instruments, as in image recognition, or from human judgments. So far, contributions in the symbolic data analysis literature have provided data preprocessing criteria that allow the use of standard methods such as factorial analysis, clustering, discriminant analysis, and tree-based methods. As an alternative, this paper introduces a methodology for supervised classification, the so-called Dynamic CLASSification TREE (D-CLASS TREE), which deals simultaneously with standard and multivalued data. To this end, an innovative partitioning criterion is defined together with a tree-growing algorithm. The main result is a dynamic tree structure characterized by the simultaneous presence of binary and ternary partitions. A real-world case study shows the advantages of the proposed methodology and the main issues in interpreting the final results. A comparative study with other approaches dealing with the same types of data is also presented. The comparison highlights that, although the results are quite similar in terms of error rates, the proposed D-CLASS TREE returns a more interpretable tree-based structure.
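The summary does not spell out the partitioning criterion, so the following Python sketch is only an illustration of how one split can be binary for standard data and ternary for multivalued data. It assumes a multivalued observation is stored as a (lower, upper) interval and a standard observation as a degenerate interval (x, x). The function name `split_on_threshold` and the straddling rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# One observation of a variable: (lower, upper). A standard (single-valued)
# observation x is represented as the degenerate interval (x, x).
Interval = Tuple[float, float]


def split_on_threshold(values: List[Interval], threshold: float) -> Dict[str, List[int]]:
    """Partition observations around a threshold on one variable.

    Hypothetical rule (an assumption, not the D-CLASS TREE criterion):
      - "left":   the whole interval lies at or below the threshold
      - "right":  the whole interval lies above the threshold
      - "middle": the interval straddles the threshold
    Empty children are dropped, so the split is binary or ternary.
    """
    children: Dict[str, List[int]] = {"left": [], "middle": [], "right": []}
    for index, (lower, upper) in enumerate(values):
        if upper <= threshold:
            children["left"].append(index)
        elif lower > threshold:
            children["right"].append(index)
        else:
            children["middle"].append(index)
    # Keep only non-empty children.
    return {name: members for name, members in children.items() if members}


# Standard data can never straddle the threshold, so the split is binary.
standard = [(1.0, 1.0), (3.0, 3.0)]
binary_split = split_on_threshold(standard, 2.0)

# An interval that contains the threshold produces a third, middle child.
mixed = [(1.0, 1.5), (1.8, 2.4), (3.0, 3.0)]
ternary_split = split_on_threshold(mixed, 2.0)
```

Under this assumed rule, the middle child collects exactly the observations whose uncertainty prevents a clear assignment, which is one way a tree could mix binary and ternary partitions.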
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H35 Image analysis in multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)