×

Calibrated model-based evidential clustering using bootstrapping. (English) Zbl 1459.68168

Summary: Evidential clustering is an approach to clustering in which cluster-membership uncertainty is represented by a collection of Dempster-Shafer mass functions forming an evidential partition. In this paper, we propose to construct these mass functions by bootstrapping finite mixture models. In the first step, we compute bootstrap percentile confidence intervals for all pairwise probabilities (the probabilities for any two objects to belong to the same class). We then construct an evidential partition such that the pairwise belief and plausibility degrees approximate the bounds of the confidence intervals. This evidential partition is calibrated, in the sense that the pairwise belief-plausibility intervals contain the true probabilities “most of the time”, i.e., with a probability close to the defined confidence level. This frequentist property is verified by simulation, and the practical applicability of the method is demonstrated using several real datasets.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T37 Reasoning under uncertainty in the context of artificial intelligence
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Banfield, J. D.; Raftery, A. E., Model-based gaussian and non-gaussian clustering, Biometrics, 49, 803-821 (1993) · Zbl 0794.62034
[2] Bertsekas, D. P., Nonlinear Programming, Athena Scientific, 2nd Edition (1999) · Zbl 1015.90077
[3] Bezdek, J. C.; Keller, J.; Krishnapuram, R.; Pal, N. R., Fuzzy Models and Algorithms for Pattern Recognition and Image Processing (1999), Kluwer Academic Publishers: Kluwer Academic Publishers Boston · Zbl 0998.68138
[4] Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithm (1981), Plenum Press: Plenum Press New-York · Zbl 0503.68069
[5] Blum, M.; Floyd, R. W.; Pratt, V.; Rivest, R. L.; Tarjan, R. E., Time bounds for selection, J. Comput. Syst. Sci., 7, 4, 448-461 (1973) · Zbl 0278.68033
[6] Brinkman, R. R.; Gasparetto, M.; Lee, S.-J. J.; Ribickas, A. J.; Perkins, J.; Janssen, W.; Smiley, R.; Smith, C., An attempt to define the nature of chemical diabetes using a multidimensional analysis, Biol. Blood Marrow Transplant., 13, 691-700 (2007)
[7] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognit., 28, 5, 781-793 (1995)
[8] Davison, A. C.; Hinkley, D. V., Bootstrap Methods and Their Application (1997), Cambridge University Press: Cambridge University Press New-York · Zbl 0886.62001
[9] Dempster, A. P., Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat., 38, 325-339 (1967) · Zbl 0168.17501
[10] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B, 39, 1-38 (1977) · Zbl 0364.62022
[11] Denœux, T., Constructing belief functions from sample data using multinomial confidence regions, Int. J. Approx. Reason., 42, 3, 228-252 (2006) · Zbl 1100.68112
[12] September
[13] Denœux, T.; Masson, M. H., EVCLUS: Evidential clustering of proximity data, IEEE Trans. Syst. Man Cybernet. B, 34, 1, 95-109 (2004)
[14] Denœux, T.; Sriboonchitta, S.; Kanjanatarakul, O., Evidential clustering of large dissimilarity data, Knowl. Based Syst., 106, 179-195 (2016)
[15] Denoeux, T., Decision-making with belief functions: a review, Int. J. Approx. Reason., 109, 87-110 (2019) · Zbl 1465.91038
[16] URL https://CRAN.R-project.org/package=evclust
[17] Denœux, T.; Li, S., Frequency-calibrated belief functions: review and new insights, Int. J. Approx. Reason., 92, 232-254 (2018) · Zbl 1423.68503
[18] Denoeux, T.; Li, S.; Sriboonchitta, S., Evaluating and comparing soft partitions: an approach based on dempster-Shafer theory, IEEE Trans. Fuzzy Syst., 26, 3, 1231-1244 (2018)
[19] Chapter 4
[20] Ciccio, T. J.D.; Efron, B., Bootstrap confidence intervals, Stat. Sci., 11, 3, 189-212 (1996) · Zbl 0955.62574
[21] D’Urso, P., Informational paradigm, management of uncertainty and theoretical formalisms in the clustering framework: a review, Inf. Sci., 400-401, 30-62 (2017)
[22] D’Urso, P.; Massari, R., Fuzzy clustering of mixed data, Inf. Sci., 505, 513-534 (2019) · Zbl 1456.62120
[23] Efron, B.; Tibshirani, R. J., An Introduction to the Bootstrap (1993), Chapman & Hall: Chapman & Hall New-York · Zbl 0835.62038
[24] Ferone, A.; Maratea, A., Integrating rough set principles in the graded possibilistic clustering, Inf. Sci., 477, 148-160 (2019) · Zbl 1442.68230
[25] Henze, N.; Zirkler, B., A class of invariant consistent tests for multivariate normality, Commun. Stat. - Theory Method., 19, 10, 3595-3618 (1990) · Zbl 0738.62068
[26] Jain, A. K.; Dubes, R. C., Algorithms for Clustering Data (1988), Prentice-Hall: Prentice-Hall Englewood Cliffs, NJ. · Zbl 0665.62061
[27] Krishnapuram, R.; Keller, J. M., A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst., 1, 98-111 (1993)
[28] Lelandais, B.; Ruan, S.; Denœux, T.; Vera, P.; Gardin, I., Fusion of multi-tracer PET images for dose painting, Med. Image Anal., 18, 7, 1247-1259 (2014)
[29] Li, F.; Li, S.; Denœux, T., K-CEVCLUS: constrained evidential clustering of large dissimilarity data, Knowl. Based Syst., 142, 29-44 (2018)
[30] Lian, C.; Ruan, S.; Denoeux, T.; Li, H.; Vera, P., Spatial evidential clustering with adaptive distance metric for tumor segmentation in FDG-PET images, IEEE Trans. Biomed. Eng., 65, 1, 21-30 (2018)
[31] Lingras, P.; Peters, G., Applying rough set concepts to clustering, (Peters, G.; Lingras, P.; Ślezak, D.; Yao, Y., Rough Sets: Selected Methods and Applications in Management and Engineering (2012), Springer-Verlag: Springer-Verlag London, UK), 23-37
[32] Makni, N.; Betrouni, N.; Colot, O., Introducing spatial neighbourhood in evidential c-means for segmentation of multi-source images: application to prostate multi-parametric MRI, Inf. Fusion, 19, 61-72 (2014)
[33] Masson, M.-H.; Denoeux, T., An evidential version of the fuzzy c-means algorithm, Pattern Recognit., 41, 4, 1384-1397 (2008) · Zbl 1131.68081
[34] Masson, M.-H.; Denœux, T., RECM: Relational evidential c-means algorithm, Pattern Recognit. Lett., 30, 1015-1026 (2009)
[35] Lachlan, G. J.M.; Basford, K. E., Mixture models: Inference and applications to clustering (1988), Marcel Dekker: Marcel Dekker New York · Zbl 0697.62050
[36] Lachlan, G. J.M.; Krishnan, T., The EM Algorithm and Extensions (1997), Wiley: Wiley New York · Zbl 0882.62012
[37] Lachlan, G. J.M.; Peel, D., Finite Mixture Models (2000), Wiley: Wiley New York · Zbl 0963.62061
[38] O’Hagan, A.; Murphy, T. B.; Scrucca, L.; Gormley, I. C., Investigation of parameter uncertainty in clustering using a gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap, Comput. Stat., 34, 1779-1813 (2019) · Zbl 1505.62301
[39] Peters, G., Rough clustering utilizing the principle of indifference, Inf. Sci., 277, 358-374 (2014)
[40] Peters, G., Is there any need for rough clustering?, Pattern Recognit. Lett., 53, 31-37 (2015)
[41] Peters, G.; Crespo, F.; Lingras, P.; Weber, R., Soft clustering: fuzzy and rough approaches and their extensions and derivatives, Int. J. Approx. Reason., 54, 2, 307-322 (2013)
[42] Reaven, G. M.; Miller, R. G., An attempt to define the nature of chemical diabetes using a multidimensional analysis, Diabetologia, 16, 17-24 (1979)
[43] URL https://journal.r-project.org/archive/2016-1/scrucca-fop-murphy-etal.pdf
[44] Serir, L.; Ramasso, E.; Zerhouni, N., Evidential evolving Gustafson-Kessel algorithm for online data streams partitioning using belief function theory, Int. J. Approx. Reason., 53, 5, 747-768 (2012)
[45] Shafer, G., A mathematical theory of evidence (1976), Princeton University Press: Princeton University Press Princeton, N.J. · Zbl 0359.62002
[46] Shao, J.; Tu, D., The Jackknife and Bootstrap (1995), Springer: Springer New-York · Zbl 0947.62501
[47] Braak, C. J.F.t.; Kourmpetis, Y.; Kiers, H. A.L.; Bink, M. C.A. M., Approximating a similarity matrix by a latent class model: a reappraisal of additive fuzzy clustering, Comput. Stat. Data Anal., 53, 8, 3183-3193 (2009) · Zbl 1453.62053
[48] Vavasis, S. A., Complexity theory: quadratic programming, (Floudas, C. A.; Pardalos, P. M., Encyclopedia of Optimization (2001), Springer: Springer US, Boston, MA), 304-307
[49] Wang, K.; Ng, S.; McLachlan, G. J., Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data, 2009 Digital Image Computing: Techniques and Applications, 526-531 (2009)
[50] URL https://CRAN.R-project.org/package=EMMIXskew
[51] Yang, M.-S.; Chang-Chien, S.-J.; Nataliani, Y., Unsupervised fuzzy model-based gaussian clustering, Inf. Sci., 481, 1-23 (2019) · Zbl 1450.62078
[52] Zhou, K.; Martin, A.; Pan, Q.; Liu, Z.-G., Median evidential c-means algorithm and its application to community detection, Knowl. Based Syst., 74, 0, 69-88 (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.