Topology and data. (English) Zbl 1172.62002

This long paper is a survey arguing for the use of algebraic topology in data analysis. The paper is elementary enough to be readable by a larger audience than that one of algebraic topologists. It contains many concrete examples and a large bibliography of the subject. In modern data analysis, qualitative information is needed because one wants to understand how these data are organized on a large scale. Numerical information like metrics or coordinates is not relevant, and sometimes is merely meaningless for this kind of analysis. It turns out that topology is exactly that branch of mathematics which deals with qualitative geometric information. Quantitative values like metrics or coordinates are ignored by this kind of approach. Moreover, the functoriality of the algebraic constructions allows to relate local geometric information to global geometric information, very often encoded in an algebraic structure.
The subjects treated in this survey are persistence and homology (a way of attaching an homological signature to point clouds), an imaging technique called Mapper, the multidimensional generalization of persistence, and a chapter on clustering algorithms (a method taking as input a finite metric space and producing as output a partition of the underlying set by clusters) which starts by mentioning an interesting adaptation of the Arrow Impossibility Theorem.


62-07 Data analysis (statistics) (MSC2010)
55N35 Other homology theories in algebraic topology
62H30 Classification and discrimination; cluster analysis (statistical aspects)
55N05 Čech types
Full Text: DOI


[1] H. Abdi, Metric multidimensional scaling, in Encyclopedia of Measurement and Statistics, Sage, Thousand Oaks, CA, (2007), pp. 598-605.
[2] H. Adams and G. Carlsson, On the non-linear statistics of range image patches, preprint, (2007), available at http://comptop.stanford.edu/preprints/.
[3] Adrian Baddeley, Spatial point processes and their applications, Stochastic geometry, Lecture Notes in Math., vol. 1892, Springer, Berlin, 2007, pp. 1 – 75. · Zbl 1127.62086
[4] Shai Ben-David, Ulrike von Luxburg, and Dávid Pál, A sober look at clustering stability, Learning theory, Lecture Notes in Comput. Sci., vol. 4005, Springer, Berlin, 2006, pp. 5 – 19. · Zbl 1143.68520
[5] A. Björner, Topological methods, Handbook of combinatorics, Vol. 1, 2, Elsevier Sci. B. V., Amsterdam, 1995, pp. 1819 – 1872. · Zbl 0851.52016
[6] G. R. Bowman, X. Huang, Y. Yao, J. Sun, G. Carlsson, L. J. Guibas, and V. S. Pande, Structural insight into RNA hairpin folding intermediates, Journal of the American Chemical Society Communications, July, 2008.
[7] Erik Carlsson, Gunnar Carlsson, and Vin de Silva, An algebraic topological method for feature identification, Internat. J. Comput. Geom. Appl. 16 (2006), no. 4, 291 – 314. · Zbl 1098.65024
[8] G. Carlsson and V. de Silva, Topological estimation using witness complexes, Symposium on Point-Based Graphics, ETH, Zürich, Switzerland, June 2-4, 2004.
[9] G. Carlsson and F. Memoli, Persistent Clustering and a Theorem of J. Kleinberg , Preprint, March 2008.
[10] G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas, Persistence barcodes for shapes, International Journal of Shape Modeling, 11 (2005), pp. 149-187. · Zbl 1092.68688
[11] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian, On the local behavior of spaces of natural images, International Journal of Computer Vision, (76), 1, 2008, pp. 1-12.
[12] G. Carlsson and A. Zomorodian, The theory of multidimensional persistence, 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-7, 2007. · Zbl 1188.55001
[13] G. Carlsson and T. Ishkanov, Local structure of spaces of natural images, preprint, (2007), available at http://comptop.stanford.edu/preprints/
[14] G. Carlsson, G. Singh, and A. Zomorodian, Computing multidimensional persistence, in preparation. · Zbl 1273.68416
[15] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer, Stability of persistence diagrams, Discrete Comput. Geom. 37 (2007), no. 1, 103 – 120. · Zbl 1117.54027
[16] D. Cohen-Steiner, H. Edelsbrunner, J. Harer and Y. Mileyko, Lipschitz functions have \( L_p\)-stable persistence, Found. Comput. Math., to appear. · Zbl 1192.55007
[17] A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas, A barcode shape descriptor for curve point cloud data, Computers and Graphics, Volume 28, 2004, pp. 881-894.
[18] David Cox, John Little, and Donal O’Shea, Using algebraic geometry, Graduate Texts in Mathematics, vol. 185, Springer-Verlag, New York, 1998. · Zbl 0920.13026
[19] Edward B. Curtis, Simplicial homotopy theory, Advances in Math. 6 (1971), 107 – 209 (1971). · Zbl 0225.55002
[20] D. J. Daley and D. Vere-Jones, An introduction to the theory of point processes, Springer Series in Statistics, Springer-Verlag, New York, 1988. · Zbl 0657.60069
[21] B. Delaunay, Sur la sphere vide, Izvestia Akademii Nauk SSSR, Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793-800 1934. · JFM 60.0946.06
[22] Jean-Guillaume Dumas, Frank Heckenbach, David Saunders, and Volkmar Welker, Computing simplicial homology based on efficient Smith normal form algorithms, Algebra, geometry, and software systems, Springer, Berlin, 2003, pp. 177 – 206. · Zbl 1026.55010
[23] David S. Dummit and Richard M. Foote, Abstract algebra, 3rd ed., John Wiley & Sons, Inc., Hoboken, NJ, 2004. · Zbl 1037.00003
[24] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian, Topological persistence and simplification, Discrete Comput. Geom. 28 (2002), no. 4, 511 – 533. Discrete and computational geometry and graph drawing (Columbia, SC, 2001). · Zbl 1011.68152
[25] Herbert Edelsbrunner and Nimish R. Shah, Triangulating topological spaces, Internat. J. Comput. Geom. Appl. 7 (1997), no. 4, 365 – 378. Tenth Annual ACM Symposium on Computational Geometry (Stony Brook, NY, 1994). · Zbl 0887.57028
[26] B. Efron, Bootstrap methods: another look at the jackknife, Ann. Statist. 7 (1979), no. 1, 1 – 26. · Zbl 0406.62024
[27] Samuel Eilenberg, Singular homology theory, Ann. of Math. (2) 45 (1944), 407 – 447. · Zbl 0061.40603
[28] P. Frosini and C. Landi, Size theory as a topological tool for computer vision, Pattern Recognition and Image Analysis, vol. 9 (4) (1999), pp. 596-603.
[29] P. Gabriel and A. V. Roiter, Representations of finite-dimensional algebras, Springer-Verlag, Berlin, 1997. Translated from the Russian; With a chapter by B. Keller; Reprint of the 1992 English translation.
[30] Paul G. Goerss and John F. Jardine, Simplicial homotopy theory, Progress in Mathematics, vol. 174, Birkhäuser Verlag, Basel, 1999. · Zbl 0949.55001
[31] John A. Hartigan, Clustering algorithms, John Wiley & Sons, New York-London-Sydney, 1975. Wiley Series in Probability and Mathematical Statistics. · Zbl 0372.62040
[32] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The elements of statistical learning, Springer Series in Statistics, Springer-Verlag, New York, 2001. Data mining, inference, and prediction. · Zbl 0973.62007
[33] Allen Hatcher, Algebraic topology, Cambridge University Press, Cambridge, 2002. · Zbl 1044.55001
[34] J. H. van Hateren and A. van der Schaaf, Independent component filters of natural images compared with simple cells in primary visual cortex, Proc. R. Soc. Lond., vol. B 265 (1998), 359-366.
[35] J. Headd, Y.-H. A. Ban, H. Edelsbrunner, M. Vaidya and J. Rudolph. Protein-protein interfaces: Properties, preferences, and projections, Protein Research, to appear, 2007.
[36] D. Hubel, Eye, Brain, and Vision, Scientific American Library, W. H. Freeman, New York, 1995, viii+242pp. ISBN: 0-716-76009-6.
[37] Peter J. Huber, Projection pursuit, Ann. Statist. 13 (1985), no. 2, 435 – 525. With discussion. · Zbl 0595.62059
[38] T. Kenet, D. Bibitchkov, M. Tsodyks, A. Grinvald, and A. Arieli, Spontaneously emerging cortical representations of visual attributes, Nature 425 (2003), 954-956.
[39] J.M. Kleinberg, An impossibility theorem for clustering, NIPS 2002: 446-453.
[40] S. Lafon and A.B. Lee, Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parametrization, IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 9 (2006), pp. 1393-1403.
[41] A.B. Lee, K.S. Pedersen, and D. Mumford, The nonlinear statistics of high-contrast patches in natural images, International Journal of Computer Vision (54), No. 1-3, August 2003, pp. 83-103. · Zbl 1070.68661
[42] Regina Y. Liu, Jesse M. Parelius, and Kesar Singh, Multivariate analysis by data depth: descriptive statistics, graphics and inference, Ann. Statist. 27 (1999), no. 3, 783 – 858. With discussion and a rejoinder by Liu and Singh. · Zbl 0984.62037
[43] Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet, Consistency of spectral clustering, Ann. Statist. 36 (2008), no. 2, 555 – 586. · Zbl 1133.62045
[44] Saunders Mac Lane, Categories for the working mathematician, 2nd ed., Graduate Texts in Mathematics, vol. 5, Springer-Verlag, New York, 1998. · Zbl 0906.18001
[45] J. Peter May, Simplicial objects in algebraic topology, Chicago Lectures in Mathematics, University of Chicago Press, Chicago, IL, 1992. Reprint of the 1967 original. · Zbl 0769.55001
[46] Peter McCullagh, What is a statistical model?, Ann. Statist. 30 (2002), no. 5, 1225 – 1310. With comments and a rejoinder by the author. · Zbl 1039.62003
[47] Ezra Miller and Bernd Sturmfels, Combinatorial commutative algebra, Graduate Texts in Mathematics, vol. 227, Springer-Verlag, New York, 2005. · Zbl 1090.13001
[48] Peter J. Huber, Projection pursuit, Ann. Statist. 13 (1985), no. 2, 435 – 525. With discussion. · Zbl 0595.62059
[49] J. Milnor, Morse theory, Based on lecture notes by M. Spivak and R. Wells. Annals of Mathematics Studies, No. 51, Princeton University Press, Princeton, N.J., 1963. · Zbl 0108.10401
[50] David Mumford, The dawning of the age of stochasticity, Mathematics: frontiers and perspectives, Amer. Math. Soc., Providence, RI, 2000, pp. 197 – 218. · Zbl 0962.60003
[51] James R. Munkres, Topology: a first course, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1975. · Zbl 0306.54001
[52] Partha Niyogi, Stephen Smale, and Shmuel Weinberger, Finding the homology of submanifolds with high confidence from random samples, Discrete Comput. Geom. 39 (2008), no. 1-3, 419 – 441. · Zbl 1148.68048
[53] G. Palla, I. Derènyi, I. Farkas, and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, Volume 435, 9 June 2005, pp. 814-818.
[54] Mathew Penrose, Random geometric graphs, Oxford Studies in Probability, vol. 5, Oxford University Press, Oxford, 2003. · Zbl 1029.60007
[55] Georges Reeb, Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique, C. R. Acad. Sci. Paris 222 (1946), 847 – 849 (French). · Zbl 0063.06453
[56] S.T. Roweis and L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) (December), pp. 2323-2326.
[57] Vin de Silva and Robert Ghrist, Coverage in sensor networks via persistent homology, Algebr. Geom. Topol. 7 (2007), 339 – 358. · Zbl 1134.55003
[58] B. W. Silverman, Density estimation for statistics and data analysis, Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1986. · Zbl 0617.62042
[59] G. Singh, F. Memoli, T. Ishkhanov, G. Carlsson, G. Sapiro and D. Ringach, Topological Structure of Population Activity in Primary Visual Cortex, Journal of Vision, Volume 8, Number 8, Article 11, pp. 1-18, 2008.
[60] G. Singh, F. Memoli and G. Carlsson, Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Point Based Graphics 2007, Prague, September 2007.
[61] J.B. Tenenbaum, V. de Silva and J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) (December), pp. 2319-2323.
[62] M. Tsodyks, T. Kenet, A. Grinvald, and A. Arieli, Linking spontaneous activity of single cortical neurons and the underlying functional architecture, Science 286, (1999), pp. 1943-1996.
[63] B. Wandell, Foundations of Vision, Sinauer Associates, Sunderland, Mass., 1995, xvi+476pp., ISBN:0-878-93853-2.
[64] Afra Zomorodian and Gunnar Carlsson, Computing persistent homology, Discrete Comput. Geom. 33 (2005), no. 2, 249 – 274. · Zbl 1069.55003
[65] Afra Zomorodian and Gunnar Carlsson, Localized homology, Comput. Geom. 41 (2008), no. 3, 126 – 148. · Zbl 1155.65021
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.