##
**Probabilistic Fréchet means for time varying persistence diagrams.**
*(English)*
Zbl 1348.68285

This paper proposes a way to allow for a statistical approach to persistent homology, which is an algebraic-topological theory of great use in topological data analysis. In plain words, this theory considers filtrations of topological spaces depending on a parameter \(r\) and describes the births and deaths of \(n\)-dimensional holes that happen while \(r\) increases. The collection of the pairs (birth time, death time) for each \(n\)-dimensional hole is called the persistence diagram in degree \(n\) of the given filtration.

Previous research has shown the difficulty of introducing good notions of mean and variance for a set of persistence diagrams. This paper faces that problem and proposes a solution that is based on changing persistence diagrams into probability distributions, for which the concept of mean probability distribution is available.

The paper starts by recalling the definitions of persistence diagram, \(p\)th-Wasserstein distance between persistence diagrams, vineyard (i.e., a persistence diagram depending on a time parameter [D. Cohen-Steiner et al., in: Proceedings of the 22nd annual symposium on computational geometry, SCG’06. New York, NY: Association for Computing Machinery (ACM). 119–126 (2006; Zbl 1153.68388)]), and the concept of mean introduced in [Y. Mileyko et al., Inverse Probl. 27, No. 12, Article ID 124007, 22 p. (2011; Zbl 1247.68310)]. The authors highlight the fact that this last mean is not unique and does not depend continuously on the time parameter when the mean is computed for vineyards.

This paper proposes to solve that problem by defining the mean of a set of diagrams as a probability distribution over diagrams instead of a single persistence diagram. This is done by introducing the concept of grouping for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams. This concept generalizes the one of matching between two persistence diagrams to a finite family of diagrams. In plain words, it consists in selecting a family of \(N\)-tuples \(S^j=(p^j_1,\ldots, p^j_N)\) in \(X_1\times \ldots\times X_N\) in such a way that the elements of each \(X_i\) are given by the points \(\{p^j_i\}\) varying \(j\) (here multiplicities must be counted appropriately, and the points may belong to the diagonal \(x=y\)). Informally speaking, the points \(S^j\) correspond to each other in the grouping.

Each grouping \(G\) for \(X\) allows to define a unique diagram \(\text{mean}_X(G)\) obtained by taking the collection of the averages of the sets \(S^j\). A grouping is called optimal if it produces a diagram \(\text{mean}_X(G)\) that belongs to the Fréchet mean of the family \(X\), i.e., the sum of the squares of the \(2\)nd-Wasserstein distances from \(X_1,\ldots, X_N\) in the space of persistence diagrams takes its global minimum at the diagram \(\text{mean}_X(G)\).

After introducing the concept of grouping, the authors define the probabilistic Fréchet mean (PFM) for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams as a probability distribution over the set of persistence diagrams. This probability distribution \(\mu_X\) is defined by considering a suitable probability space of perturbations \(\{X'_1,\ldots, X'_N\}\) of the family \(\{X_1,\ldots, X_N\}\) so that for each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) we can compute the probability \(P(G)\) that the grouping \(G\) is optimal for \(\{X'_1,\ldots, X'_N\}\) (we observe that each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) naturally induces a grouping of \(\{X'_1,\ldots, X'_N\}\), which will be denoted by the same symbol \(G\)). The probability distribution \(\mu_X\) is a function whose support is the set of the diagrams \(\text{mean}_X(G)\) of \(X=\{X_1,\ldots, X_N\}\) varying \(G\) among the possible groupings of \(X\), defined by setting \(\mu_X(\text{mean}_X(G)):=P(G)\).

Finally, the authors prove several results, including the fact that the map that takes a set of diagrams to its PFM is Hölder continuous.

Some examples of the probabilistic Fréchet mean for a set of diagrams are also given. An appendix concerning algorithms concludes the paper.

A statement at the very beginning of the paper claims that the field of topological data analysis was first introduced in 2000. This statement is not correct. Although under a different name, the concept of persistence was already used at the beginning of the ’90s for topological analysis of data in computer vision and pattern recognition (cf., e.g., [A. Verri et al., Biol. Cybern. 70, No. 2, 99–107 (1993; Zbl 0789.92030); S. Biasotti et al., “Describing shapes by geometrical-topological properties of real functions”, ACM Computing Surveys 40, No. 4, Article ID 12 (2008; doi:10.1145/1391729.1391731)]).

Even though the idea of computing averages in persistent homology by changing persistence diagrams into continuous functions related to probability distributions is not new (cf., e.g., [P. Donatini et al., “Size functions for signature recognition”, in: Vision geometry VII. Proceedings of the SPIE 3454, 178–183 (1998; doi:10.1117/12.323253)], this paper has the merit of introducing an approach that could be of great interest for further applications of persistent homology to topological data analysis.

Previous research has shown the difficulty of introducing good notions of mean and variance for a set of persistence diagrams. This paper faces that problem and proposes a solution that is based on changing persistence diagrams into probability distributions, for which the concept of mean probability distribution is available.

The paper starts by recalling the definitions of persistence diagram, \(p\)th-Wasserstein distance between persistence diagrams, vineyard (i.e., a persistence diagram depending on a time parameter [D. Cohen-Steiner et al., in: Proceedings of the 22nd annual symposium on computational geometry, SCG’06. New York, NY: Association for Computing Machinery (ACM). 119–126 (2006; Zbl 1153.68388)]), and the concept of mean introduced in [Y. Mileyko et al., Inverse Probl. 27, No. 12, Article ID 124007, 22 p. (2011; Zbl 1247.68310)]. The authors highlight the fact that this last mean is not unique and does not depend continuously on the time parameter when the mean is computed for vineyards.

This paper proposes to solve that problem by defining the mean of a set of diagrams as a probability distribution over diagrams instead of a single persistence diagram. This is done by introducing the concept of grouping for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams. This concept generalizes the one of matching between two persistence diagrams to a finite family of diagrams. In plain words, it consists in selecting a family of \(N\)-tuples \(S^j=(p^j_1,\ldots, p^j_N)\) in \(X_1\times \ldots\times X_N\) in such a way that the elements of each \(X_i\) are given by the points \(\{p^j_i\}\) varying \(j\) (here multiplicities must be counted appropriately, and the points may belong to the diagonal \(x=y\)). Informally speaking, the points \(S^j\) correspond to each other in the grouping.

Each grouping \(G\) for \(X\) allows to define a unique diagram \(\text{mean}_X(G)\) obtained by taking the collection of the averages of the sets \(S^j\). A grouping is called optimal if it produces a diagram \(\text{mean}_X(G)\) that belongs to the Fréchet mean of the family \(X\), i.e., the sum of the squares of the \(2\)nd-Wasserstein distances from \(X_1,\ldots, X_N\) in the space of persistence diagrams takes its global minimum at the diagram \(\text{mean}_X(G)\).

After introducing the concept of grouping, the authors define the probabilistic Fréchet mean (PFM) for a finite family \(X=\{X_1,\ldots, X_N\}\) of persistence diagrams as a probability distribution over the set of persistence diagrams. This probability distribution \(\mu_X\) is defined by considering a suitable probability space of perturbations \(\{X'_1,\ldots, X'_N\}\) of the family \(\{X_1,\ldots, X_N\}\) so that for each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) we can compute the probability \(P(G)\) that the grouping \(G\) is optimal for \(\{X'_1,\ldots, X'_N\}\) (we observe that each grouping \(G\) of \(\{X_1,\ldots, X_N\}\) naturally induces a grouping of \(\{X'_1,\ldots, X'_N\}\), which will be denoted by the same symbol \(G\)). The probability distribution \(\mu_X\) is a function whose support is the set of the diagrams \(\text{mean}_X(G)\) of \(X=\{X_1,\ldots, X_N\}\) varying \(G\) among the possible groupings of \(X\), defined by setting \(\mu_X(\text{mean}_X(G)):=P(G)\).

Finally, the authors prove several results, including the fact that the map that takes a set of diagrams to its PFM is Hölder continuous.

Some examples of the probabilistic Fréchet mean for a set of diagrams are also given. An appendix concerning algorithms concludes the paper.

A statement at the very beginning of the paper claims that the field of topological data analysis was first introduced in 2000. This statement is not correct. Although under a different name, the concept of persistence was already used at the beginning of the ’90s for topological analysis of data in computer vision and pattern recognition (cf., e.g., [A. Verri et al., Biol. Cybern. 70, No. 2, 99–107 (1993; Zbl 0789.92030); S. Biasotti et al., “Describing shapes by geometrical-topological properties of real functions”, ACM Computing Surveys 40, No. 4, Article ID 12 (2008; doi:10.1145/1391729.1391731)]).

Even though the idea of computing averages in persistent homology by changing persistence diagrams into continuous functions related to probability distributions is not new (cf., e.g., [P. Donatini et al., “Size functions for signature recognition”, in: Vision geometry VII. Proceedings of the SPIE 3454, 178–183 (1998; doi:10.1117/12.323253)], this paper has the merit of introducing an approach that could be of great interest for further applications of persistent homology to topological data analysis.

Reviewer: Patrizio Frosini (Bologna)

### MSC:

68U05 | Computer graphics; computational geometry (digital and algorithmic aspects) |

55N35 | Other homology theories in algebraic topology |

60B05 | Probability measures on topological spaces |

62H11 | Directional data; spatial statistics |

### Keywords:

topological data analysis (TDA); persistent homology; Fréchet mean; vineyard; time-varying data; persistence diagram; variance
PDFBibTeX
XMLCite

\textit{E. Munch} et al., Electron. J. Stat. 9, No. 1, 1173--1204 (2015; Zbl 1348.68285)

### References:

[1] | Agarwal, P. K., Edelsbrunner, H., Harer, J. and Wang, Y. (2006). Extreme elevation on a 2-manifold., Discrete & Computational Geometry 36 553-572. · Zbl 1105.52010 · doi:10.1007/s00454-006-1265-8 |

[2] | Ban, Y.-E. A., Edelsbrunner, H. and Rudolph, J. (2004). Interface surfaces for protein-protein complexes. In, Proceedings of the Eighth Annual International Conference on Resaerch in Computational Molecular Biology . RECOMB’04 205-212. ACM, New York, NY, USA. DOI:10.1145/974614.974642. · Zbl 1315.92052 · doi:10.1145/1147954.1147957 |

[3] | Blumberg, A. J., Gal, I., Mandell, M. A. and Pancia, M. (2012). Persistent homology for metric measure spaces, and robust statistics for hypothesis testing and confidence intervals., . · Zbl 1364.55016 |

[4] | Brown, K. A. and Knudson, K. P. (2009). Nonlinear statistics of human speech data., International Journal of Bifurcation and Chaos 19 2307-2319. DOI:10.1142/S0218127409024086. · doi:10.1142/S0218127409024086 |

[5] | Bubenik, P. (July, 2012). Statistical topology using persistence landscapes., . · Zbl 1337.68221 |

[6] | Carlsson, G., Ishkhanov, T., de Silva, V. and Zomorodian, A. (2008). On the local behavior of spaces of natural images., International Journal of Computer Vision 76 1-12. DOI:10.1007/s11263-007-0056-x. |

[7] | Chazal, F., Cohen-Steiner, D., Glisse, M., Guibas, L. J. and Oudot, S. Y. (2009). Proximity of persistence modules and their diagrams. In, Proceedings of the 25th Annual Symposium on Computational Geometry . SCG’09 237-246. ACM, New York, NY, USA. DOI:10.1145/1542362.1542407. · Zbl 1380.68387 |

[8] | Chazal, F., Glisse, M., Labruere, C. and Michel, B. (2013). Optimal rates of convergence for persistence diagrams in topological data analysis., . |

[9] | Cohen-Steiner, D., Edelsbrunner, H. and Harer, J. (2007). Stability of persistence diagrams., Discrete Comput. Geom. 37 103-120. DOI:10.1007/s00454-006-1276-5. · Zbl 1117.54027 · doi:10.1007/s00454-006-1276-5 |

[10] | Cohen-Steiner, D., Edelsbrunner, H. and Morozov, D. (2006). Vines and vineyards by updating persistence in linear time., Proceedings of the Twenty-Second Annual Symposium on Computational Geometry - SCG’06 119. DOI:10.1145/1137856.1137877. · Zbl 1153.68388 |

[11] | Dabaghian, Y., Mémoli, F., Frank, L. and Carlsson., G. (2012). A topological paradigm for hippocampal spatial map formation using persistent homology., PLoS Comput. Biol. 8 e1002581. DOI:10.1371/journal.pcbi.1002581. |

[12] | Deckard, A., Perea, J. A., Harer, J. and Haase, S. (2013). SW1PerS: Sliding Windows and 1-Persistence Scoring: Discovering Periodicity in Gene Expression Time Series Data., |

[13] | Dequéant, M.-L., Ahnert, S., Edelsbrunner, H., Fink, T. M. A., Glynn, E. F., Hattem, G., Kudlicki, A., Mileyko, Y., Morton, J., Mushegian, A. R., Pachter, L., Rowicka, M., Shiu, A., Sturmfels, B. and Pourquié, O. (2008). Comparison of pattern detection methods in microarray time series of the segmentation clock., PLoS ONE 3 e2856. DOI:10.1371/journal.pone.0002856. |

[14] | Edelsbrunner, H. and Harer, J. (2008)., Persistent Homology - A Survey . Surveys on discrete and computational geometry: Twenty years later, 257-282. American Mathematical Society, Providence, RI. · Zbl 1145.55007 · doi:10.1090/conm/453/08802 |

[15] | Edelsbrunner, H., Harer, J., Natarajan, V. and Pascucci, V. (2003). Morse-smale complexes for piecewise linear 3-manifolds. In, Proceedings of the Nineteenth Annual Symposium on Computational Geometry . SCG’03 361-370. ACM, New York, NY, USA. DOI:10.1145/777792.777846. · Zbl 1375.68125 |

[16] | Edelsbrunner, H., Letscher, D. and Zomorodian, A. (2000). Topological persistence and simplification. In, 41st Annual Symposium on Foundations of Computer Science, 2000. Proceedings. 454-463. DOI:10.1109/SFCS.2000.892133. · Zbl 1011.68152 · doi:10.1007/s00454-002-2885-2 |

[17] | Efron, B. and Tibshirani, R. J. (1993)., An Introduction to the Bootstrap . Chapman & Hall, New York. · Zbl 0835.62038 |

[18] | Galkovskyi, T., Mileyko, Y., Bucksch, A., Moore, B., Symonova, O., Price, C. A., Topp, C. N., Iyer-Pascuzzi, A. S., Zurek, P. R., Fang, S., Harer, J., Benfey, P. N. and Weitz, J. S. (2012). GiA roots: Software for high throughput analysis of plant root system architecture., BMC Plant Biology 12 . |

[19] | Gamble, J. and Heo, G. (2010). Exploring uses of persistent homology for statistical analysis of landmark-based shape data., Journal of Multivariate Analysis 101 2184-2199. DOI:10.1016/j.jmva.2010.04.016. · Zbl 1203.62116 · doi:10.1016/j.jmva.2010.04.016 |

[20] | Headd, J. J., Ban, Y. E. A., Brown, P., Edelsbrunner, H., Vaidya, M. and Rudolph, J. (2007). Protein-protein interfaces: Properties, preferences, and projections., Journal of Proteome Research 6 2576-2586. PMID: 17542628. DOI:10.1021/pr070018+. |

[21] | Khasawneh, F. A. and Munch, E. (2014). Exploring equilibria in stochastic delay differential equations using persistent homology. In, Proceedings of the ASME 2014 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, August 17-20, 2014, Buffalo, NY, USA . Paper no. DETC2014/VIB-35655. |

[22] | Khasawneh, F. A. and Munch, E. (2014). Stability determination in turning using persistent homology and time series analysis. In, Proceedings of the ASME 2014 International Mechanical Engineering Congress & Exposition, November 14-20, 2014, Montreal, Canada . Paper no. IMECE2014-40221. |

[23] | Mileyko, Y., Mukherjee, S. and Harer, J. (2011). Probability measures on the space of persistence diagrams., Inverse Problems 27 124007. · Zbl 1247.68310 · doi:10.1088/0266-5611/27/12/124007 |

[24] | Morozov, D. (2008). Homological Illusions of Persistence and Stability PhD thesis, Duke, University. |

[25] | Munkres, J. (1957). Algorithms for the assignment and transportation problems., Journal of the Society for Industrial and Applied Mathematics 5 32-38. · Zbl 0083.15302 · doi:10.1137/0105003 |

[26] | Munkres, J. R. (1993)., Elements of Algebraic Topology . Addison Wesley. · Zbl 0673.55001 |

[27] | Perea, J. A. and Harer, J. (2014). Sliding windows and persistence: An application of topological methods to signal analysis., Foundations of Computational Mathematics 1-40. DOI:10.1007/s10208-014-9206-z. · Zbl 1325.37054 · doi:10.1007/s10208-014-9206-z |

[28] | Selten, R. (1975). Reexamination of the perfectness concept for equilibrium points in extensive games., International Journal of Game Theory 4 25-55. · Zbl 0312.90072 · doi:10.1007/BF01766400 |

[29] | Turner, K. (2013). Medians and means for sets of persistence diagrams., . |

[30] | Turner, K., Mileyko, Y., Mukherjee, S. and Harer, J. (2014). Fréchet means for distributions of persistence diagrams., Discrete & Computational Geometry 52 44-70. DOI:10.1007/s00454-014-9604-7. · Zbl 1296.68182 · doi:10.1007/s00454-014-9604-7 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.