Probabilistic symmetries and invariant neural networks.

*(English)* Zbl 07255121

Summary: Treating neural network inputs and outputs as random variables, we characterize the structure of neural networks that can be used to model data that are invariant or equivariant under the action of a compact group. Much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures, in an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings. By considering group invariance from the perspective of probabilistic symmetry, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of probability distributions that are invariant or equivariant under the action of a compact group. Our representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We demonstrate that examples from the recent literature are special cases, and develop the details of the general program for exchangeable sequences and arrays.
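To make the notions of invariance and equivariance in the summary concrete, the following is a minimal sketch for the simplest case, permutations acting on a finite sequence. It is not taken from the paper: the function names `phi`, `invariant_net`, and `equivariant_layer`, the sum-pooling construction, and the way the noise input is added are all illustrative assumptions. It only shows what it means for a deterministic or noise-driven map to be unchanged by, or to commute with, a reordering of its inputs.

```python
import random
from itertools import permutations


def phi(x):
    """Hypothetical per-element feature map (an assumption for illustration)."""
    return (x, x * x)


def invariant_net(xs, noise=0.0):
    """Permutation-invariant map: sum-pool per-element features, then read out.

    `noise` is an optional extra input drawn independently of the data; it
    makes the output stochastic without breaking invariance.
    """
    pooled = [sum(col) for col in zip(*(phi(x) for x in xs))]
    return 2 * pooled[0] - pooled[1] + noise


def equivariant_layer(xs, a=3, b=-1):
    """Permutation-equivariant map: each output depends on its own input
    and on a symmetric summary (the sum) of the whole sequence."""
    total = sum(xs)
    return [a * x + b * total for x in xs]


xs = [1, 4, 2]
out = equivariant_layer(xs)
for perm in permutations(range(len(xs))):
    permuted = [xs[i] for i in perm]
    # Invariance: the output ignores the ordering of the inputs.
    assert invariant_net(permuted) == invariant_net(xs)
    # Equivariance: permuting the inputs permutes the outputs the same way.
    assert equivariant_layer(permuted) == [out[i] for i in perm]

# Stochastic case: for a fixed noise draw, the output is still invariant.
eta = random.random()
assert invariant_net([2, 1, 4], noise=eta) == invariant_net(xs, noise=eta)
```

Integer inputs are used so that the equality checks are exact; with floating-point inputs the same checks would need a tolerance, since summation order can change rounding.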

##### MSC:

68T05 Learning and adaptive systems in artificial intelligence

##### Keywords:

probabilistic symmetry; convolutional neural networks; exchangeability; neural architectures; invariance; equivariance; sufficiency; adequacy; graph neural networks

\textit{B. Bloem-Reddy} and \textit{Y. W. Teh}, J. Mach. Learn. Res. 21, Paper No. 90, 61 p. (2020; Zbl 07255121)
