
Probabilistic symmetries and invariant neural networks. (English) Zbl 1502.62092

Summary: Treating neural network inputs and outputs as random variables, we characterize the structure of neural networks that can be used to model data that are invariant or equivariant under the action of a compact group. Much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures, in an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings. By considering group invariance from the perspective of probabilistic symmetry, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of probability distributions that are invariant or equivariant under the action of a compact group. Our representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We demonstrate that examples from the recent literature are special cases, and develop the details of the general program for exchangeable sequences and arrays.
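For exchangeable sequences, the general program referred to above specializes to sum-decomposition ("Deep Sets"-style) architectures in which independent noise is "outsourced" and the inputs enter only through a symmetric pooling. The following minimal NumPy sketch illustrates this idea; all weights, dimensions, and function names here are illustrative assumptions for the sketch and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative random weights for a two-layer sum-decomposition network.
W_phi = rng.normal(size=(5, 3))   # per-element feature map phi: R^3 -> R^5
W_rho = rng.normal(size=(1, 6))   # readout rho acting on [pooled features, noise]

def phi(x):
    # Feature map applied to each input element independently.
    return np.tanh(W_phi @ x)

def invariant_net(X, eta):
    # X: array of shape (n, 3), one row per element of the exchangeable input.
    # eta: outsourced noise variable, sampled independently of X.
    pooled = sum(phi(x) for x in X)      # symmetric (sum) pooling over elements
    h = np.concatenate([pooled, [eta]])  # noise enters only through the readout
    return np.tanh(W_rho @ h)

X = rng.normal(size=(4, 3))
eta = rng.normal()
perm = rng.permutation(4)
print(invariant_net(X, eta))
print(invariant_net(X[perm], eta))  # identical output: invariant to permuting the inputs

Because the inputs influence the output only through the order-insensitive pooled statistic, the (conditional) output distribution is unchanged under any permutation of the rows of X, which is the kind of invariant stochastic network the summary describes.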

MSC:

62M45 Neural nets and related approaches to inference from stochastic processes
60G09 Exchangeability for stochastic processes
62B05 Sufficient statistics and fields
68T07 Artificial neural networks and deep learning
