Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures. (English) Zbl 07330533

Summary: We analyze the connection between minimizers with good generalization properties and regions of high local entropy for a threshold-linear classifier on Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lies inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also that in the unbalanced case the performance can be improved.
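To make the setting of the summary concrete, here is a minimal NumPy sketch under an assumed parametrization that is not taken from the paper: labels y = ±1 with P(y = +1) = rho, inputs x = y v + sigma z with z standard Gaussian, and a threshold-linear classifier sign(w·x + b). For two isotropic Gaussians with equal variance the Bayes-optimal rule is itself threshold-linear, with w = v and b = (sigma²/2) log(rho/(1 - rho)), and the test error of any (w, b) has the closed form rho Φ(-(w·v + b)/(sigma‖w‖)) + (1 - rho) Φ(-(w·v - b)/(sigma‖w‖)), because w·x + b is Gaussian given y. The MSE fit is plain unregularized least squares, and `flatness` is a hypothetical, crude proxy (mean error after random weight perturbations at fixed relative radius), not the local-entropy computation carried out in the paper.

```python
import numpy as np
from math import erf, log, sqrt


def gauss_cdf(t):
    """Standard normal CDF Phi(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def sample_mixture(n, v, rho, sigma, rng):
    """Draw n points x = y*v + sigma*z with P(y=+1) = rho and z ~ N(0, I)."""
    y = np.where(rng.random(n) < rho, 1.0, -1.0)
    x = y[:, None] * v[None, :] + sigma * rng.standard_normal((n, v.size))
    return x, y


def gen_error(w, b, v, rho, sigma):
    """Exact test error of sign(w.x + b) on the mixture above.

    Given y, w.x + b is Gaussian with mean y*(w.v) + b and std sigma*|w|
    (assumes w is not the zero vector).
    """
    m, s = w @ v, sigma * np.linalg.norm(w)
    return rho * gauss_cdf(-(m + b) / s) + (1 - rho) * gauss_cdf(-(m - b) / s)


def bayes_classifier(v, rho, sigma):
    """Bayes-optimal (w, b) for two isotropic Gaussians with means +v and -v."""
    return v.copy(), 0.5 * sigma**2 * log(rho / (1 - rho))


def mse_classifier(x, y):
    """Unregularized least-squares fit of y ~ w.x + b (minimizes the MSE)."""
    a = np.hstack([x, np.ones((len(y), 1))])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coef[:-1], coef[-1]


def flatness(w, b, v, rho, sigma, radius, n_samples, rng):
    """Hypothetical flatness proxy: mean test error after random perturbations
    of w with norm radius*|w| (b kept fixed). Not the paper's local entropy."""
    errs = []
    for _ in range(n_samples):
        d = rng.standard_normal(w.size)
        d *= radius * np.linalg.norm(w) / np.linalg.norm(d)
        errs.append(gen_error(w + d, b, v, rho, sigma))
    return float(np.mean(errs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, rho, sigma = 20, 0.8, 1.0  # illustrative, unbalanced clusters
    v = np.zeros(dim)
    v[0] = 1.0  # cluster means at +v and -v with |v| = 1
    w_bayes, b_bayes = bayes_classifier(v, rho, sigma)
    x, y = sample_mixture(200, v, rho, sigma, rng)
    w_mse, b_mse = mse_classifier(x, y)
    print("Bayes error:", gen_error(w_bayes, b_bayes, v, rho, sigma))
    print("MSE error:  ", gen_error(w_mse, b_mse, v, rho, sigma))
    for r in (0.1, 0.5, 1.0):
        print("radius", r, flatness(w_bayes, b_bayes, v, rho, sigma, r, 100, rng))
```

Because the closed-form error is exact, no classifier can go below the Bayes error, so comparing `flatness` at increasing radii around different (w, b) gives only a rough, heuristic picture of how wide the low-error region is.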

MSC:

82-XX Statistical mechanics, structure of matter
