A sober look at the unsupervised learning of disentangled representations and their evaluation. (English) Zbl 07306898
Summary: The idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over 14000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties “encouraged” by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered “disentangled” and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
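As a concrete illustration of the kind of evaluation metric compared in the study, the following is a minimal sketch of the mutual information gap (MIG) of Chen et al. [7], one of the disentanglement scores considered in the paper. It is written with NumPy and scikit-learn (listed under Software below); the equal-width binning of the latent codes and all function and variable names are illustrative choices for this sketch, not the authors' reference implementation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def mutual_information_gap(factors, codes, n_bins=20):
    """Rough MIG estimate: for each ground-truth factor, take the gap between
    the two largest mutual informations over latent dimensions, normalised by
    the factor's entropy, and average over all factors.

    factors: (n_samples, n_factors) array of discrete ground-truth factors.
    codes:   (n_samples, n_latents) array of real-valued latent codes.
    """
    # Discretise each latent dimension into equal-width bins so that a
    # discrete mutual-information estimator can be applied.
    binned = np.stack(
        [np.digitize(c, np.histogram(c, bins=n_bins)[1][:-1]) for c in codes.T],
        axis=1,
    )
    n_latents, n_factors = binned.shape[1], factors.shape[1]
    mi = np.zeros((n_latents, n_factors))
    for i in range(n_latents):
        for j in range(n_factors):
            mi[i, j] = mutual_info_score(binned[:, i], factors[:, j])
    # H(v_j) = I(v_j; v_j): entropy of each factor from its empirical distribution.
    entropy = np.array(
        [mutual_info_score(factors[:, j], factors[:, j]) for j in range(n_factors)]
    )
    top = np.sort(mi, axis=0)[::-1]  # mutual informations in descending order per factor
    return float(np.mean((top[0] - top[1]) / entropy))
```

In use, one would pass the ground-truth factor labels of a held-out set together with the corresponding latent codes of a trained encoder; a score close to 1 indicates that each factor is captured predominantly by a single latent dimension, while a score near 0 indicates that the information about a factor is spread over several dimensions.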
MSC:
68T05 Learning and adaptive systems in artificial intelligence
Software:
ProbTorch; Scikit
References:
[1] Miguel A Arcones and Evarist Gine. On the bootstrap of U and V statistics. The Annals of Statistics, pages 655-674, 1992.
[2] Francis Bach and Michael Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3(7):1-48, 2002.
[3] Yoshua Bengio, Yann LeCun, et al. Scaling learning algorithms towards AI. Large-scale Kernel Machines, 34(5):1-41, 2007.
[4] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798-1828, 2013.
[5] Diane Bouchacourt, Ryota Tomioka, and Sebastian Nowozin. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. In AAAI Conference on Artificial Intelligence, 2018.
[6] Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in beta-VAE. arXiv preprint arXiv:1804.03599, 2018.
[7] Tian Qi Chen, Xuechen Li, Roger Grosse, and David Duvenaud. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, 2018.
[8] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, 2016.
[9] Brian Cheung, Jesse A Livezey, Arjun K Bansal, and Bruno A Olshausen. Discovering hidden factors of variation in deep networks. arXiv preprint arXiv:1412.6583, 2014.
[10] Taco Cohen and Max Welling. Learning the irreducible representations of commutative Lie groups. In International Conference on Machine Learning, 2014a.
[11] Taco S Cohen and Max Welling. Transformation properties of learned visual representations. arXiv preprint arXiv:1412.7659, 2014b.
[12] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
[13] Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, Iain Matthews, and Greg Mori. Factorized variational autoencoders for modeling audience reactions to movies. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[14] Emily L Denton and Vighnesh Birodkar. Unsupervised learning of disentangled representations from video. In Advances in Neural Information Processing Systems, 2017.
[15] Guillaume Desjardins, Aaron Courville, and Yoshua Bengio. Disentangling factors of variation via generative entangling. arXiv preprint arXiv:1210.5474, 2012.
[16] Sunny Duan, Nicholas Watters, Loic Matthey, Christopher P Burgess, Alexander Lerchner, and Irina Higgins. A heuristic for unsupervised model selection for variational disentangled representation learning. arXiv preprint arXiv:1905.12614, 2019.
[17] Cian Eastwood and Christopher KI Williams. A framework for the quantitative evaluation of disentangled representations. In International Conference on Learning Representations, 2018.
[18] Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Deep self-organization: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019.
[19] Marco Fraccaro, Simon Kamronn, Ulrich Paquet, and Ole Winther. A disentangled recognition and nonlinear dynamics model for unsupervised learning. In Advances in Neural Information Processing Systems, 2017.
[20] Muhammad Waleed Gondal, Manuel Wüthrich, Djordje Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, and Stefan Bauer. On the transfer of inductive bias from simulation to the real world: a new disentanglement dataset. In Advances in Neural Information Processing Systems, 2019.
[21] Ian Goodfellow, Honglak Lee, Quoc V Le, Andrew Saxe, and Andrew Y Ng. Measuring invariances in deep networks. In Advances in Neural Information Processing Systems, 2009.
[22] Ross Goroshin, Michael F Mathieu, and Yann LeCun. Learning to linearize under uncertainty. In Advances in Neural Information Processing Systems, 2015.
[23] Luigi Gresele, Paul K. Rubenstein, Arash Mehrjou, Francesco Locatello, and Bernhard Schölkopf. The incomplete Rosetta Stone problem: Identifiability results for multi-view nonlinear ICA. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019.
[24] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017a.
[25] Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. DARLA: Improving zero-shot transfer in reinforcement learning. In International Conference on Machine Learning, 2017b.
[26] Irina Higgins, David Amos, David Pfau, Sebastien Racaniere, Loic Matthey, Danilo Rezende, and Alexander Lerchner. Towards a definition of disentangled representations. arXiv preprint arXiv:1812.02230, 2018a.
[27] Irina Higgins, Nicolas Sonnerat, Loic Matthey, Arka Pal, Christopher P Burgess, Matko Bošnjak, Murray Shanahan, Matthew Botvinick, Demis Hassabis, and Alexander Lerchner. SCAN: Learning hierarchical compositional visual concepts. In International Conference on Learning Representations, 2018b.
[28] Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. In International Conference on Artificial Neural Networks, 2011.
[29] Haruo Hosoya. Group-based learning of disentangled representations with generalizability for novel contents. In International Joint Conference on Artificial Intelligence, pages 2506-2513, 2019.
[30] Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li F Fei-Fei, and Juan Carlos Niebles. Learning to decompose and disentangle representations for video prediction. In Advances in Neural Information Processing Systems, 2018.
[31] Wei-Ning Hsu, Yu Zhang, and James Glass. Unsupervised learning of disentangled and interpretable representations from sequential data. In Advances in Neural Information Processing Systems, 2017.
[32] Aapo Hyvarinen and Hiroshi Morioka. Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. In Advances in Neural Information Processing Systems, 2016.
[33] Aapo Hyvärinen and Petteri Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 1999.
[34] Aapo Hyvarinen, Hiroaki Sasaki, and Richard E Turner. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In International Conference on Artificial Intelligence and Statistics, 2019.
[35] Christian Jutten and Juha Karhunen. Advances in nonlinear blind source separation. In International Symposium on Independent Component Analysis and Blind Signal Separation, pages 245-256, 2003.
[36] Theofanis Karaletsos, Serge Belongie, and Gunnar Rätsch. Bayesian representation learning with oracle constraints. arXiv preprint arXiv:1506.05011, 2015.
[37] Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, and Aapo Hyvarinen. Variational autoencoders and nonlinear ICA: A unifying framework. In International Conference on Artificial Intelligence and Statistics, pages 2207-2217, 2020.
[38] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In International Conference on Machine Learning, 2018.
[39] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
[40] Tejas D Kulkarni, William F Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In Advances in Neural Information Processing Systems, 2015.
[41] Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In International Conference on Learning Representations, 2018.
[42] Brenden M Lake, Tomer D Ullman, Joshua B Tenenbaum, and Samuel J Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.
[43] Adrien Laversanne-Finot, Alexandre Pere, and Pierre-Yves Oudeyer. Curiosity driven exploration of learned disentangled goal spaces. In Conference on Robot Learning, 2018.
[44] Yann LeCun, Fu Jie Huang, and Leon Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[45] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[46] Karel Lenc and Andrea Vedaldi. Understanding image representations by measuring their equivariance and equivalence. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[47] Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, and Bernhard Schölkopf. Competitive training of mixtures of independent deep generative models. In Workshop at the 6th International Conference on Learning Representations (ICLR), 2018.
[48] Francesco Locatello, Gabriele Abbati, Tom Rainforth, Stefan Bauer, Bernhard Schölkopf, and Olivier Bachem. On the fairness of disentangled representations. In Advances in Neural Information Processing Systems, 2019.
[49] Francesco Locatello, Ben Poole, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem, and Michael Tschannen. Weakly-supervised disentanglement without compromises. In International Conference on Machine Learning, 2020a.
[50] Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. Disentangling factors of variation using few labels. International Conference on Learning Representations, 2020b.
[51] Michael F Mathieu, Junbo J Zhao, Aditya Ramesh, Pablo Sprechmann, and Yann LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems, 2016.
[52] Edvard Munch. The Scream, 1893.
[53] Ashvin V Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, and Sergey Levine. Visual reinforcement learning with imagined goals. In Advances in Neural Information Processing Systems, 2018.
[54] Siddharth Narayanaswamy, T Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Noah Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. Learning disentangled representations with semi-supervised deep generative models. In Advances in Neural Information Processing Systems, 2017.
[55] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847-5861, 2010.
[56] Judea Pearl. Causality. Cambridge University Press, 2009.
[57] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[58] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. MIT Press, 2017.
[59] Scott Reed, Kihyuk Sohn, Yuting Zhang, and Honglak Lee. Learning to disentangle factors of variation with manifold interaction. In International Conference on Machine Learning, 2014.
[60] Scott Reed, Yi Zhang, Yuting Zhang, and Honglak Lee. Deep visual analogy-making. In Advances in Neural Information Processing Systems, 2015.
[61] Karl Ridgeway and Michael C Mozer. Learning deep disentangled embeddings with the f-statistic loss. In Advances in Neural Information Processing Systems, 2018.
[62] Michal Rolinek, Dominik Zietlow, and Georg Martius. Variational autoencoders recover PCA directions (by accident). In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[63] Jürgen Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992.
[64] Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. On causal and anticausal learning. In International Conference on Machine Learning, 2012.
[65] Rajen D Shah and Jonas Peters. The hardness of conditional independence testing and the generalised covariance measure. arXiv preprint arXiv:1804.07203, 2018.
[66] Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon, and Ben Poole. Weakly supervised disentanglement with guarantees. In International Conference on Learning Representations, 2020.
[67] Peter Sorrenson, Carsten Rother, and Ullrich Köthe. Disentanglement by nonlinear ICA with general incompressible-flow networks (GIN). In International Conference on Learning Representations, 2020.
[68] P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction, and search. MIT Press, 2000.
[69] Xander Steenbrugge, Sam Leroux, Tim Verbelen, and Bart Dhoedt. Improving generalization for abstract reasoning tasks using disentangled feature representations. In Workshop on Relational Representation Learning at NeurIPS, 2018.
[70] Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation. Annals of the Institute of Statistical Mathematics, 64(5):1009-1044, 2012.
[71] Raphael Suter, Djordje Miladinović, Stefan Bauer, and Bernhard Schölkopf. Interventional robustness of deep latent variable models. In International Conference on Machine Learning, 2019.
[72] Valentin Thomas, Emmanuel Bengio, William Fedus, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup, and Yoshua Bengio. Disentangling the independently controllable factors of variation by interacting with the world. Learning Disentangled Representations Workshop at NeurIPS, 2017.
[73] Michael Tschannen, Olivier Bachem, and Mario Lucic. Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069, 2018.
[74] Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, and Olivier Bachem. Are disentangled representations helpful for abstract visual reasoning? In Advances in Neural Information Processing Systems, 2019.
[75] Satosi Watanabe. Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4(1):66-82, 1960.
[76] William F Whitney, Michael Chang, Tejas Kulkarni, and Joshua B Tenenbaum. Understanding visual concepts with continuation learning. arXiv preprint arXiv:1602.06822, 2016.
[77] Jimei Yang, Scott E Reed, Ming-Hsuan Yang, and Honglak Lee. Weakly-supervised disentangling with recurrent transformations for 3D view synthesis. In Advances in Neural Information Processing Systems, 2015.
[78] Li Yingzhen and Stephan Mandt. Disentangled sequential autoencoder. In International Conference on Machine Learning, 2018a.
[79] Li Yingzhen and Stephan Mandt. Disentangled sequential autoencoder. In International Conference on Machine Learning, pages 5656-5665, 2018b.