
Unsupervised discovery, control, and disentanglement of semantic attributes with applications to anomaly detection. (English) Zbl 1520.68154

Summary: Our work focuses on unsupervised and generative methods that address the following goals: (1) learning unsupervised generative representations that discover latent factors controlling image semantic attributes, (2) studying how this ability to control attributes formally relates to the issue of latent factor disentanglement, clarifying related but dissimilar concepts that had been confounded in the past, and (3) developing anomaly detection methods that leverage representations learned in the first goal. For goal 1, we propose a network architecture that exploits the combination of multiscale generative models with mutual information (MI) maximization. For goal 2, we derive an analytical result, lemma 1, that brings clarity to two related but distinct concepts: the ability of generative networks to control semantic attributes of images they generate, resulting from MI maximization, and the ability to disentangle latent space representations, obtained via total correlation minimization. More specifically, we demonstrate that maximizing semantic attribute control encourages disentanglement of latent factors. Using lemma 1 and adopting MI in our loss function, we then show empirically that for image generation tasks, the proposed approach exhibits superior performance, as measured by the quality and disentanglement of the generated images, when compared to other state-of-the-art methods, with quality assessed via the Fréchet inception distance (FID) and disentanglement via the mutual information gap (MIG). For goal 3, we design several systems for anomaly detection exploiting representations learned in goal 1 and demonstrate their performance benefits when compared to state-of-the-art generative and discriminative algorithms. Our contributions in representation learning have potential applications in addressing other important problems in computer vision, such as bias and privacy in AI.
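
To make the disentanglement metric named in the summary more concrete, the following is a minimal sketch of a mutual information gap computation. It is not the authors' implementation. It assumes the commonly used definition (for each ground-truth factor, the gap between the two largest mutual information values over the latent codes, normalized by the factor's entropy, then averaged over factors), and it assumes that both latent codes and factors are already discrete; continuous codes would first need to be binned. The function names `entropy`, `mutual_information`, and `mutual_information_gap` are illustrative only.

```python
import math
from collections import Counter


def entropy(xs):
    """Empirical Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mutual_information_gap(latents, factors):
    """Average normalized gap between the two most informative latents.

    latents: list of latent-code columns (at least two), each a list of
             discrete values, one per sample (assumed already discretized).
    factors: list of ground-truth factor columns, each with nonzero entropy.
    """
    gaps = []
    for v in factors:
        # Mutual information of every latent with this factor, largest first.
        mis = sorted((mutual_information(z, v) for z in latents), reverse=True)
        gaps.append((mis[0] - mis[1]) / entropy(v))
    return sum(gaps) / len(gaps)


# Two independent binary factors observed over four samples.
v1 = [0, 0, 1, 1]
v2 = [0, 1, 0, 1]

# Disentangled codes: each latent copies exactly one factor.
print(mutual_information_gap([v1, v2], [v1, v2]))  # approximately 1.0

# Entangled codes: both latents encode the same mixture of the two factors.
mixed = [2 * a + b for a, b in zip(v1, v2)]
print(mutual_information_gap([mixed, mixed], [v1, v2]))  # 0.0
```

In this toy setting a score near 1 means each factor is captured by a single latent, while a score of 0 means at least two latents are equally informative about every factor.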

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H35 Image analysis in multivariate analysis
68T45 Machine vision and scene understanding
