VQ-Diffusion
swMATH ID: 43456
Software Authors: Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining
Description: Vector Quantized Diffusion Model for Text-to-Image Synthesis. We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
Homepage: https://arxiv.org/abs/2111.14822
Source Code: https://github.com/cientgu/vq-diffusion
Dependencies: Python
Related Software: CSGNet; ViT; Free2CAD; BERT; VideoBERT; GitHub; GraphSAINT; VQA; Flickr30K; MultiWOZ; KE-GAN; Habitat; ParlAI; Wasserstein GAN; CIPS-3D; Swin Transformer; Deceive D; MaskGIT; Caffe; MMGeneration
Cited in: 1 Document
Cited by 5 Authors: Quan, Weize (1); Wang, Hanxiao (1); Wang, Yiqun (1); Yan, Dongming (1); Zhao, Mingyang (1)
Cited in 1 Serial: Computer Aided Geometric Design (1)
Cited in 1 Field: Numerical analysis (65-XX) (1)
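
Illustration: the description mentions a mask-and-replace diffusion strategy applied to the discrete VQ-VAE latent tokens. The following is a minimal sketch of one forward corruption step under stated assumptions, not code from the linked repository. The names `MASK`, `mask_and_replace_step`, `p_mask` and `p_replace` are hypothetical, and the probabilities are fixed here rather than following a noise schedule. Each token is either replaced by a mask token, resampled uniformly from the codebook, or kept unchanged.

```python
import random

MASK = -1  # hypothetical id for the mask token; real codebooks may reserve another id


def mask_and_replace_step(tokens, vocab_size, p_mask, p_replace, rng):
    """Apply one illustrative forward corruption step to discrete token ids.

    Each non-mask token is masked with probability p_mask, resampled
    uniformly from the vocabulary with probability p_replace, and kept
    otherwise. Tokens that are already masked stay masked.
    """
    if p_mask < 0.0 or p_replace < 0.0 or p_mask + p_replace > 1.0:
        raise ValueError("p_mask and p_replace must be non-negative and sum to at most 1")
    out = []
    for tok in tokens:
        if tok == MASK:
            out.append(MASK)  # the mask state is absorbing
            continue
        u = rng.random()  # uniform draw in [0, 1)
        if u < p_mask:
            out.append(MASK)
        elif u < p_mask + p_replace:
            out.append(rng.randrange(vocab_size))  # may redraw the same id
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    x = [3, 7, 1, 4]  # toy sequence of codebook indices
    for step in range(5):
        x = mask_and_replace_step(x, vocab_size=8, p_mask=0.2, p_replace=0.1, rng=rng)
        print(step, x)
```

Repeating the step drives the sequence toward an all-mask state, since masked positions never change back. In this sketch, a generative model would be trained to reverse that corruption, conditioned on the text.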