
VQA

swMATH ID: 36506
Software Authors: Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Description: VQA: Visual Question Answering. VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. 265,016 images (COCO and abstract scenes). At least 3 questions (5.4 questions on average) per image. 10 ground truth answers per question. 3 plausible (but likely incorrect) answers per question. Automatic evaluation metric.
Homepage: https://visualqa.org
Source Code: https://github.com/GT-Vision-Lab/VQA
Keywords: arXiv_cs.CL; Computer Vision; Pattern Recognition; arXiv_cs.CV; VQA; Visual Question Answering
Related Software: CLEVR; YOLO; ImageNet; Grad-CAM; Adam; Flickr30K; CLEVR dataset; CIDEr; Caffe; ViLBERT; PyTorch; Im2Text; MS-COCO; BERT; DenseCap; GloVe; Faster R-CNN; Python; DeepProbLog; AQuA
Cited in: 6 Documents
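
The description above mentions 10 ground truth answers per question and an automatic evaluation metric. As a rough illustration, the accuracy rule commonly stated for VQA counts a predicted answer as fully correct when at least 3 of the human annotators gave it. The sketch below assumes that rule and a simple lowercase/strip normalization; the function name `vqa_accuracy` is hypothetical, and the official evaluation code in the linked repository applies further answer normalization that is not reproduced here.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Simplified VQA-style accuracy for one question.

    Assumed rule: accuracy = min(number of humans who gave the answer / 3, 1).
    Normalization here is only lowercasing and whitespace stripping
    (an assumption; the official script does more).
    """
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Ten hypothetical annotator answers for a single question.
    answers = ["yes"] * 2 + ["no"] * 8
    print(vqa_accuracy("no", answers))     # 8 matches, capped at full credit
    print(vqa_accuracy("yes", answers))    # 2 matches, partial credit
    print(vqa_accuracy("maybe", answers))  # no matches
```

The cap at 1 means a prediction does not need to agree with a majority of annotators to receive full credit, which tolerates the natural variation in how people phrase open-ended answers.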

Standard Articles

1 Publication describing the Software
Title: VQA: Visual Question Answering
Authors: Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Year: 2015
