VQA

swMATH ID: 36506
Software Authors:
Description: VQA (Visual Question Answering) is a dataset of open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge. The dataset contains 265,016 images (COCO and abstract scenes), with at least 3 questions per image (5.4 on average), 10 ground-truth answers per question, and 3 plausible (but likely incorrect) answers per question. It also provides an automatic evaluation metric.
Homepage: https://visualqa.org
Source Code: https://github.com/GT-Vision-Lab/VQA
Keywords: arXiv_cs.CL; Computer Vision; Pattern Recognition; arXiv_cs.CV; VQA; Visual Question Answering
Related Software: BERT; Flickr30K; Faster R-CNN; MS-COCO; Adam; Im2Text; ImageNet; CLEVR; GitHub; Tensor2Tensor; LXMERT; GPT-3; BLEU; CIDEr; Visual7W; GloVe; CLEVR dataset; YOLO; Grad-CAM; SSD
Cited in: 10 Documents
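
The description mentions an automatic evaluation metric and 10 ground-truth answers per question, without spelling out how the two relate. A minimal sketch follows, assuming the commonly cited VQA accuracy rule in which a predicted answer scores min(number of annotators who gave that answer / 3, 1). The function name `vqa_accuracy` and the lowercase/strip normalization are illustrative assumptions; the official evaluation code in the repository above applies more extensive answer normalization and averages over subsets of the annotators, which this sketch does not attempt.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy for one question (hypothetical helper).

    An answer counts as fully correct once at least 3 annotators gave it;
    fewer matches earn proportional partial credit.
    """
    # Assumption: light normalization only (case and surrounding whitespace).
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Made-up annotator answers for a single question (not real VQA data).
    answers = ["red", "red", "blue", "dark red", "maroon",
               "crimson", "pink", "orange", "brown", "purple"]
    print(vqa_accuracy("red", answers))    # 2 of 10 annotators agree
    print(vqa_accuracy("green", answers))  # no annotator agrees
```

Capping the score at 1 means an answer is treated as correct when a small group of annotators agrees on it, so the metric tolerates disagreement among the 10 human answers.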

Standard Articles

1 Publication describing the Software
VQA: Visual Question Answering (arXiv)
Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Year: 2015

Citations by Year