
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities

Paul Lerner 1, 2 Olivier Ferret 3 Camille Guinaudeau 1, 2 Hervé Le Borgne 3 Romaric Besançon 3 José G. Moreno 4, 5 Jesús Lovón Melgarejo 4, 5 
2 TLP - Traitement du Langage Parlé
LISN - Laboratoire Interdisciplinaire des Sciences du Numérique, STL - Sciences et Technologies des Langues
3 LVIC - Laboratoire Vision et Ingénierie des Contenus
DIASI - Département Intelligence Ambiante et Systèmes Interactifs : DRT/LIST/DIASI
4 IRIT-IRIS - Recherche d’Information et Synthèse d’Information
IRIT - Institut de recherche en informatique de Toulouse
Abstract: Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g., persons, landmarks, and products). The dataset is annotated using a semi-automatic method. We also propose a KB composed of 1.5M Wikipedia articles paired with images. To set a baseline on the benchmark, we address KVQAE as a two-stage problem, Information Retrieval followed by Reading Comprehension, using both zero- and few-shot learning methods. The experiments empirically demonstrate the difficulty of the task, especially when questions are not about persons. This work paves the way for better multimodal entity representations and question answering. The dataset, KB, code, and semi-automatic annotation pipeline are freely available at https://github.com/PaulLerner/ViQuAE.
Document type: Preprints, Working Papers, ...

https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618
Contributor: Paul Lerner
Submitted on: Tuesday, April 26, 2022 - 9:59:38 AM
Last modification on: Tuesday, June 14, 2022 - 12:14:31 PM

File: lerner_sigir_2022_camera.pdf (publisher files allowed on an open archive)

Citation

Paul Lerner, Olivier Ferret, Camille Guinaudeau, Hervé Le Borgne, Romaric Besançon, et al. ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities. 2022. ⟨hal-03650618⟩

Metrics: 51 record views; 56 file downloads