VQAonBD 2023

Introduction

Visual question answering generally aims to answer a query posed in natural language, taking cues from the document image as the only input. As part of this competition, we propose a visual question answering dataset and a baseline model for business document images. While a lot of work has already been done in this broader space, questions about business documents present many niche challenges: answering them may require cross-document referencing, numeric computations on top of a simple search query, and so on. Further, although most business documents are presented in a tabular format, it may be non-trivial to leverage this structural conformity to answer more challenging queries. Given the unique nature of the problem, its strong prospects in industry, the layers of challenges to be tackled, and the recent surge of interest in visual question answering, we believe this problem will interest the research community worldwide and attract good participation.



ICDAR 2023 Competition on Visual Question Answering on Business Document Images
