ICDAR 2023 Competition on Visual Question Answering on Business Document Images

Competition Updates

Registration Opens : January 01, 2023

Training Data Release : January 15, 2023

Validation Data Release : January 20, 2023

Test Data Release : March 15, 2023 (rescheduled from March 10, 2023)

Registration Closes : April 02, 2023 (extended from March 20, 2023)

Upload Brief Description of System/Algorithm/Network : April 10, 2023 (extended from March 30, 2023)

Upload Results and Inference Code : April 10, 2023 (extended from March 30, 2023)

Winner Announcement : April 20, 2023

Recent Updates

Registration Opens : January 1, 2023

Training Data Release : January 15, 2023

Validation Data Release : January 20, 2023

Test Data Release : March 15, 2023

The leaderboard is up!

The final leaderboard is up, and the winner and runner-up have been announced!

Introduction

Visual question answering generally aims to answer a query expressed in natural language, using the document image as the only source of evidence. As part of this competition, we propose a visual question answering dataset and a baseline model for business document images. While much work has already been done in the broader visual question answering space, questions about business documents present several niche challenges: they may require cross-document referencing, numeric computations beyond a simple search to reach the final answer, and so on. Further, since business documents are often presented in tabular form, leveraging this structural regularity to answer more challenging queries is non-trivial. Given the unique nature of the problem, its strong prospects in industry, the many layers of challenges to be tackled, and the recent surge of interest in visual question answering, we believe this problem will interest the research community worldwide and attract strong participation.
