ICDAR 2024 Competition on Recognition and VQA on Handwritten Documents

Competition Updates

Registration Opens: January 20, 2024

Training Data Release: February 29, 2024

Updated Training Data: April 15, 2024 (Revised)

Test Data Release: May 1, 2024 (Revised)

Registration Closes: May 10, 2024 (Revised)

Results Submission Deadline: May 15, 2024 (Revised)

Winner Announcement: May 25, 2024 (Revised)


Introduction

Handwritten OCR has become increasingly practical with the availability of various commercial APIs, solutions, and applications. However, widespread use remains largely limited to English and other Latin-script languages, with only a few exceptions among specific Asian scripts. The main obstacle to extending these capabilities to additional languages, particularly Indian languages, is the scarcity of datasets and of shared knowledge. Even for English, publicly accessible datasets are limited, causing academic research to lag behind advances in industrial solutions. While achieving comprehensive recognition and understanding remains challenging, contemporary document understanding systems have evolved to address user needs through Question Answering (QA) tasks, an approach extended to documents as Document Visual Question Answering (DocVQA) [15]. The current juncture presents an opportunity to explore information extraction from handwritten documents, where the complexities of recognition and layout (both structure and content) pose ongoing challenges.

Handwritten text recognition poses unique challenges due to several factors. (i) Style variability: handwriting exhibits considerable variation across writers, making it difficult to develop recognition algorithms robust to diverse writing forms. (ii) Content variability: handwritten content spans a wide spectrum, from formal text to informal notes, demanding models that can interpret diverse content types effectively. (iii) Temporal changes: an individual's handwriting may evolve over time, introducing an additional layer of complexity and necessitating continuous refinement of recognition models. These inherent challenges serve as a potent motivator for researchers, sparking interest and driving exploration in this demanding and dynamic field. In essence, OCR bridges the visual and machine-readable realms, and handwritten text concentrates its hardest complexities. The ongoing pursuit of solutions in this interdisciplinary domain reflects the resilience of researchers in pushing the boundaries of what OCR can achieve.

While handwritten text recognition has made significant strides for certain languages such as English [1,2,3], Chinese [4,5,6], Arabic [7,8], and Japanese [9,10], a considerable gap persists for many languages globally. Unfortunately, several Indian scripts and languages are under-represented in OCR research efforts, placing them at risk of being left behind in the technological landscape. Only a handful of the 22 official languages of India have received attention, primarily for communication purposes. The pressing need for research on text recognition for Indic scripts and languages cannot be overstated. Languages such as Hindi, Bengali, and Telugu, among the most spoken in India [11], urgently need OCR solutions tailored to their unique characteristics. Indic scripts pose specific challenges that make handwritten text recognition more demanding than for Latin scripts. In most Indic scripts, the formation of conjunct characters, where two or more characters combine into a single glyph, is a common feature [12]. This complexity introduces intricacies not present in scripts like English. Compared to the relatively straightforward 52 unique characters (upper and lower case) in English, most Indic scripts contain over 100 unique basic Unicode characters [13]. This richness in character sets demands specialized attention in OCR systems to ensure accurate recognition.
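The conjunct formation described above is visible directly at the Unicode level. As a minimal sketch (using the Devanagari conjunct "ksha" as an illustrative example, not data from the competition), a single rendered glyph can decompose into several codepoints joined by a virama, which is one reason Indic recognizers must map many-codepoint labels to single visual units:

```python
import unicodedata

# The conjunct "ksha" renders as one glyph but is encoded as three
# codepoints: KA + VIRAMA + SSA (क + ् + ष -> क्ष).
conjunct = "\u0915\u094D\u0937"

for ch in conjunct:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# String length counts codepoints, not visible glyphs:
print(len(conjunct))  # 3 codepoints for a single rendered conjunct
```

A recognizer that emits Unicode text must therefore predict the virama sequence correctly even though no separate visual symbol corresponds to it.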

In the previous editions of this competition, ICFHR 2022 IHTR and ICDAR 2023 IHTR [14], we provided an existing training set, an existing test set, and a newly created test set. In ICFHR 2022 IHTR, eleven participants registered for the competition and five teams submitted results. In ICDAR 2023 IHTR, eighteen teams registered and eight submitted results. Several participants employed methods based on state-of-the-art architectures. The proposed competition continues this effort with even more unique datasets and introduces two novel tasks: page-level recognition and reading, and visual question answering on handwritten documents. Our challenge is centered around handwritten document recognition. The competition serves as a dynamic catalyst, igniting the passion and creativity of researchers to pioneer groundbreaking solutions in handwritten document analysis. By providing a platform for innovation and algorithm design, it inspires participants to push the boundaries of what is achievable in understanding and interpreting handwritten documents.

References

  1. Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: NIPS (2008).
  2. Pham, V., Bluche, T., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural networks for handwriting recognition. In: ICFHR (2014).
  3. Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., Wei, F.: TrOCR: Transformer-based optical character recognition with pre-trained models. arXiv (2021).
  4. Xie, Z., Sun, Z., Jin, L., Feng, Z., Zhang, S.: Fully convolutional recurrent network for handwritten Chinese text recognition. In: ICPR (2016).
  5. Wu, Y.C., Yin, F., Chen, Z., Liu, C.L.: Handwritten Chinese text recognition using separable multi-dimensional recurrent neural network. In: ICDAR (2017).
  6. Peng, D., Jin, L., Ma, W., Xie, C., Zhang, H., Zhu, S., Li, J.: Recognition of handwritten Chinese text by segmentation: A segment-annotation-free approach. IEEE Transactions on Multimedia (2022).
  7. Maalej, R., Kherallah, M.: Improving the DBLSTM for on-line Arabic handwriting recognition. Multimedia Tools and Applications (2020).
  8. Jemni, S.K., Ammar, S., Kessentini, Y.: Domain and writer adaptation of offline Arabic handwriting recognition using deep neural networks. Neural Computing and Applications (2022).
  9. Ly, N.T., Nguyen, C.T., Nakagawa, M.: Training an end-to-end model for offline handwritten Japanese text recognition by generated synthetic patterns. In: ICFHR (2018).
  10. Nguyen, K.C., Nguyen, C.T., Nakagawa, M.: A semantic segmentation-based method for handwritten Japanese text recognition. In: ICFHR (2020).
  11. Krishnan, P., Jawahar, C.V.: HWNet v2: An efficient word image representation for handwritten documents. IJDAR (2019).
  12. Script Grammar for Indian languages (Accessed March 26 2020), http://language.worldofcomputing.net/grammar/script-grammar.html.
  13. Pal, U., Chaudhuri, B.: Indian script character recognition: a survey. Pattern Recognition (2004).
  14. Mondal, A., Jawahar, C.: ICDAR 2023 competition on Indic handwriting text recognition. In: International Conference on Document Analysis and Recognition. pp. 435–453. Springer (2023).
  15. Mathew, M., Karatzas, D., Jawahar, C.: DocVQA: A dataset for VQA on document images. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 2200–2209 (2021).
  16. Tito, R., Karatzas, D., Valveny, E.: Document collection visual question answering. In: 16th International Conference on Document Analysis and Recognition (ICDAR). pp. 778–792 (2021).
  17. Tanaka, R., Nishida, K., Yoshida, S.: VisualMRC: Machine reading comprehension on document images. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35, pp. 13878–13888 (2021).
  18. Zhu, F., Lei, W., Feng, F., Wang, C., Zhang, H., Chua, T.S.: Towards complex document understanding by discrete reasoning. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 4857–4866 (2022).
  19. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. pp. 311–318 (2002).