ICDAR 2025 Competition on Indic Handwritten Document Recognition

Competition Updates

Competition Website : December 31, 2024

Registration Opens : December 31, 2024

Validation Set and Test Set Release : April 15, 2025

Results Submission Deadline : May 10, 2025

Close Date of Competition : May 10, 2025

Winner Announcement : May 15, 2025

Recent Updates

Competition Website : December 31, 2024

Validation Set and Test Set Release : April 15, 2025

Motivation and Relevance to ICDAR Community

Handwritten OCR technology has become increasingly practical, supported by a variety of commercial APIs and applications. However, current OCR solutions predominantly focus on English or Latin scripts, with limited support for a few other scripts. Expanding these capabilities to cover more languages, including Indian languages, requires dedicated datasets and collaborative knowledge sharing. Even for English, publicly available data remains scarce, hindering academic progress compared to industry advancements. While comprehensive recognition and understanding of handwritten documents remains challenging, modern document understanding systems are increasingly able to fulfill user needs through targeted Question Answering (QA) tasks, as seen in initiatives like Document Question Answering (DocVQA). Now may be the ideal time to advance information extraction from handwritten documents, addressing the persistent complexities of recognition, layout interpretation, and content structure.

Handwritten text recognition presents distinct challenges, driven by several factors: (i) Style Variability - handwriting styles vary widely, making it complex to design algorithms that can robustly handle diverse forms of writing; (ii) Content Variability - handwritten content ranges from formal text to casual notes, requiring adaptable recognition models to interpret varying content types; and (iii) Temporal Changes - handwriting styles evolve over time, adding another layer of complexity as models must adapt to these changes. These challenges fuel ongoing research in this dynamic field, where OCR bridges the visual and machine-readable domains by tackling the intricate task of recognizing handwritten text. Researchers’ continued efforts push OCR’s boundaries, advancing possibilities in this interdisciplinary and demanding area.

The ultimate goal is to achieve accurate understanding of handwritten documents, regardless of source, quality, script, or other difficult conditions. This ambition has driven numerous research efforts, with past competitions such as ICFHR 2022 IHTR, ICDAR 2023 IHTR, and ICDAR 2024 HWD providing valuable datasets to advance handwritten text recognition. These initiatives have set benchmarks for developing robust solutions in handwritten document understanding, inspiring researchers to tackle the unique complexities of handwriting. In this competition, we aim to advance this work further by addressing the intricate challenges of Indic handwritten document understanding. This includes creating benchmarks that can handle the diverse scripts, languages, and document structures found in Indic handwritten documents. By focusing on these complex aspects, we hope to facilitate the development of OCR systems that are not only accurate but also adaptable across a wide range of use cases and linguistic contexts.

While existing datasets support handwritten text recognition at word, line, and page levels, our proposed dataset introduces several novel elements designed to enhance the scope and adaptability of handwriting recognition. This dataset comprises full handwritten documents in ten Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu), spanning 10,000 document images and approximately 400,000 words contributed by approximately 1,000 writers. It offers a broader linguistic range, higher word count, greater writer diversity, varied writing conditions, multiple imaging methods, and detailed ground truth annotations. Each language has approximately 100 contributors, and each writer provides handwritten paragraphs on A4-sized white paper with no restriction on writing style, reflecting a wide array of personal handwriting nuances. The documents are captured with mobile cameras instead of flatbed scanners to emulate real-world conditions. Capturing handwritten documents with a mobile camera under unconstrained settings presents numerous challenges, including blurred text, overexposed text, perspective-distorted text, variation in illumination, extensive unwanted background, low-resolution text, text under shadow, and oriented text, among others. This dataset thus serves as a comprehensive resource for evaluating Indic handwritten document understanding models.

This dataset addresses the community’s urgent need for advanced, robust Indic handwritten document understanding and offers a unique resource to benefit the ICDAR and broader research communities. Researchers in this field are actively developing techniques to interpret complex handwritten documents, and this dataset — rich with diverse linguistic and visual challenges — aims to accelerate progress in these critical areas. Ultimately, it provides a valuable asset for researchers focused on solving real-world Indic handwritten document understanding in uncontrolled settings, fueling advancements across both academic research and applied computer vision solutions.

References

  1. Mondal, A., Jawahar, C.V.: Unconstrained camera captured Indic offline handwritten dataset. In: ICPR (2024)
  2. Mondal, A., Jawahar, C.V.: ICDAR 2023 competition on Indic handwriting text recognition. In: ICDAR (2023)
  3. Dutta, K., Krishnan, P., Mathew, M., Jawahar, C.V.: Towards spotting and recognition of handwritten words in Indic scripts. In: ICFHR (2018)
  4. Dutta, K., Krishnan, P., Mathew, M., Jawahar, C.V.: Offline handwriting recognition on Devanagari using a new benchmark dataset. In: DAS (2018)
  5. Gongidi, S., Jawahar, C.V.: IIIT-INDIC-HW-WORDs: A dataset for Indic handwritten text recognition. In: ICDAR (2021)
  6. Jayadevan, R., Kolhe, S.R., Patil, P.M., Pal, U.: Database development and recognition of handwritten Devanagari legal amount words. In: ICDAR (2011)
  7. Mondal, A., Mahadevan, V., Manmatha, R., Jawahar, C.V.: ICDAR 2024 competition on recognition and VQA on handwritten documents. In: ICDAR (2024)