Advancing Indian Language OCR: Highlights from the ICVGIP 2025 Workshop
India’s linguistic diversity presents unique and complex challenges for Optical Character Recognition (OCR). With scripts spanning multiple writing systems, historical variations, degraded documents, and low-resource settings, building robust OCR systems for Indian languages remains an active and important research problem.
At ICVGIP 2025, a dedicated Workshop on Indian Language OCR brought together researchers, practitioners, and students to discuss recent advances, datasets, and open challenges in this space. The workshop served as a focused forum for exchanging ideas on building accurate, scalable, and inclusive OCR technologies for Indian languages.
CVIT researchers co-organized the workshop, titled “Indian Language OCR: Current Status, Challenges, and Future Directions”. The event served as a platform to:
- Bring together research on printed, handwritten, and scene text OCR for Indian languages.
- Present advances, datasets, tools, and applications to a broader audience.
- Discuss engineering standards and APIs for interoperable OCR systems.
- Identify open challenges and future research trajectories, especially for Indian scripts and multilingual AI tasks.
The workshop addressed the challenges and opportunities in developing OCR systems for Indian languages, covering printed documents, handwritten manuscripts, and scene text. India’s rich linguistic diversity, which spans multiple scripts, writing styles, and document formats, poses unique research challenges that demand innovative and scalable solutions. The workshop presented recent advances in OCR methodologies, datasets, tools, and real-world applications, while also highlighting open problems and outlining future research directions. Robust multilingual OCR is a cornerstone of national digitization efforts in governance, education, and industry, and serves as a critical enabler for the broader success of Artificial Intelligence (AI) systems and Large Language Models (LLMs) in the Indian context.
Organisers from CVIT – IIITH included:
- C.V. Jawahar – senior academic and adviser; shaped the workshop content and research focus.
- Ajoy Mondal – played a key role in workshop leadership and student engagement.
- Gurupreet Singh Lehal and Ravi Kiran Sarvadevabhatla – supported workshop structuring and domain engagement.
The workshop connected researchers and students, fostering collaboration toward building robust, inclusive OCR solutions for Indian contexts.
Keynote Speaker: Dr. Ravi Kiran Sarvadevabhatla, Associate Professor at IIIT Hyderabad
Abstract of Talk:
BharatGen is a government-funded, mission-mode initiative focused on building sovereign, multimodal, and multilingual foundation models for India. The talk gave a brief overview of the BharatGen initiative and presented recent advances in document foundation models, including the open-source release of Patram-7B, India’s first vision–language document foundation model.
The talk outlined the capabilities enabled by such models for layout-aware and multilingual document understanding, and described the supporting tools developed for evaluation, visualization, and systematic error analysis. It also discussed real-world applications currently under development and shared insights gained from building practical vision–language systems, including scenarios where inputs extend beyond traditional document formats.
The keynote concluded by highlighting the transformative capabilities unlocked by document foundation models and by identifying open research challenges and future directions in multilingual and multimodal document intelligence.
The ICVGIP 2025 Workshop on Indian Language OCR successfully created a platform for meaningful dialogue on one of the most impactful problems in document analysis and recognition. As India continues to digitize its linguistic and cultural heritage, such focused efforts will be instrumental in ensuring that AI technologies remain inclusive, accurate, and accessible across languages and scripts.