Datasets: 65

APIs: 305

Applications: 3

Publications: 16

About Us

Vision

To harness Natural Language Technologies to overcome the language barrier through a public digital platform, achieving the goal of ‘One India’ in the digital landscape and contributing towards ‘Atmanirbhar Bharat’.

Mission

To foster a knowledge-based society that transcends language barriers, the Mission led by the Ministry of Electronics and Information Technology (MeitY) endeavors to provide seamless access to content and services for all citizens in their native languages. Through the integration of cutting-edge technology, our mission is to empower individuals to communicate, learn, and access information effortlessly, fostering inclusivity and connectivity across the diverse linguistic landscape of India in this digital era.

Objectives

To build high-quality technology that enables the development of applications and opens up opportunities that use Indian language OCR.

Optical Character Recognition (OCR) is a technology that converts different types of documents, such as typed, handwritten, or printed text, into machine-encoded text. OCR is widely used to digitize physical documents, making them more accessible and searchable in digital formats. It finds applications in various fields, including document management, data entry, and accessibility services, and is essential for efficient information retrieval and processing.

For Printed OCR: Develop robust recognizers that can recognize printed text in scanned documents, meeting the accuracy goals for 22 Indic languages (Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Oriya, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu).

For Handwritten OCR: Develop robust recognizers that can recognize handwritten text (offline), meeting the accuracy goals for 13 Indic languages (Hindi, Marathi, Tamil, Telugu, Kannada, Malayalam, Oriya, Punjabi, Gujarati, Assamese, Bangla, Urdu, and Manipuri).

For Scene Text OCR: Create resilient recognition systems capable of automatically extracting text from images or scenes captured by cameras or other imaging devices, meeting the accuracy benchmarks for 13 Indic languages (Hindi, Marathi, Tamil, Telugu, Kannada, Malayalam, Oriya, Punjabi, Gujarati, Assamese, Bangla, Urdu, and Manipuri).

Implementing Agencies

NLTM (National Language Translation Mission) OCR is being carried out by a consortium of academic and institutional partners. Each partner focuses on a distinct set of language technologies and domains. The consortium members are IIIT Hyderabad, IIT Delhi, IIT Jodhpur, IIT Bombay, CDAC Noida, and Punjabi University, Patiala.

Services

To create and nurture an ecosystem in which start-ups and central and state government agencies work together to develop and deploy innovative products and services in Indian languages.

The Mission envisages supporting the industry, especially start-ups in the Indian language technology space, by providing technical assistance and linguistic resources that enable them to develop new products and services in Indian languages.

To substantially increase the content in Indian languages on the internet in domains of public interest, particularly governance and policy, science and technology, education, healthcare, agriculture, law and justice, etc.