Call for Indic Dataset for Handwritten, Printed, and Scene Text Images

India is a linguistically diverse country, with a multitude of languages spoken across the nation. It officially recognizes 22 of them as Scheduled Languages, which hold special status for various administrative and educational purposes. The 22 officially recognized languages are: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, and Urdu.

The 13 major Indian languages are Assamese, Bengali, Hindi, Kannada, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Tamil, Telugu, and Urdu. Together with MeitY and Bhashini, our team is building an Indic language dataset of handwritten, printed, and scene text images for research purposes. The material includes books, newspapers, online articles, government documents, handwritten samples, and real-world scene board images. We are also open to partnering with libraries, educational institutions, and language enthusiasts who are interested in running OCR on their resources for future use.

  • Collection methods: Printed, handwritten, and scene text require different collection methods. Printed documents need to be scanned, handwritten samples can be collected directly from writers, and scene text is captured in photographs of real-world scenes.
  • Ethical considerations: Contributors should upload only documents that are free of copyright issues, and should obtain consent where necessary.

For feedback and support, contributors and users of the dataset can contact nltmocriiith@gmail.com.