Dataset

Existing Monolingual Scene Text Datasets

IIIT 5K: IIIT5K
CUTE: CUTE
CTW: CTW
SCUT-CTW1500: SCUT-CTW1500
Total-Text: Total-Text
ArT: ArT
COCO-Text: COCO-Text
ICDAR 2015: ICDAR 2015
ICDAR 2013: ICDAR 2013



Existing Multilingual Scene Text Datasets

IIIT-IndicSTR-Word: IIIT-IndicSTR-Word
Bharat Scene Text Dataset: Bharat ST
IndicSTR12: IndicSTR12
MLT-19: MLT-19
MLT-17: MLT-17

All of these existing datasets can be used for pre-training or training.



Competition Dataset: MLT-STDR-2025 Dataset


Sample Images


Fig. 1: Examples of Indic multilingual roadside scene images.



Character List of Script/Language

Character sets can be downloaded from these links: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, and Special Charlist.

Please note that the special character list provided is not exhaustive and may not include all characters present in word-images. It's common practice to designate a special character, not part of the target language script, as a "don't care" character. This helps the model ignore non-essential characters, which may improve accuracy by reducing the character set handled by the OCR model.
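
The "don't care" convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers' preprocessing: the placeholder symbol `#`, the helper names `load_charset` and `normalize_label`, and the one-entry-per-line layout of the character list are all hypothetical, and the charset used here is a toy Hindi subset rather than the competition list. Labels are processed one Unicode code point at a time, so vowel signs and other combining marks must appear in the character list themselves.

```python
DONT_CARE = "#"  # hypothetical placeholder; must not occur in the target script


def load_charset(lines):
    """Build the set of allowed code points from an iterable of lines.

    Assumes one entry per line; blank lines are skipped.
    """
    charset = set()
    for line in lines:
        entry = line.strip()
        if entry:
            charset.update(entry)
    return charset


def normalize_label(label, charset, dont_care=DONT_CARE):
    """Replace every code point outside `charset` with the don't-care symbol."""
    if dont_care in charset:
        raise ValueError("don't-care symbol must not be part of the charset")
    return "".join(ch if ch in charset else dont_care for ch in label)


# Toy Hindi subset for illustration only, not the competition character list.
toy_charset = load_charset(["क", "म", "ल", "ा", "र"])
print(normalize_label("कमल-1", toy_charset))  # prints कमल##
```

With this mapping the recognizer only needs one extra output class for everything outside the script, instead of a separate class for each rare symbol. In practice the input lines would come from one of the downloaded character list files, for example by passing an open file object to `load_charset`.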



Validation Dataset

Coming Soon



Test Dataset

Coming Soon

Participants may use any other public datasets for training purposes. They must include the names of these additional datasets in their report.

The dataset is freely available for academic and research purposes.