Dataset
Existing Monolingual Scene Text Datasets
IIIT 5K: IIIT5K
CUTE: CUTE
CTW: CTW
SCUT-CTW1500: SCUT-CTW1500
Total-Text: Total-Text
ArT: ArT
COCO-Text: COCO-Text
ICDAR 2015: ICDAR 2015
ICDAR 2013: ICDAR 2013
Existing Multilingual Scene Text Datasets
IIIT-IndicSTR-Word: IIIT-IndicSTR-Word
Bharat Scene Text Dataset: Bharat ST
IndicSTR12: IndicSTR12
MLT-19: MLT-19
MLT-17: MLT-17
All of these existing datasets may be used for pre-training or training.
Competition Dataset: MLT-STDR-2025 Dataset
Sample Images
Fig. 1 presents examples of Indic multilingual roadside scene images.
Character List of Script/Language
Character sets can be downloaded from these links: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, Special Charlist.
Please note that the special character list provided is not exhaustive and may not include all characters present in word-images. It's common practice to designate a special character, not part of the target language script, as a "don't care" character. This helps the model ignore non-essential characters, which may improve accuracy by reducing the character set handled by the OCR model.
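The "don't care" idea above can be illustrated with a minimal sketch. The function name normalize_label, the choice of "#" as the placeholder, and the toy character set are assumptions made for illustration; in practice the set would be built from the downloaded character lists, whose file layout is not described here.

```python
def normalize_label(text, charset, dont_care="#"):
    """Replace every character that is not in `charset` with `dont_care`.

    `dont_care` is assumed to be a symbol that never occurs in the
    target script, so the model can learn to treat it as "ignore".
    """
    return "".join(ch if ch in charset else dont_care for ch in text)


# Toy character set for illustration only (three Devanagari consonants).
toy_charset = set("कमल")

# "कमल!" becomes "कमल#": the exclamation mark is outside the toy set.
example = normalize_label("कमल!", toy_charset)
```

Applying the same mapping to both training labels and ground truth before scoring keeps the two consistent; whether the official evaluation does this is not stated on this page.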
Validation Dataset
Validation sets for all four tasks can be downloaded from this link: Validation. The validation set comprises scene text images (e.g., 07102023_L_GH028885_image_000150.png) and their corresponding ground truth files (e.g., 07102023_L_GH028885_image_000150.json). Each ground truth file provides detailed annotations for every text word in the image, including the following fields:
Points: The x, y coordinates outlining the polygon around the text.
Language: The language of the word.
Transcription: The textual content of the word.
Flag: Indicates whether the word is considered for processing; a flag value of 1 means the word will be used for all four tasks.
Participants can use this validation set for fine-tuning their models.
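As a rough sketch of how a ground truth file might be read, the snippet below keeps only the words whose Flag is 1. It assumes that each .json file holds a list of word records keyed by the field names given above and that Points is a list of [x, y] pairs; the exact JSON layout is not specified here, so both assumptions should be checked against a downloaded file. The function names are hypothetical.

```python
import json


def usable_words(records):
    """Keep only the word records whose Flag equals 1.

    Assumes `records` is a list of dicts with the keys
    Points, Language, Transcription and Flag.
    """
    return [r for r in records if int(r["Flag"]) == 1]


def load_usable_words(json_path):
    """Read one ground truth file and return its usable word records."""
    with open(json_path, encoding="utf-8") as f:
        return usable_words(json.load(f))


# In-memory example with made-up values, mirroring the assumed layout.
sample = json.loads(
    '[{"Points": [[0, 0], [10, 0], [10, 5], [0, 5]],'
    '  "Language": "Hindi", "Transcription": "x", "Flag": 1},'
    ' {"Points": [[1, 1], [2, 1], [2, 2], [1, 2]],'
    '  "Language": "Tamil", "Transcription": "y", "Flag": 0}]'
)
kept = usable_words(sample)
```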
Test Dataset
The test set for Task-A can be downloaded from this link: Test set (Task-A). The test set includes only scene text images (e.g., 07102023_L_GH021756_image_000187.png).
The output for Task-A should be saved in a file named after the corresponding image, with a .txt extension (e.g., 07102023_L_GH021756_image_000187.txt). Each line in the file must represent one word's predicted polygon, formatted as a sequence of x and y coordinates separated by tabs (download one sample file for Task-A: sample file). The coordinates outline the polygon that encloses the text word.
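The Task-A output format described above could be produced along these lines. This is a sketch under assumptions: coordinates are written as integers in x, y order, and the helper names (prediction_filename, format_polygon, write_task_a) are hypothetical. The downloadable sample file remains the authoritative reference for details such as number formatting.

```python
from pathlib import Path


def prediction_filename(image_name):
    """Map an image filename to its .txt prediction filename."""
    return Path(image_name).with_suffix(".txt").name


def format_polygon(points):
    """Flatten [(x1, y1), (x2, y2), ...] into one tab-separated line."""
    return "\t".join(str(int(round(c))) for point in points for c in point)


def write_task_a(image_name, polygons, out_dir):
    """Write one polygon per line to <out_dir>/<image stem>.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / prediction_filename(image_name)
    text = "".join(format_polygon(p) + "\n" for p in polygons)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

For example, write_task_a("07102023_L_GH021756_image_000187.png", [[(10, 20), (60, 20), (60, 40), (10, 40)]], "task_a_out") would create task_a_out/07102023_L_GH021756_image_000187.txt containing a single line of eight tab-separated values.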
The test set for Task-B can be downloaded from this link: Test set (Task-B). The test set comprises word-level images (e.g., 07102023_L_GH021756_image_000187_1.png) and an accompanying text file (e.g., information.txt) that lists the names of all word-level images included in the set.
The output for Task-B should be saved in a file named information.txt. Each line in this file must contain the word-level image filename followed by the predicted language, separated by a tab (download one sample file for Task-B: sample file).
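A minimal sketch of writing the Task-B file follows. It assumes predictions arrive as (filename, language) pairs; the function names are hypothetical, and the exact spelling expected for language names should be taken from the sample file.

```python
from pathlib import Path


def task_b_lines(predictions):
    """Turn (image filename, language) pairs into tab-separated lines."""
    return [f"{name}\t{language}" for name, language in predictions]


def write_task_b(predictions, out_path="information.txt"):
    """Write all Task-B predictions to a single file, one per line."""
    lines = task_b_lines(predictions)
    Path(out_path).write_text("".join(line + "\n" for line in lines),
                              encoding="utf-8")


# Made-up prediction for illustration.
demo = task_b_lines([("07102023_L_GH021756_image_000187_1.png", "Hindi")])
```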
The test set for Task-C can be downloaded from this link: Test set (Task-C). The test set includes only scene text images (e.g., 07102023_L_GH021756_image_000187.png).
The output for Task-C should be saved in a file named after the corresponding image, with a .txt extension (e.g., 07102023_L_GH021756_image_000187.txt). Each line in the file must represent a single word's predicted polygon and its associated language. Format: Each line should contain a sequence of x and y coordinates outlining the polygon around the text word, followed by the predicted language, with all values separated by tabs (download one sample file for Task-C: sample file).
The test set for Task-D can be downloaded from this link: Test set (Task-D). The test set includes only scene text images (e.g., 07102023_L_GH021756_image_000187.png).
The output for Task-D should be saved in a file named after the corresponding image, with a .txt extension (e.g., 07102023_L_GH021756_image_000187.txt). Each line in the file must represent a single word's predicted polygon and textual transcription. Format: Each line should contain a sequence of x and y coordinates outlining the polygon around the text word, followed by the predicted text, with all values separated by tabs (download one sample file for Task-D: sample file).
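Task-C and Task-D share the same line layout: polygon coordinates followed by one trailing value (the language for Task-C, the transcription for Task-D). The sketch below formats such a line. The name format_prediction and the integer rounding of coordinates are assumptions, and rejecting labels that contain tabs or newlines is a precaution chosen here because such characters would break the tab-separated layout; it is not a stated rule of the competition.

```python
def format_prediction(points, label):
    """Format one word as 'x1<TAB>y1<TAB>...<TAB>label'.

    `label` is the predicted language (Task-C) or the predicted
    transcription (Task-D).
    """
    if "\t" in label or "\n" in label:
        raise ValueError("label must not contain tabs or newlines")
    coords = [str(int(round(c))) for point in points for c in point]
    return "\t".join(coords + [label])


box = [(10, 20), (60, 20), (60, 40), (10, 40)]
task_c_line = format_prediction(box, "Hindi")   # polygon + language
task_d_line = format_prediction(box, "कमल")      # polygon + transcription
```

Each image's lines would then be written, one per word, to the .txt file named after that image.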
Participants may use any other public datasets for training purposes. They must include the names of these additional datasets in their report.
The dataset is freely available for academic and research purposes.