Dataset Details

Mozhi-Kannada

Language: Kannada

Modality: Printed

Description

We create the Mozhi-Kannada dataset by randomly cropping word images from 1,000 document page images. The pages are taken from multiple books and were scanned at 600 DPI using a flatbed scanner. We manually annotate the ground truth transcription of each cropped word image. The dataset consists of 100,011 word images and their corresponding ground truth transcriptions. We divide it into Training, Validation, and Test sets of 80,085, 10,088, and 9,838 word images respectively, each with its ground truth transcriptions. There are 35,680 unique Kannada words in the Training set.
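As a quick sanity check on the figures above, the following minimal sketch sums the three split sizes and reports each split's share of the whole. The names SPLITS and split_fractions are illustrative only and are not part of the dataset.

```python
# Split sizes as stated in the dataset description.
SPLITS = {"train": 80_085, "val": 10_088, "test": 9_838}


def split_fractions(splits):
    """Return the total number of images and each split's fraction of it."""
    total = sum(splits.values())
    fractions = {name: count / total for name, count in splits.items()}
    return total, fractions


total, fractions = split_fractions(SPLITS)
print(f"total word images: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three sizes add up to the stated total of 100,011, which corresponds to roughly an 80/10/10 split.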

Training Set:

train.zip contains a folder named “images” with 80,085 word-level images, a file “train_gt.txt” listing each image name and its ground truth transcription separated by a tab character, and a file “vocabulary.txt” listing the 35,680 unique words in the Training set.

Validation Set:

val.zip contains a folder named “images” with 10,088 word-level images, and a file “val_gt.txt” listing each image name and its ground truth text separated by a tab character.

Test Set:

test.zip contains a folder named “images” with 9,838 word-level images, and a file “test_gt.txt” listing each image name and its ground truth text separated by a tab character.
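The ground truth files described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not an official loader: it assumes the files are UTF-8 encoded, hold one image name and transcription per line, and that the archives have already been extracted. The function names read_ground_truth and unique_words are hypothetical.

```python
def read_ground_truth(gt_path):
    """Read a tab-separated ground truth file into (image_name, text) pairs.

    Assumes UTF-8 encoding and one "name<TAB>transcription" entry per line.
    """
    pairs = []
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only, so the transcription is kept whole.
            name, sep, text = line.partition("\t")
            if not sep:
                raise ValueError(f"no tab separator in line: {line!r}")
            pairs.append((name, text))
    return pairs


def unique_words(pairs):
    """Return the set of distinct transcriptions among the pairs."""
    return {text for _, text in pairs}


# Example usage (the path is an assumption about where train.zip was extracted):
# pairs = read_ground_truth("train/train_gt.txt")
# print(len(pairs), len(unique_words(pairs)))
```

For the Training set, the number of distinct transcriptions may be compared with the 35,680 entries in “vocabulary.txt”, although the two may differ if the vocabulary was built with different normalization (for example, of punctuation).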

Downloads

Train Test Val

Sample Word-Level Images from the Training Set

Image	Ground Truth
	ದೇವಿಯನ್ನು
	ಮೇಣದ
	ನನ್ನನ್ನು
	ಮನೆಗೆ
	ಕೂಡಲೇ
	ಜೊತೆಗೆ
	ಸ್ವಯಂಸೇವಕನಾಗಿ
	ಹಾರಿದ
	ಪತ್ರದಲ್ಲೇ
	ಬಂಟರ
	ಬಿಟ್ಟುಬಂದ
	|
	ಕೆಲಸದಲ್ಲಿ
	ಹಿಡಿಯಬೇಕು.
	ನೀಚಾಶ್ರಯಂ
	ನಮ್ರಾಸ್ತರವಃ
	೧.
	ಬಿ
	ಅಜದೆ
	ಹೊಕ್ಕುಳ
	ಧೃತರಾಷ್ಟ್ರನಿಗೆ
	ಇದು.
	ಎನ್ನುತ್ತಾನೆ.
	ನಾವು
	ಸಂತೋಷಪಟ್ಟು
	ತಿಳಿವಿಗನುಗುಣವಾಗಿ
	ಬಂಡವಾಳಿಗರನ್ನು
	ಬಾಯೊಳಗೆ
	ಶ್ರೀಲ
	ಬಂತೋ,
	ಭಕ್ತ:
	ಮೂಡಿಸುತ್ತವೆ.
	ಇಲ್ಲವೆ
	ವಾರದುದ್ದಕ್ಕೂ
	ಬೆರಗು,
	ಮದ
	ಬಾಲೆಯನ್ನೀಗ
	ದೈವವೆ!
	ತಪ್ಪಿತಸ್ಥ
	ಮೇಳದೊಡನೆ
	‘ತಿಂಗಳ್’
	ಸಮರ್ಪಕವಾಗಿ
	ಕಾಣಿಸತ್ತೆ.
	:
	ಮಹಾಸ್ವಾಮಿ
	ಸಾಧನೆಯನ್ನು
	ತೀವ್ರವಾಗಿ
	ಭಾಗವಾಗಿ
	ಹಣಕಾಸು
	ಕಬ್ಬಿಣ
	ಸಿಬ್ಬಂದಿಯ
	ಸಂಪರ್ಕಿಸಬಹುದು.
	ಈ
	ದಿಲ್ಲಿ,
	ವಿದೇಶೀ
	ನಾಲ್ಕು
	ಎಂಬ
	ಆವಾಹನೆ
	ಬುದ್ಧಿ
	ಕೆಲವು
	ಮತ್ತು
	ಪರಿಸಮಾಪ್ತಿ
	ಸಂವಿಧಾನದ
	ಸ್ವಾತಂತ್ರ್ಯಗಳಂತೆ
	ರೂಪಾಯಿಗಳು.
	ಸ್ಥಾನಗಳೆರಡೂ
	ಇದನ್ನು
	ಸದಸ್ಯರಾಗುವಂತಿಲ್ಲ
	ಇತರ
	ಸೇವಾ
	ಭಾರತ
	ಅವರ
	ಹೊಂದಿತು.
	ಇರುವುದರಿಂದ,
	ಬ್ರಿಟಿಷ್
	ತನ್ನ
	ಮುದ್ರಣದಲ್ಲಿ
	ಎಂದರ್ಥ
	ಸರಕುಗಳ
	vii)
	ಹಾಗೂ
	ಪೈಕಿ
	ಮತ್ತು
	ಯನ್ನು
	ಆದೇಶವನ್ನು
	ಮಾಡಿದ್ದರೆ;
	ಜನಸಂಖ್ಯಾ
	ಖಂಡ/ದೇಶ
	ಭಾಗಗಳನ್ನುಮಾಡಿ
	ಕೈಗಳಿಗೆ
	ವಾಣಿಗಳ
	ದೇಶದಲ್ಲಿ
	ಕುರುಚಲು
	ಮುಖ್ಯ.
	ಮತ್ತು
	ಭೂಮಿ,
	ಬಡಜನರ
	ನೆಲೆಸಿರುತ್ತವೆ.
	3ಬೊಬ್ಬಿಱಿದನಾ
	ಗ)