Akshara-Assamese

Language

Assamese

Modality

Printed

Description

We create the Akshara-Assamese dataset by randomly cropping word images from 1,000 document page images. The pages are taken from multiple books and scanned at 300 DPI using a flatbed scanner. We manually annotate the ground truth transcription of each cropped word image. The dataset consists of 99,775 word images and their corresponding ground truth transcriptions. We divide it into training, validation, and test sets of 79,697, 9,932, and 10,146 word images, respectively, each with its ground truth transcriptions. The training set contains 20,133 unique Assamese words.
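As a quick sanity check on the figures above, the following minimal sketch sums the three split sizes and reports each split's share of the dataset. The counts are taken from the description; the names `SPLIT_SIZES` and `split_percentages` are illustrative and not part of the dataset.

```python
# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 79_697, "val": 9_932, "test": 10_146}


def split_percentages(sizes):
    """Return each split's share of the total, in percent."""
    total = sum(sizes.values())
    return {name: 100 * count / total for name, count in sizes.items()}


if __name__ == "__main__":
    # The three splits should add up to the 99,775 images quoted above.
    print("total word images:", sum(SPLIT_SIZES.values()))
    for name, pct in split_percentages(SPLIT_SIZES).items():
        print(f"{name}: {pct:.2f}%")
```

The three counts add up to the stated total of 99,775, and the division is roughly 80/10/10.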

Training Set:

train.zip contains a folder named “images” with 79,697 word-level images, a file “train_gt.txt” that lists each image name and its ground truth transcription separated by a tab character, and a file “vocabulary.txt” that lists the 20,133 unique words in the training set.
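The relationship between the ground truth file and the vocabulary can be sketched as follows. This is a minimal illustration, assuming each ground truth line has the form `<image name><TAB><transcription>` and that the vocabulary is the set of distinct transcriptions; how “vocabulary.txt” was actually produced (for example, whether punctuation was kept) is not stated, so the result may not match it exactly. The function names and the sample image names in the usage are hypothetical.

```python
def parse_gt_line(line):
    """Split one ground truth line into (image_name, transcription).

    Splits on the first tab only, so the transcription is kept whole.
    Raises ValueError if the line contains no tab.
    """
    name, text = line.rstrip("\r\n").split("\t", 1)
    return name, text


def build_vocabulary(gt_lines):
    """Collect the distinct transcriptions from an iterable of lines."""
    return {parse_gt_line(line)[1] for line in gt_lines if line.strip()}


if __name__ == "__main__":
    # Hypothetical image names; the real naming scheme is not documented.
    sample = ["1.jpg\tআৰু\n", "2.jpg\tকথা\n", "3.jpg\tআৰু\n"]
    print(sorted(build_vocabulary(sample)))
```

With the real data, the lines would come from “train_gt.txt” opened as text (UTF-8 is an assumption), and the size of the resulting set could be compared with the 20,133 entries reported for “vocabulary.txt”.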

Validation Set:

val.zip contains a folder named “images” with 9,932 word-level images and a file “val_gt.txt” that lists each image name and its ground truth text separated by a tab character.

Test Set:

test.zip contains a folder named “images” with 10,146 word-level images and a file “test_gt.txt” that lists each image name and its ground truth text separated by a tab character.
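Since all three archives share the same layout, one loader can serve every split. The sketch below is a hedged example, assuming the archive has been extracted so that the “images” folder and the ground truth file sit directly under one directory, and that the file is UTF-8 encoded; neither detail is confirmed by the description. The names `load_split` and `missing_images` are illustrative.

```python
from pathlib import Path


def load_split(root, gt_name):
    """Read a ground truth file into a list of (image_path, text) pairs.

    root    -- directory holding the "images" folder and the ground truth file
    gt_name -- e.g. "train_gt.txt", "val_gt.txt" or "test_gt.txt"
    """
    root = Path(root)
    samples = []
    with open(root / gt_name, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab; the rest is the transcription.
            name, _, text = line.partition("\t")
            samples.append((root / "images" / name, text))
    return samples


def missing_images(samples):
    """Return image paths listed in the ground truth but absent on disk."""
    return [path for path, _ in samples if not path.exists()]
```

For example, `load_split("val", "val_gt.txt")` on an extracted validation archive would be expected to yield 9,932 pairs, and `missing_images` on that result should be empty if the archive is intact.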

Downloads

To download the training, validation, or test data, please log in.


Sample Word Level Images from Training Set

Ground Truth
পাতি
কৰিলে,-
বুলি
কথা
নাই
কিবা
খটাই
হাতত
বিশ্বাস
আৰু
খিৰিকীৰ
দাম্ভিক-
দেখিম
বান্ধি
মচি
মাতৃ
লাহে
ভবাই
চেকুৰাই
ফোঁট
হৈ
ভাষাৰে
...আমাৰ
এৰি
হয়
?"
থূপ
আকৌ
!
যোৰহাট
হাড়-মূৰবোৰ
ভবা-চিন্তা
সোণৰ
ভাল
ব'ব
আহিবলৈ
কাইলৈ
"বুঢ়ী-মেছত
খায়
কি
নামি
শীতলতা
সপ্রশ্ন
জয়া,
ইয়াৰ
তোমাৰ
বিশ্বাস
গাভৰুৰ
নিদি
।......
তাৰ
ৰাতিৰ
কৰা
অহা
অলপ
নতুন
ব্রজমাঈৰ
যোগসূত্রক
যায়
এই
"ভালেই
আজিকালি
হঠাতে
গতাই
এডাল
বীণ
তাত
মলয়ে
স্বৰগত
বাজি
ৰাৱ
নমো
কৰ
জ্বলয়
।"
ধৰি
আমাৰ
২৮৪
বিনাশে
অৱতাৰ
পৰম
স্বপ্ন-সম
যত
বণিয়া
নিস্তাৰি
পৰিচা
হৃষীকেশ
ছিণ্ডিয়া
সর্ৰ্ব্বথা

License

This dataset is released under the CC BY 4.0 license. For more details, please see the data_license.doc file.
