Motivation and Relevance to ICDAR Community
Text detection and recognition in natural environments are essential for numerous applications, from digitizing business cards to indexing storefronts on city streets. This competition seeks to evaluate how well state-of-the-art methods detect and recognize multi-lingual text. In diverse urban settings, people interact with a mix of languages and scripts, creating a unique challenge for recognition systems that cannot rely on extensive prior knowledge of any single script. Multi-lingual text recognition is also critical for analyzing online content streams, where a blend of languages and scripts frequently appears. By addressing these real-world scenarios, the competition aims to drive advancements in handling the complex requirements of multi-lingual text recognition across varied contexts.
More specifically, this competition addresses a crucial question: can current text detection and recognition methods, whether deep-learning-based or otherwise, handle multiple scripts and languages without significant modifications, or are script-specific adaptations needed? The ultimate objective is to accurately interpret text in any captured image, irrespective of source, quality, script, or other challenging conditions. Numerous research efforts have been directed toward overcoming these hurdles, with past Robust Reading Competition (RRC) editions and other studies offering valuable datasets that have helped researchers improve text recognition in natural scenes.
In this competition, we aim to push the boundaries further by introducing the complex challenge of multi-lingual text detection, recognition, and language identification in a single framework. This task requires systems to operate effectively across scripts, recognizing text in varying languages with consistent accuracy and robustness. By doing so, we hope to catalyze the development of versatile models that do not rely on script-specific tuning, fostering truly universal text recognition solutions capable of operating seamlessly in diverse real-world settings.
Although existing datasets support scene text detection and script identification, our proposed dataset introduces several unique elements. It comprises full scene images spanning 11 languages: Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. It unifies text detection, recognition, and language identification tasks within a single framework, offering a comprehensive resource for advancing multi-lingual scene text recognition. Each image features incidental text with a range of complex characteristics such as occlusion, partial visibility, varied illumination, motion blur, perspective distortion, and differing orientations. Additionally, the text styles, sizes, colors, and scripts vary widely, accurately reflecting the challenges of real-world scenes. These variations in visual and linguistic properties pose a rigorous challenge, demanding robust algorithms that can generalize across scripts and languages. The dataset thus serves as a valuable tool for developing and benchmarking methods that achieve accurate, reliable text recognition in multi-lingual contexts.
This dataset is designed to meet the community’s pressing need for advanced, robust scene text detection and recognition, offering a unique resource that benefits both the ICDAR and broader computer vision research communities. Researchers in these fields are actively engaged in developing techniques to analyze complex scenes, detect and recognize text, assess the quality of text images, and identify scripts. By presenting a comprehensive multi-lingual dataset that incorporates diverse linguistic and visual challenges, this resource aims to drive progress across these interconnected areas. Its scope extends beyond traditional text recognition, providing ample material to test and refine algorithms that must handle occlusion, diverse scripts, various orientations, and environmental conditions that affect text quality, such as lighting and blur. Ultimately, this dataset serves as a valuable asset for researchers dedicated to solving real-world text detection and recognition problems in uncontrolled environments, fueling advancements in both academic research and applied computer vision solutions.