ICDAR 2024 RDTAG

Introduction

Comprehending text in everyday documents is fundamental to many of the tasks humans perform in their daily lives, and it is a key pathway to acquiring knowledge over time. Similarly, Contextual AI systems need the ability to read, to infer from lifelong activities, and to develop skills. Lifelong comprehension requires long-term, always-on data capture, which introduces challenges for the sensor configuration (cost, sensor type, energy efficiency, how the device is worn, etc.) and the usage pattern (human pose, reading gestures, etc.). Combined with the task of text comprehension, these constraints pose a unique challenge for the ICDAR community. We introduce this challenge with the help of Project Aria [1] from Meta Reality Labs Research.

Beyond these design challenges, document comprehension on wearable devices is further complicated by the variability of human poses (ranging from inclined or resting positions to even moments of sleep) and by varying lighting conditions, including sunlight, artificial lamps, and night-time settings. Occlusions caused by page folding or by other subjects in the scene pose additional hurdles to accurate OCR. The diversity of document types adds further intricacy and variability: categories range from textbooks to academic dissertations, newspapers to conference papers, and encyclopedias to biographies, each presenting a unique challenge for text recognition. The distinctive layouts, font styles, and content structures of newspaper editorials, research periodicals, dictionaries, and other documents demand adaptable OCR systems capable of handling these different formats.

We believe that OCR is a key technology that needs to be solved for egocentric machine perception, and that it brings its own challenges, as discussed above. While we expect some of the sensor constraints to be relaxed over the coming years, the challenges of egocentric viewpoints are here to stay. We believe that with Project Aria, we are at a point to begin the journey of egocentric OCR. With this context in mind, this competition introduces the task of low-resolution OCR on pages captured using wearable devices, focusing on the challenges posed by diverse document types, varying human positions, and varying lighting conditions. The competition comprises the following tasks:

Task A: Isolated Word Recognition in Low Resolution

Task B: Prediction of Reading Order

Task C: Page Level Recognition and Reading

Prizes and Awards

For each task, we aim to designate a winner based on the evaluation of the submitted systems. The winners of Task A (Isolated Word Recognition in Low Resolution) and Task B (Prediction of Reading Order) will each be awarded a cash prize of 300 USD. The winner of Task C (Page Level Recognition and Reading) will receive a cash prize of 1000 USD. However, prizes will be awarded only if the competition report is accepted for publication in the ICDAR proceedings (see the call for competition). Acceptance of the competition report is contingent upon having enough participants to draw meaningful conclusions.

References

[1] Project Aria. https://www.projectaria.com/
[2] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. pp. 311–318 (2002)

ICDAR 2024 Competition on Reading Documents Through Aria Glasses