Abstract: An end-to-end real-time scene text localization and recognition method is presented. The method localizes textual content in images, video, or a webcam stream and performs character recognition (OCR). Real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs).
The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated
with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts.
The method was evaluated on two public datasets, where it achieves state-of-the-art results amongst published methods.
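To illustrate the selection step between the two classification stages, the following minimal sketch keeps only the regions whose estimated character probability is a local maximum along a sequence of nested ERs. It is an illustration under stated assumptions, not the method's actual implementation: the function name `select_local_maxima`, the cutoff `p_min`, and the example probabilities are all hypothetical, and the abstract does not specify the exact selection rule.

```python
from typing import List


def select_local_maxima(probs: List[float], p_min: float = 0.5) -> List[int]:
    """Return indices of local maxima in `probs` that are at least `p_min`.

    `probs` is assumed to hold the first-stage character probability of each
    region in one chain of nested ERs, ordered by threshold. `p_min` is a
    hypothetical global cutoff, not a value taken from the abstract.
    """
    selected = []
    for i, p in enumerate(probs):
        # Treat positions outside the chain as infinitely low.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        # Strict on the left, non-strict on the right, so a plateau of
        # equal values yields exactly one selected region.
        if p >= p_min and p > left and p >= right:
            selected.append(i)
    return selected


# Made-up probabilities for one chain of nested regions.
chain = [0.10, 0.35, 0.80, 0.60, 0.20, 0.55, 0.90, 0.40]
print(select_local_maxima(chain))  # prints [2, 6]
```

Only the regions at the selected indices would be passed on to the second, more expensive classification stage, which is what keeps the per-image cost low.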