LAMP - Media Group - Research - Enhancement of Text in Video

Enhancement of Text in Video

Overview

Video frames sometimes contain text, which may be part of the scene or may have been added artificially as an annotation. Scene text includes items such as a street sign, writing on a wall, or a name on a uniform that was captured by the camera. The distortion and variation in such text are large, and it contributes to the textural content of the image. Annotated text, by contrast, is meant to be read and recognized correctly. Its content is usually designed to augment the underlying story line, so it can be used for a number of video processing tasks. Recognizing and understanding annotated text benefits video segmentation, search, and other high-level tasks.

Annotated text is usually derived from high-resolution text images, which typically have constant foreground texture and high contrast with the background. After digitization, however, the resolution of the text is reduced below what Optical Character Recognition (OCR) engines need to recognize the characters correctly. During digitization, anti-aliasing is performed to avoid aliasing (frequency cross-over) during sampling; the basis for this step is the sampling theory of Shannon and Nyquist. In essence, spatial resolution is traded for pixel depth. While the result is more appealing to human viewers, OCR engines do not perform well on it, since they are usually not designed for greyscale or color images.
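
The trade of spatial resolution for pixel depth can be illustrated with a small sketch. It is not the digitization pipeline used in this work: it assumes a simple box (averaging) filter as the anti-aliasing step, and the function name `box_downsample`, the 8x8 glyph fragment, and the reduction factor of 4 are all hypothetical choices made for illustration.

```python
def box_downsample(image, factor):
    """Average each factor x factor block of a 2D list into one output pixel.

    A box filter is assumed here as a stand-in for anti-aliasing; real
    digitizers may use other low-pass filters.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical bilevel source: a vertical stroke three pixels wide
# (columns 2-4), 1 = text foreground, 0 = background.
high_res = [[0, 0, 1, 1, 1, 0, 0, 0] for _ in range(8)]

low_res = box_downsample(high_res, 4)
print(low_res)  # [[0.5, 0.25], [0.5, 0.25]]
```

The 8x8 bilevel stroke becomes a 2x2 image whose pixels take intermediate grey values (0.5 and 0.25). The stroke's position and width now survive only as grey levels, which is the kind of input an OCR engine designed for bilevel images handles poorly.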