|
Overview
Video frames sometimes contain text that is either part of the
scene or has been added artificially as annotation. Scene text
includes items such as a street sign, writing on a wall, or a name
on a uniform that happens to be captured by the camera. Such text is
subject to large distortion and variation, and it contributes to the
textural content of the image. Annotated text, however, is meant to be
read and recognized correctly. The text is usually designed to
augment the underlying story line, so it can be used for a number
of video processing tasks. Recognizing and understanding the
annotated text benefits video segmentation, searching, and other
high-level tasks.
Annotated text is usually derived from high-resolution text
images. It typically has a constant foreground texture and high contrast
with the background. After digitization, however, the resolution of the
text image is reduced beyond the ability of Optical Character Recognition
(OCR) engines to recognize the characters correctly. During digitization, anti-aliasing
is performed to avoid aliasing (frequency fold-over) during sampling. The
theoretical basis for this process is the sampling theorem of Shannon
and Nyquist. In essence, spatial resolution is traded for pixel
depth. While the result is more appealing to human viewers,
OCR engines do not perform well on it, since they are usually not designed
for greyscale or color images.
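As a rough illustration of how spatial resolution is traded for pixel
depth, the following Python sketch averages 2x2 blocks of a binary text
image. It is not taken from the original work: the glyph data and the
function name downsample_box are hypothetical, and box averaging is used
only as the simplest stand-in for an anti-aliasing filter; an actual
digitization pipeline may use a different low-pass filter.

```python
def downsample_box(image, factor):
    """Average each factor x factor block of a 2-D list of pixel values.

    This is a simple box filter followed by subsampling, used here as an
    assumed stand-in for the anti-aliasing step of digitization.
    """
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical 4x8 binary glyph fragment: 1 = text, 0 = background.
glyph = [
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]

small = downsample_box(glyph, 2)

# Distinct intensity values present after downsampling.
levels = sorted({value for row in small for value in row})

for row in small:
    print(row)
print("distinct levels:", levels)
```

The input uses only two values, but the half-size output contains
intermediate grey levels along the character edges. An OCR engine that
expects a bilevel image sees blurred strokes in place of crisp ones.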
|