LAMP - Media Group - Research - Enhancement of Text in Video

Text Based Indexing of Video Key-frames

Overview

Unlike most retrieval problems, which deal with clean full text, video indexing based on automatically extracted text poses several interesting challenges. First, text in digital video is usually of poor quality, so most commercial OCR software has significant difficulty recognizing it accurately. We expect to find missing or incorrect characters, or even whole words, and as a result exact matches between words will not always be possible. We therefore need an approximate word matching algorithm instead of exact word matching. For example, if the user submits "house" as a query, we may wish the word "hose" in the database to be considered a match. We can also weight the matching scores: an exact match receives the highest score, and the weight decreases in inverse proportion to the number of errors.
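To make the weighting idea concrete, here is a minimal sketch in Python. It assumes that "number of errors" means the Levenshtein edit distance between the query and the OCR'd word, and that the weight is 1 / (1 + errors). Both choices, along with the function names and the max_errors cutoff, are illustrative assumptions and not a description of the matching algorithm actually used in this project.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def match_score(query: str, word: str, max_errors: int = 1) -> float:
    """Score in [0, 1]: 1.0 for an exact match, smaller as errors increase.

    max_errors is a hypothetical cutoff; words with more errors score 0.
    """
    errors = edit_distance(query.lower(), word.lower())
    if errors > max_errors:
        return 0.0
    return 1.0 / (1 + errors)
```

Under these assumptions, the query "house" matches the database word "hose" with one error, so it is returned with a lower score than an exact occurrence of "house" would receive.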
The second challenge is that text in digital video is usually very terse and may lack semantic breadth. Methods that deal with semantic indexing, such as Latent Semantic Indexing (LSI), need considerable training data with similar characteristics. Intuitively, constructing a semantic dictionary might be useful, but a tremendous amount of work is necessary to make it practical. If we restrict ourselves to a limited domain of video types that often contain text, such as news, finance and sports, it becomes possible to build a "local" semantic dictionary. For example, if the user submits "Financial" as a query, then the video frames containing "Stock" or "Quotes" can be returned, provided these terms have been built into the semantic dictionary.
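The "local" semantic dictionary can be pictured as a simple query expansion step. The sketch below is only an illustration: the dictionary contents beyond the "Financial" example, the function names, and the frame_text structure (a mapping from key-frame identifier to the words extracted from that frame) are all hypothetical.

```python
# Hypothetical domain-specific dictionary: query term -> related terms.
LOCAL_SEMANTIC_DICT = {
    "financial": {"stock", "quotes"},
}


def expand_query(query, dictionary=LOCAL_SEMANTIC_DICT):
    """Return the query term together with its related terms, lowercased."""
    term = query.lower()
    return {term} | dictionary.get(term, set())


def retrieve(query, frame_text):
    """Return sorted ids of frames whose text shares a term with the expanded query."""
    terms = expand_query(query)
    return sorted(
        frame_id
        for frame_id, words in frame_text.items()
        if terms & {w.lower() for w in words}
    )


# Usage: frames 1 and 2 mention terms related to "Financial"; frame 3 does not.
frames = {
    1: ["Stock", "Market", "Update"],
    2: ["Quotes"],
    3: ["Weather", "Forecast"],
}
matches = retrieve("Financial", frames)
```

Exact set intersection is used here for brevity; in practice the comparison would presumably be approximate, since the extracted words may contain OCR errors.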

Glimpse and WordNet are being integrated to produce a retrieval mechanism for key-frames.