Overview
Unlike most retrieval problems, which deal with clean full text,
video indexing based on automatically extracted text poses several
interesting challenges. First, text in digital video is usually
of poor quality, so most commercial OCR software has significant
difficulty recognizing it accurately. We expect to find
missing or incorrect characters, or even whole words, and as a
result exact matches between words will not always be possible. We
therefore need an approximate word matching algorithm instead of
exact word matching. For example, if the user submits "house"
as a query, we may want the word "hose" in the
database to be considered a match. We can also weight the
matching scores: an exact match receives the highest score, and
the weight of an approximate match is inversely proportional
to the number of errors.
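The weighted approximate matching described above can be sketched as follows. This is only an illustration under stated assumptions, not the system's actual matcher: it uses Levenshtein edit distance as the error count, and the function names, the `max_errors` cutoff, and the score 1 / (1 + errors) are all hypothetical choices made so that an exact match scores 1.0 and each additional error lowers the score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def match_score(query: str, word: str, max_errors: int = 1) -> float:
    """Assumed scoring: 1.0 for an exact match, 1 / (1 + errors) otherwise,
    and 0.0 when the word has more than max_errors errors."""
    errors = edit_distance(query.lower(), word.lower())
    if errors > max_errors:
        return 0.0
    return 1.0 / (1 + errors)


def search(query: str, ocr_words: list[str], max_errors: int = 1):
    """Return (word, score) pairs for matching words, best score first."""
    scored = [(w, match_score(query, w, max_errors)) for w in ocr_words]
    hits = [(w, s) for w, s in scored if s > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With these assumptions, a query for "house" ranks an exact "house" above the OCR-damaged "hose", and rejects unrelated words such as "stock". A larger `max_errors` tolerates noisier OCR output at the cost of more false matches.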
The second challenge is that text in digital videos is usually
very terse and may lack semantic breadth. Methods that deal with
semantic indexing, such as Latent Semantic Indexing (LSI), need
considerable training data with similar characteristics. Intuitively,
constructing a semantic dictionary might be a useful approach, but
a tremendous amount of work is necessary to make it practical. If
we restrict ourselves to a limited domain of video types that often
contain text, such as news, finance and sports, it is possible to
build a "local" semantic dictionary. For
example, if the user submits "Financial" as a query,
then the video frames containing "Stock" or "Quotes"
can be returned, provided those terms have been built into the
semantic dictionary.
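A minimal sketch of query expansion through a "local" semantic dictionary is given here. Everything in it is an assumption made for illustration: the dictionary contents beyond the "Financial" example, the function names, and the representation of each frame as a list of recognized words are hypothetical, and matching is exact for brevity.

```python
# Hypothetical hand-built "local" dictionary for a finance domain.
# Keys and related terms are stored in lowercase.
SEMANTIC_DICT = {
    "financial": {"stock", "quotes"},
}


def expand_query(query: str) -> set[str]:
    """Return the query term plus any related terms from the dictionary."""
    key = query.lower()
    return {key} | SEMANTIC_DICT.get(key, set())


def retrieve_frames(query: str, frame_text: dict[str, list[str]]) -> list[str]:
    """Return ids of frames whose extracted words share a term with the
    expanded query. frame_text maps a frame id to its recognized words."""
    terms = expand_query(query)
    return [frame_id
            for frame_id, words in frame_text.items()
            if terms & {w.lower() for w in words}]
```

For instance, with frames whose extracted text is `["Stock", "Market"]`, `["Weather"]` and `["Quotes"]`, the query "Financial" returns the first and third frames even though neither contains the query word itself.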
Glimpse and WordNet are being integrated to produce a retrieval
mechanism for key-frames.