Overview
An important aspect of video analysis is the ability to represent
a video's high-level structure for classification, indexing and retrieval.
Such a representation should capture, for example, the classification
of individual shots, the transitions and interactions between shots,
and the classification of groups of shots based on activity.
We have previously developed a novel technique that reduces a
sequence of MPEG-encoded video frames, directly in the compressed
domain, to a trail of points in a low-dimensional space -- a VideoTrail.
We use the DC coefficients of each frame as features and apply
FastMap, a dimensionality-reduction technique, to map each frame
to a low-dimensional point.
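
As an illustration of this reduction step, the sketch below projects per-frame DC-coefficient vectors to k dimensions with the standard FastMap construction. The Euclidean distance on DC coefficients and the pivot-selection heuristic are assumptions made for the example, not necessarily the exact choices in our implementation.

```python
import numpy as np

def fastmap(frames, k, rng=np.random.default_rng(0)):
    """Project each frame's DC-coefficient vector to a k-dimensional point.

    `frames` is an (n_frames, n_dc_coeffs) array.  Distances start as
    Euclidean distances on the DC coefficients and are corrected after
    each projection, as in Faloutsos & Lin's FastMap.
    """
    n = len(frames)
    coords = np.zeros((n, k))

    def dist2(i, j, axis):
        # Squared distance in the residual space after `axis` projections.
        d2 = np.sum((frames[i] - frames[j]) ** 2)
        d2 -= np.sum((coords[i, :axis] - coords[j, :axis]) ** 2)
        return max(d2, 0.0)

    for axis in range(k):
        # Pivot heuristic: pick a point, take the farthest point from it,
        # then the point farthest from that one.
        a = rng.integers(n)
        b = max(range(n), key=lambda j: dist2(a, j, axis))
        a = max(range(n), key=lambda j: dist2(b, j, axis))
        d_ab2 = dist2(a, b, axis)
        if d_ab2 == 0.0:
            break  # all remaining residual distances are zero
        for i in range(n):
            # Project point i onto the line through the two pivots.
            coords[i, axis] = (dist2(a, i, axis) + d_ab2 - dist2(b, i, axis)) / (
                2.0 * np.sqrt(d_ab2)
            )
    return coords
```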
In
the low-dimensional space, we cluster frames, analyze transitions
between clusters and compute properties of the resulting trail
efficiently. By classifying portions of the trail as either stationary
or transitional, we are able to detect gradual edits between scenes.
We split a VideoTrail by identifying regions in the sequence of
points where the video is stable and cutting between them, which
provides a more robust analysis than traditional approaches
that examine only local changes between frames. By tracking the
interaction of clusters over time, we lay the groundwork for the
complete analysis and representation of the video's physical and
semantic structure.
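
A minimal sketch of how such a stationary/transitional labeling could be computed from a trail is given below. The sliding-window spread measure, the window size and the threshold are illustrative assumptions rather than our exact criteria.

```python
import numpy as np

def label_trail(points, window=15, spread_thresh=2.0):
    """Label each trail point 'stationary' or 'transitional'.

    A point is stationary if the trail points in a window around it stay
    close to their centroid (small spread).  Window size and threshold
    are illustrative values that would need to be tuned.
    """
    n = len(points)
    labels = []
    half = window // 2
    for i in range(n):
        w = points[max(0, i - half): min(n, i + half + 1)]
        spread = np.mean(np.linalg.norm(w - w.mean(axis=0), axis=1))
        labels.append("stationary" if spread < spread_thresh else "transitional")
    return labels

def candidate_edits(labels):
    """Return (start, end) index ranges of transitional runs, which are
    candidate gradual edits between stable segments."""
    edits, start = [], None
    for i, lab in enumerate(labels):
        if lab == "transitional" and start is None:
            start = i
        elif lab == "stationary" and start is not None:
            edits.append((start, i - 1))
            start = None
    if start is not None:
        edits.append((start, len(labels) - 1))
    return edits
```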
Recent progress in this area has been on the classification of
video trails, both into transitional and non-transitional classes
and into higher-level patterns. We have also begun investigating
how hidden Markov models (HMMs) can be used for this classification.
Video is a visual language. HMMs are among the most successful tools
in speech recognition, applied at levels ranging from phoneme recognition
to content classification to automatic translation.
Similarly,
in the analysis of a video stream, HMMs would be useful at several
levels:
- Distinguish
transitions (fades, dissolves) from scenes.
- Classify
story elements: dialogue sequences, close-ups, outdoor scenery.
- Classify
video clips: news, advertising, weather, sports shows, ...
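
One common way to use HMMs for this kind of classification is to train one model per class and assign a new trail segment to the class whose model explains it best. The sketch below follows that recipe using the hmmlearn library with Gaussian emissions; the library, the number of states and the emission model are assumptions made for illustration, not a description of our system.

```python
import numpy as np
from hmmlearn import hmm  # one possible off-the-shelf HMM library

def train_class_models(training_data, n_states=4):
    """Fit one Gaussian-emission HMM per class.

    `training_data` maps a class name (e.g. 'dissolve', 'pan', 'dialogue')
    to a list of trail segments, each an (n_frames, n_dims) array of
    VideoTrail points.  State count and emission model are illustrative.
    """
    models = {}
    for label, segments in training_data.items():
        X = np.concatenate(segments)          # stacked observations
        lengths = [len(s) for s in segments]  # per-segment lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, segment):
    """Label a new trail segment with the class whose HMM assigns it the
    highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(segment))
```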
Compared
to an approach that tries to define explicit rules for what a dissolve
should look like as opposed to a camera pan, or for how a news clip
differs from a nature show, a statistical approach seems to
provide a more powerful and more flexible framework.
Filmmakers
describe the structure of films with specific terms, chosen to
name syntactic units in the story line. These syntactic units
could correspond to states in the state transition network of a
high-level HMM parser.
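
For instance, a purely hypothetical high-level state set and transition matrix might look like the following; the states and probabilities are illustrative only.

```python
import numpy as np

# Hypothetical "syntactic unit" states for a high-level parser; the
# transition matrix encodes how often one unit tends to follow another
# in a story line.  All values are illustrative, not measured.
states = ["establishing shot", "dialogue", "action", "transition"]
A = np.array([
    [0.10, 0.50, 0.30, 0.10],   # establishing shot -> ...
    [0.05, 0.60, 0.20, 0.15],   # dialogue -> ...
    [0.05, 0.30, 0.50, 0.15],   # action -> ...
    [0.40, 0.30, 0.30, 0.00],   # transition -> ...
])
assert np.allclose(A.sum(axis=1), 1.0)  # each row is a probability distribution
```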
Related
statistical methods can be used to answer the following question:
what are the most discriminative features for classifying N video
clips into n categories? Answering this question is important
because it will tell us what kinds of content detection to focus
on for maximum discrimination.
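
One standard way to approach this question, sketched below under the assumption that each clip is summarized by a fixed-length feature vector, is to rank features by a Fisher-style ratio of between-class to within-class variance; the specific features and the scoring criterion are illustrative choices, not a statement of our method.

```python
import numpy as np

def fisher_scores(features, labels):
    """Rank per-clip features by a Fisher-style discriminability score.

    `features` is an (n_clips, n_features) array of per-clip measurements
    (e.g. cut rate, average trail spread, motion energy -- hypothetical
    examples); `labels` gives each clip's category.  A higher score means
    the between-class variance is large relative to the within-class
    variance, i.e. the feature is more discriminative.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        fc = features[labels == c]
        between += len(fc) * (fc.mean(axis=0) - overall_mean) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)
```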