Video Segmentation


Overview

The processing of pre-edited video for inclusion in multimedia environments often requires the explicit identification of "scenes" or "shots". Identifying "key-frames" for each of these scenes allows indexes to be built from these representatives.

In this project we are developing algorithms for video segmentation and key-frame identification. The algorithms work on MPEG or Motion-JPEG sequences, and exploit three types of information: the Discrete Cosine Transform (DCT) coefficients, the predictive block type, and the motion compensation coding (if available).

MPEG streams consist of three distinct types of frames: I, P, and B frames. The I frame is simply a JPEG encoding of its corresponding image. The P frame is predicted from the previous I or P frame. The B frame is predicted from both the previous and the next I or P frame. All three types of frames are ultimately encoded using the DCT. The difference is that in I frames the DCT information is derived directly from the image samples, whereas in P and B frames it is derived from the residual error after prediction.
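The frame-type dependencies above can be summarized compactly; a minimal sketch (the mapping and its labels are ours, for illustration only, not identifiers from any MPEG library):

```python
# Prediction structure of the three MPEG frame types (illustrative summary).
PREDICTS_FROM = {
    'I': [],                             # intra-coded: DCT of the image samples themselves
    'P': ['previous I/P'],               # DCT of the residual after forward prediction
    'B': ['previous I/P', 'next I/P'],   # DCT of the residual after bidirectional prediction
}
```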

If the MPEG stream contains only I frames, pure DCT information is used in the representation, and no predictive or motion information is encoded. The current approach for these sequences computes a simple sum of the squares of the differences of the DCT coefficients and looks for peaks in the resulting series. By considering neighboring frames, as well as pairs that are slightly separated, we obtain a superset of the locations where scenes change. In the case of Motion-JPEG, this turns out to be a fairly robust mechanism.
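A minimal sketch of this I-frame-only step, assuming each frame's DCT coefficients are already available as a NumPy array (the function names and the threshold parameter are ours, for illustration):

```python
import numpy as np

def dct_distance(dct_a, dct_b):
    """Sum of squared differences between two frames' DCT coefficient arrays."""
    return float(np.sum((dct_a - dct_b) ** 2))

def candidate_cuts(dct_frames, threshold):
    """Flag frame indices where the DCT distance to the previous frame
    peaks above `threshold`.  The result is a superset of the true
    scene-change locations, to be refined by later analysis."""
    dists = [dct_distance(a, b) for a, b in zip(dct_frames, dct_frames[1:])]
    return [i + 1 for i, d in enumerate(dists)
            if d > threshold
            and (i == 0 or d > dists[i - 1])            # local peak on the left
            and (i == len(dists) - 1 or d > dists[i + 1])]  # and on the right
```

In practice the comparison can also be run on slightly separated frame pairs, as described above, to catch gradual transitions that neighboring-frame differences miss.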

If the MPEG stream contains P and B frames between consecutive I frames, DCT information can be used to narrow down the search space where scene changes could possibly have occurred, and the P and B frames are then used to pinpoint the changes. For each macro-block in a P or B frame, a test is made to see whether this 16×16 block can be predicted from its corresponding block in the previous I/P frame, with a possible offset to compensate for motion. If it appears that it would be more expensive to predict and encode from the previous I/P frame than to code directly (using JPEG techniques), then that particular block is not predicted but is Intra-coded. This usually occurs when the current macro-block has little in common with its predecessor. Thus every P frame consists of either Intra-coded or Forward-predicted macro-blocks. If a P frame contains many Intra-coded blocks, there is a very high probability that a scene change occurred somewhere between the previous I/P frame and the current P frame.
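The P-frame test above reduces to a count; a minimal sketch, assuming the per-macro-block coding types have already been parsed from the stream (the function name, labels, and the 0.5 cutoff are ours, for illustration):

```python
def p_frame_suspect(block_types, intra_fraction=0.5):
    """block_types: per-macro-block coding labels for one P frame,
    each 'intra' or 'forward'.  A high fraction of intra-coded blocks
    suggests a scene change since the previous I/P frame."""
    intra = sum(1 for t in block_types if t == 'intra')
    return intra / len(block_types) >= intra_fraction
```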

If a majority of the macro-blocks in a B frame are forward-predicted, then there is a distinct possibility that a scene change occurs between the current frame and the next I/P frame. Similarly, if a majority of the macro-blocks in a B frame are backward-predicted, then the probability of a scene change having occurred between the previous I/P frame and the current frame is very high. Using the information from the P and B frames, scene changes can be pinpointed. The analysis involves counting the number of forward-predicted macro-blocks in P frames, and counting the numbers of forward-predicted and backward-predicted macro-blocks in each B frame. In a B frame, each bidirectionally predicted macro-block is counted as both forward-predicted and backward-predicted, and each B frame is then assigned the lesser of the two counts. In a plot of these values, distinct valleys indicate the frames where scene changes occurred.
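The B-frame scoring and valley extraction can be sketched as follows, again assuming parsed per-macro-block labels (function names, labels, and the valley threshold are ours, for illustration):

```python
def b_frame_score(block_types):
    """block_types per macro-block: 'forward', 'backward', or 'bidirectional'.
    A bidirectional block counts toward both totals; the frame's score is
    the lesser of the two counts, so a low score means most blocks predict
    from only one side -- a likely cut boundary."""
    fwd = sum(t in ('forward', 'bidirectional') for t in block_types)
    bwd = sum(t in ('backward', 'bidirectional') for t in block_types)
    return min(fwd, bwd)

def valleys(scores, threshold):
    """Indices whose score drops below `threshold` and below both neighbors,
    i.e. the distinct valleys in the plot of B-frame scores."""
    return [i for i, s in enumerate(scores)
            if s < threshold
            and (i == 0 or s <= scores[i - 1])
            and (i == len(scores) - 1 or s <= scores[i + 1])]
```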

One difficulty with using motion prediction is that large camera motion, such as panning and zooming, can lead to false detections. For this reason, we attempt to simultaneously detect pans and zooms based on the distribution of the motion compensation vectors.
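One simple way to exploit that distribution is to look at the directional spread of the vectors: a pan yields vectors clustered in one direction, while a zoom yields vectors fanning out from (or into) the frame center. A heuristic sketch under those assumptions (the function name, labels, and thresholds are ours, not the project's actual detector):

```python
import numpy as np

def classify_camera_motion(vectors, mag_thresh=2.0, dir_spread_thresh=0.5):
    """Rough pan/zoom test from per-macro-block motion-compensation vectors.
    vectors: sequence of (dx, dy) pairs.  Returns 'static', 'pan', or
    'zoom-or-complex'.  A simplified heuristic, not a full detector."""
    v = np.asarray(vectors, dtype=float)
    mags = np.hypot(v[:, 0], v[:, 1])
    if mags.mean() < mag_thresh:
        return 'static'                      # too little motion to classify
    angles = np.arctan2(v[:, 1], v[:, 0])
    # Mean resultant length of the direction angles: near 1 when all
    # vectors share a direction (pan), near 0 when they cancel out (zoom).
    resultant = np.hypot(np.cos(angles).sum(), np.sin(angles).sum()) / len(v)
    return 'pan' if resultant > 1 - dir_spread_thresh else 'zoom-or-complex'
```

Frames classified as pan or zoom can then be excluded from (or down-weighted in) the scene-change decision to suppress false detections.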







© Copyright 2001, Language and Media Processing Laboratory, University of Maryland, All rights reserved.