OCTOBER 12, 1999, 2:00
Dr. Michal Irani
Weizmann Institute of Science, Israel
Multi-Frame Analysis of Information in Video
ABSTRACT
Video
is a very rich source of information. It provides *continuous*
coverage of scenes over an *extended* region both in time and
in space. That is what makes video more interesting than a plain
collection of images of the same scene taken from different views.
Yet most video analysis algorithms do not take advantage of the
full power of video data, and usually use information only from
a few *discrete* frames or points at any given time. In this talk
I will describe some aspects of our research on multiple-frame
video analysis that aims to take advantage of both the continuous
acquisition and extended spatio-temporal coverage of video data.

First, I will describe a new approach to estimating image correspondences
simultaneously across multiple frames. We show that for static
(or "rigid") scenes, the correspondences of multiple
points across multiple frames lie in low-dimensional subspaces.
We use these subspace constraints to *constrain* the correspondence
estimation process itself. These subspace constraints are geometrically
meaningful and are not violated at depth discontinuities nor when
the camera motion changes abruptly. This enables us to obtain
dense correspondences *without* using heuristics such as spatial
or temporal smoothness assumptions. Our approach applies to a
variety of imaging models, world models, and motion models, yet
does *not* require prior model selection, nor does it involve
recovery of any 3D information.
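
To make the subspace claim concrete, below is a minimal numerical
sketch, assuming the simplest case of an affine camera model (one of
the several models the approach covers). It shows that the 2F x P
matrix of point displacements across F frames has rank at most 4,
regardless of the depth distribution of the points or of how the
camera moves; all variable names and sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    F, P = 20, 100                      # frames, tracked points
    X = rng.standard_normal((3, P))     # rigid 3D scene points

    # One affine camera per frame: x_f = A_f X + t_f (assumed model).
    A = rng.standard_normal((F, 2, 3))
    t = rng.standard_normal((F, 2, 1))
    x = np.stack([A[f] @ X + t[f] for f in range(F)])  # (F, 2, P)

    # Displacements of all P points relative to frame 0, stacked
    # into a single 2F x P matrix (u rows and v rows per frame).
    D = (x - x[0]).reshape(2 * F, P)

    s = np.linalg.svd(D, compute_uv=False)
    print(np.round(s[:8], 6))  # only ~4 singular values are non-zero

Since D factors as the frame-stacked (A_f - A_0) times X, plus a
rank-one translation term, its rank is bounded by a small constant
independent of F and P; constraints of this kind are what the
correspondence estimation process described above can exploit.
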
The spatio-temporal scene information contained in video data is
distributed across many video frames
and is highly redundant by nature. Accurate knowledge of both
the continuous and the extended spatio-temporal data redundancy
can be powerfully exploited to integrate scene information that
is distributed across many video frames into compact and coherent
scene-based representations. These representations can be very
efficiently used to view, browse, index into, annotate, edit,
and enhance the video data.
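
As a rough illustration of such scene-based integration, here is a
minimal sketch that composites aligned frames into a single mosaic
image. It assumes the per-frame homographies mapping each frame into
a reference coordinate system have already been estimated (by some
alignment method); the median over overlapping frames then exploits
the data redundancy to suppress noise and transient foreground
objects. The function and its parameters are hypothetical, and
median blending is only a simple stand-in for the integration
methods discussed in the talk.

    import numpy as np
    import cv2  # OpenCV, used here only for perspective warping

    def build_mosaic(frames, homographies, size):
        """frames: list of HxWx3 uint8 images; homographies: list of
        3x3 frame-to-mosaic warps (assumed given); size: (w, h)."""
        stack, masks = [], []
        for img, H in zip(frames, homographies):
            warped = cv2.warpPerspective(img, H, size)
            stack.append(warped.astype(np.float32))
            # Record which mosaic pixels this frame actually covers.
            cover = cv2.warpPerspective(
                np.ones(img.shape[:2], np.uint8), H, size,
                flags=cv2.INTER_NEAREST)
            masks.append(cover.astype(bool))
        stack = np.stack(stack)               # (F, Hm, Wm, 3)
        masks = np.stack(masks)[..., None]    # (F, Hm, Wm, 1)
        # Per-pixel median over the frames that cover that pixel;
        # pixels covered by no frame come out black.
        data = np.where(masks, stack, np.nan)
        mosaic = np.nanmedian(data, axis=0)
        return np.nan_to_num(mosaic).astype(np.uint8)
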
In the second part of my talk, I will show demonstrations of video
applications that exploit both the continuous acquisition and the
extended coverage of video data. In particular, I will show a live
*interactive* demonstration
of indexing, browsing, and manipulation of video data, as well
as video editing and video enhancement applications.