LAMP - Language Group

LAMP Seminar
Language and Media Processing Laboratory
Conference Room 4406
A.V. Williams Building
University of Maryland

A New Algorithm for the Classification of Multiple Simultaneous Topics

Richard Schwartz and John Makhoul
BBN Corporation
Cambridge, MA

ABSTRACT

In this presentation, we report on a new topic classification algorithm that takes either text or speech as input and produces an ordered list of topics for each input segment. The new algorithm represents a paradigm shift in the way the problem is approached. In contrast to previous approaches, which could choose only one topic from a closed set of tens of topics, the new approach can choose several topics for each story from a large, open set of topics. Furthermore, the new approach is automatically trainable from annotated data. We will report on our initial studies using transcriptions of broadcast news, with thousands of general and specific topics. The new algorithm has clear implications for the general problem of information retrieval.