In this presentation, we report on a new topic classification algorithm
that can take either text or speech as input and produces an ordered
list of topics for each input segment. The new algorithm represents
a paradigm shift in the way the problem is approached. Previous
approaches could choose only one topic from a closed set of tens of
topics. The new approach can instead assign several topics to each
story, drawn from a large, open set of topics. Furthermore, the new
approach can be trained automatically from annotated data.
We report on our initial studies using transcriptions of
broadcast news annotated with thousands of general and specific topics.
The new algorithm has clear implications for the general problem
of information retrieval.