UMIACS Computational Linguistics Colloquium Series, December 11, 1997

UMIACS Computational Linguistics Colloquium Series, December 18, 1997


Generating Natural Language Briefings from Multiple On-Line Sources


Dragomir R. Radev
Department of Computer Science
Columbia University
http://www.cs.columbia.edu/~radev

Users of Internet-based news sites face several major problems when trying to follow a political event over time. First, news sources produce far more articles than a human can read. Second, different news sources may provide conflicting or complementary accounts of a specific event. Third, more recent pieces of news typically include redundant information which has already been shown to the user.

While a mixture of traditional information retrieval and sentence extraction methods can somewhat alleviate the effect of the first problem, no techniques exist that can combine conceptual information from multiple articles and present commonalities and disagreements between sources in a concise, yet fluent form.

As part of my thesis work, I have developed a methodology for summarization of news about political events in the form of natural language briefings that include appropriate background (historical) information, extracted from multiple sources. The system that I have created, SUMMONS, uses the output of systems developed for the DARPA Message Understanding Conferences in the domain of terrorism to generate summaries of multiple documents on the same event over time. The proposed summarization methodology is used in presenting similarities and differences, contradictions and generalizations among sources of information.

In the talk, I will describe a set of linguistic techniques employed by SUMMONS and show how conceptual information from multiple articles is extracted, combined, organized into a paragraph by means of planning operators, and finally, realized using text generation in the form of a natural language summary. A feature of my work is the extraction and semantic categorization of descriptions of entities (people, places, and organizations) and their transformation-based reuse in order to provide historical context in the automatically generated briefings.

