UMIACS Computational Linguistics Colloquium Series, November 13, 1997


COMBINING AND STANDARDIZING LARGE-SCALE, PRACTICAL ONTOLOGIES FOR MACHINE TRANSLATION AND OTHER USES


Eduard Hovy
Information Sciences Institute of the University of Southern California

Over the past few years, the availability of several large-scale symbol taxonomies and axiomatizations (often called Ontologies) has generated considerable interest, mostly for how they can be used to speed up the construction and enhance the robustness of knowledge-based AI systems such as planners, qualitative reasoners, database access managers, natural language engines, etc.

Since no Ontology is ever complete, an increasingly important question is whether the various efforts can be combined to form one larger, more general, Ontology.

I will outline the plans and work to date of recent attempts to create a single large Ontology for general free use over the Web. This Ontology may eventually combine the existing large-scale ontologies WordNet, CYC, MIKROKOSMOS, EDR (from Japan), SENSUS/Pangloss, and others into a single framework. Under the aegis of the ANSI Ad Hoc Committee on Ontology Standards, and funded in part under DARPA's HPKB effort, various sites have been loosely collaborating toward this goal. Participants include Stanford University, USC/ISI, CYCorp, and IBM Santa Teresa, as well as individual researchers such as John Sowa and Nicola Guarino, and others.

I will describe the problems and joys of aligning various ontologies. Over the past two years, the experiences of several researchers (notably various people at ISI and at CYC) have begun to crystallize into a partially automated process involving three major steps: term alignment, structural checking, and internal consistency enforcement.
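To make the first two steps more concrete, here is a toy sketch. It is not the ISI or CYC procedure; it assumes each ontology is reduced to a hypothetical dictionary mapping a concept name to its single parent, treats term alignment as exact matching on normalized names, and treats structural checking as flagging aligned pairs whose parents were aligned to different concepts. The names `align_terms`, `structural_conflicts`, `tax_a`, and `tax_b` are invented for illustration, and the third step (internal consistency enforcement) is not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: concept name -> parent name (None marks the root).
Taxonomy = Dict[str, Optional[str]]

def normalize(term: str) -> str:
    """Lowercase and drop non-alphanumerics, so 'Living-Thing' matches 'living thing'."""
    return "".join(ch for ch in term.lower() if ch.isalnum())

def align_terms(a: Taxonomy, b: Taxonomy) -> Dict[str, str]:
    """Step 1 (assumed form): pair terms of A and B whose normalized names are equal."""
    index = {normalize(t): t for t in b}
    return {t: index[normalize(t)] for t in a if normalize(t) in index}

def structural_conflicts(a: Taxonomy, b: Taxonomy,
                         alignment: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 2 (assumed form): flag an aligned pair when the A-parent is aligned
    to some concept other than the B-parent."""
    conflicts = []
    for ta, tb in alignment.items():
        pa, pb = a[ta], b[tb]
        if pa is None or pb is None:
            continue  # roots have no parent to compare
        if pa in alignment and alignment[pa] != pb:
            conflicts.append((ta, tb))
    return conflicts

# Two tiny made-up taxonomies that disagree about where "animal" sits.
tax_a: Taxonomy = {"Entity": None, "Object": "Entity", "Animal": "Object", "Dog": "Animal"}
tax_b: Taxonomy = {"entity": None, "object": "entity", "Organism": "entity",
                   "animal": "Organism", "dog": "animal"}

alignment = align_terms(tax_a, tax_b)
print(structural_conflicts(tax_a, tax_b, alignment))  # [('Animal', 'animal')]
```

Name matching alone aligns all four shared terms. Only the structural check reveals that the two sources place "animal" under different parents, and that kind of flagged pair is what a human would then review.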

In addition, I will mention other aspects of large-scale ontologies: how one can characterize them in a systematic way (and hence compare and possibly even evaluate them one day); and how building a large ontology for the explicit purpose of Machine Translation and automated text summarization involves specific design decisions, including principles of inclusion and taxonomization.

