Overview
The purpose of a document is to facilitate the transfer of information
from its author to its readers. It is the author's job to design
the document so that the information it contains can be interpreted
accurately and efficiently. To do this, the author can make use
of a set of stylistic tools. We have introduced the concept of
document functionality, which attempts to describe the roles of
documents and their components in the process of transferring
information. A functional description of a document provides insight
into the type of the document, into its intended uses, and into
strategies for automatic document interpretation and retrieval.
We believe that there is a level of document organization,
intermediate between the geometric and semantic levels, that
relates to the efficiency with which the document transfers its
information to the reader. We call this the functional level.
To process a document effectively, most document image understanding
systems rely on relatively specific information about a restricted
domain to model the expected document class(es) accurately.
This allows the system to richly interpret the document, and extract
detailed information about its content. Unfortunately, this approach
cannot be applied effectively in less homogeneous environments.
As the set or stream of documents becomes more diverse (both intra-class
and inter-class), the formulation of models becomes more difficult.
Functional interpretation of documents can greatly facilitate
tasks associated with their classification and use. We have concentrated
on extracting functional features to provide a classification
based on how the document can be used.