Document structures can provide a great deal of information for
determining which documents are relevant to a given query.
Structure-based matching enhances existing content-based matching and
information retrieval capabilities, and provides an effective way to
quickly reduce the set of candidate documents for similarity matching
using knowledge of document structure.
I will present the underlying techniques for structure-based retrieval,
such as document structure analysis, representation, similarity
matching, and classification, as described in the following papers:
Xiaolong Hao, Jason T. L. Wang, Michael P. Bieber, Peter A. Ng,
"Heuristic Classification of Office Documents", International Journal
on Artificial Intelligence Tools, pages 233-265, 1995.
Andreas Dengel, Frank Dubiel, "Clustering and Classification of
Document Structure - a Machine Learning Approach", Third International
Conference on Document Analysis and Recognition, Montreal, Canada,
pages 587-591, Aug. 1995.
Other related papers:
Xiaolong Hao, Jason T. L. Wang, Peter A. Ng, "Nested Segmentation:
An Approach for Layout Analysis in Document Classification",
Second International Conference on Document Analysis and Recognition,
Tokyo, Japan, pages 319-322, Oct. 1993.
Andreas Dengel, "Initial Learning of Document Structure", Second
International Conference on Document Analysis and Recognition,
Tokyo, Japan, pages 86-90, Oct. 1993.