LAMP Seminar
Language and Media Processing Laboratory
Conference Room 4406
A.V. Williams Building
University of Maryland

Document Structure Representation, Matching, and Classification

Document structure can provide a great deal of information for determining which documents are relevant to a given query. Structure-based matching enhances existing content-based matching and information retrieval capabilities, and provides an effective way to quickly narrow the set of candidate documents for similarity matching using knowledge of document structure.
I will present the techniques underlying structure-based retrieval, including document structure analysis, representation, similarity matching, and classification, as described in the following papers.

Xiaolong Hao, Jason T. L. Wang, Michael P. Bieber, Peter A. Ng, "Heuristic Classification of Office Documents", International Journal on Artificial Intelligence Tools, pages 233-265, 1995.

Andreas Dengel, Frank Dubiel, "Clustering and Classification of Document Structure - a Machine Learning Approach", Third International Conference on Document Analysis and Recognition, Montreal, Canada, pages 587-591, Aug. 1995.

Other related papers:

Xiaolong Hao, Jason T. L. Wang, Peter A. Ng, "Nested Segmentation: An Approach for Layout Analysis in Document Classification", Second International Conference on Document Analysis and Recognition, Tokyo, Japan, pages 319-322, Oct. 1993.

Andreas Dengel, "Initial Learning of Document Structure", Second International Conference on Document Analysis and Recognition, Tokyo, Japan, pages 86-90, Oct. 1993.




© Copyright 2001, Language and Media Processing Laboratory, University of Maryland, All rights reserved.