LAMP - Language Group

LAMP Seminar
Language and Media Processing Laboratory
Conference Room 4406
A.V. Williams Building
University of Maryland

MARCH 9, 1999, 1:00
Huiping Li

LAMP, University of Maryland
Text Processing in Digital Document and Video Databases

ABSTRACT

Text is an important source of information in documents and digital video. In this talk we address problems of text processing in both domains. In the document domain, the topics include text extraction, shapecoding and its application to duplicate document detection, text quality evaluation, and text enhancement. In the video domain, they include text identification, tracking, enhancement, and recognition. The presentation covers both completed and proposed work.

Steady increases in computational power and affordable storage have allowed organizations to scan large numbers of documents into databases with no indexing information. We developed a shapecode-based technique to detect duplicate documents in large databases. Representative text lines are extracted from the binary document images. We dynamically propagate the x-line and baseline to shapecode the characters in degraded text lines. The extracted shapecode is converted to keys that index directly into the database.

Another topic is document quality evaluation and enhancement. Although Optical Character Recognition (OCR) has advanced to the point that most commercial OCR software can achieve recognition accuracy as high as 99.9% on clean documents, performance degrades rapidly when even a little noise is introduced. We will discuss techniques for enhancing the text in degraded document images. We believe text enhancement is essential for robust character recognition as well as for other applications. Our proposed method for text enhancement is more processing oriented than domain oriented: we use estimation techniques to evaluate document quality and then automatically apply the enhancement technique appropriate to each type of degradation.

Text in digital video can provide important supplemental information for retrieval and indexing.
Compared with document images, text processing in digital video poses new challenges. We will address these new problems and the techniques we developed to solve them. We developed a hybrid wavelet/neural network classifier to segment text from digital video. To find the temporal correspondence of text blocks in consecutive frames, we use multi-resolution image matching to track the detected text, and the text contour is used to refine the match. After registration, we use the text blocks from multiple frames to smooth the background. Image interpolation techniques are used to raise the resolution of the text image. Experimental results show that these techniques can improve OCR results significantly. Finally, we will discuss text-based indexing and retrieval in digital video.
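
As a rough illustration of the multi-frame enhancement step, the following Python sketch averages text blocks that are assumed to be already registered and then enlarges the result by bilinear interpolation. Plain pixel-wise averaging and bilinear interpolation are assumptions made for illustration only; the abstract does not say which smoothing or interpolation methods are used, and the names average_frames and upscale_bilinear are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities


def average_frames(blocks: List[Image]) -> Image:
    """Pixel-wise mean of registered, equally sized text blocks.

    Assumption: averaging stands in for the background smoothing
    mentioned in the abstract; the actual method is not specified.
    """
    if not blocks or not blocks[0] or not blocks[0][0]:
        raise ValueError("need at least one non-empty text block")
    height, width = len(blocks[0]), len(blocks[0][0])
    for block in blocks:
        if len(block) != height or any(len(row) != width for row in block):
            raise ValueError("all text blocks must have the same size")
    count = len(blocks)
    return [
        [sum(block[y][x] for block in blocks) / count for x in range(width)]
        for y in range(height)
    ]


def upscale_bilinear(image: Image, factor: int) -> Image:
    """Enlarge an image by an integer factor with bilinear interpolation.

    Assumption: bilinear interpolation is one possible choice of
    interpolation technique, picked here for simplicity.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    height, width = len(image), len(image[0])
    result: Image = []
    for out_y in range(height * factor):
        # Map the output row to a fractional source row, clamped at the edge.
        src_y = min(out_y / factor, height - 1)
        y0 = int(src_y)
        y1 = min(y0 + 1, height - 1)
        fy = src_y - y0
        row = []
        for out_x in range(width * factor):
            src_x = min(out_x / factor, width - 1)
            x0 = int(src_x)
            x1 = min(x0 + 1, width - 1)
            fx = src_x - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


if __name__ == "__main__":
    # Two tiny registered "text blocks" with different background noise.
    frames = [[[0, 10], [20, 30]], [[10, 10], [20, 50]]]
    smoothed = average_frames(frames)
    enlarged = upscale_bilinear(smoothed, 2)
    print(len(enlarged), len(enlarged[0]))
```

Averaging across frames suppresses background variation that changes from frame to frame while stationary text is preserved, which is why registration has to come first. Other combinations, such as a per-pixel minimum or median, would be equally plausible choices here.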