SDIUT'03

Agenda:

The conference will be spread over three days: April 9, 10, and 11, 2003. Each session will include a government representative discussing programs or needs related to the session's technology focus. Thursday afternoon will be devoted to abstracts and demonstrations. The Friday agenda will include additional talks and government discussions on potential collaboration and funding sources.

2003 Symposium on Document Image Understanding Technology
Greenbelt Marriott, Greenbelt Maryland
April 9-11th, 2003

Tentative Agenda

Wednesday, April 9th

9:00 Welcome

9:10 Page Structure 1

Assuring High-Accuracy Document Understanding: Retargeting, Scaling Up, and Adapting
Henry Baird, Kris Popat, Thomas Breuel, Prateek Sarkar, Daniel Lopresti (Palo Alto Research Center)

Automated Data Extraction from Structured Documents
Janusz Wnek (Science Applications International Corporation)

Automated Logo Detection and Recognition
Tom Drayer and Ken Cantwell (Department of Defense)

10:20 Break

10:40 Government Talk

Retrospective of Document Analysis since SDIUT 95
Steve Dennis (Department of Defense)

11:00 Multilingual Documents

Farsi Searching and Display Technologies
Kazem Taghva, Ron Young, Jeff Coombs, Ray Pereda,
Russell Beckley, and Mohammad Sadeh
(University of Nevada, Las Vegas)

Porting the BBN BYBLOS OCR System to New Languages
Prem Natarajan, Michael Decerbo, Tom Keller, Rich Schwartz, and John Makhoul (BBN Technologies)

Segmenting and Tagging Structured Content
Huanfeng Ma, Burcu Karagol-Ayan and David Doermann
(University of Maryland, College Park)

12:00 Lunch

1:00 Keynote Talk – Document Processing and Understanding: An Integrated Approach
Patrice Simard (Microsoft Research)

2:00 Handwriting

A System for Handwriting Matching and Recognition
Sargur Srihari, Bin Zhang, Catalin Tomai, Sangjik Lee, Zhixin Shi and Yong-Chul Shin (State University of New York/Cedar Buffalo)

Indexing of Handwritten Historical Documents – Recent Progress
R. Manmatha and T.M. Rath (University of Massachusetts, Amherst)

Document Categorization Using Latent Semantic Indexing
Anthony Zukas and Robert Price
(Science Applications International Corporation)

Parsing Freeform Handwritten Notes on the Tablet PC
Michael Shilman (Microsoft Research)

3:30 Break

3:40 Government Talk

OCR for Collection, Management, and Retrieval of Documents: Development and Trial of a Documentation Exploitation Suite
Luis Hernandez and Christian Schlesiger (Army Research Laboratory)

4:00 Degraded Documents

Processing Noisy Documents
David Doermann and Huiping Li (University of Maryland, College Park)

Summarizing Noisy Documents
Hongyan Jing, Daniel Lopresti and Chilin Shih (Bell Laboratories, Lucent Technologies Inc.)

Rough and Degraded Document Interpretation by Perceptual Organization
Eric Saund, David Fleet, James Mahoney, and Daniel Larner (Palo Alto Research Center)

OCR Accuracy Prediction as a Script Identification Problem
Vitaly Ablavsky, Joshua Pollak, Magnus Snorrason,
and M. Stevens (Charles River Analytics)

Thursday, April 10th

9:00 OCR and OCR Correction

A Survey of Retrieval Strategies for OCR Text Collections
Steven Beitzel, Eric Jensen and David Grossman
(Illinois Institute of Technology)

OCR Accuracy and Retrievability of Post-Processed Documents
T. Nartker, Kazem Taghva, and Julie Borsack (University of Nevada, Las Vegas)

Varying Effects of Image Improvement Methods on OCR Accuracy
Kristen Summers (Vredenburg)

OCR Correction Using Historical Relationships from Verified Text in Biomedical Citations
Susan Hauser, Tehseen Sabir and George Thoma
(National Library of Medicine)

10:20 Break

10:40 Document Analysis Resources

Balanced Query Methods for Improving OCR-Based Retrieval
Kareem Darwish and Douglas Oard
(University of Maryland, College Park)

Creation of Multi-Lingual Data Resources and Evaluation Tool for OCR
Srirangaraj Setlur, Suryaprakash Kompalli, Ramanaprasad Vemulapati and Venu Govindaraju
(State University of New York/Cedar Buffalo)

Multilingual OCR Ground Truth from Printed and Web Sources
Mark Turner, Yuliya Katsnelson, and Kristen Summers
(Vredenburg, Inc)

Ground Truth Data for Document Image Analysis
Glen Ford and George Thoma (National Library of Medicine)

12:00 Lunch

1:00 Keynote Talk – Overview of the Questioned Document Unit
Gabriel Watts (Federal Bureau of Investigation)

2:00 Government Talk

Transitioning Experimental HMM OCR: From Lab to Field
Christian Schlesiger, Luis Hernandez, and Michael Lee (Army Research Laboratory)

The Effects of Document Analysis on Automatic Content Extraction
Jonathan K. Davis (Department of Defense)

An Automation Tool For the Detection of Sensitive Information
Gary DeWitt (Department of Energy)

3:30 Exhibits/Demonstrations and Poster/Abstracts

The Gamera Framework for Building Custom Recognition Systems
Mike Droettboom, Karl MacMillan and
Ichiro Fujinaga (Johns Hopkins University)

3D Methods to Aid Handwriting Analysis and OCR
Anshuman Razdan, John Femiani, Jeremy Rowe (Arizona State University)

Automated Reading of Free-Form Handwriting in Images, The Past and One Proposed Future
Joanna Fancy (Higherglyphics)

The Video Spectral Comparator 2000HR
Greggory Mokrzycki (Federal Bureau of Investigation)

Demonstration for Parsing Freeform Handwritten Notes on the Tablet PC
Michael Shilman (Microsoft, Incorporated)

ABBYY OCR Software
Artur Vassylyev and Ding-Yuan Tang (ABBYY Software House)

Document Layout Analysis
Thomas Breuel (Palo Alto Research Center)

High View Document Image Management Tool
Mark Turner (Vredenburg)

ScanSoft Asian Language OCR Capability
Tom D'Errico (ScanSoft)

Groundtruth Image Generation from Electronic Text (Demonstration)
David Doermann and Gang Zi (University of Maryland)

4:30-7:00 Demos, Posters and Exhibits

An informal reception with food and drink will be held during the exhibit session.

Friday, April 11th

9:00 Page Structure 2

High Performance Document Layout Analysis
Thomas Breuel (Palo Alto Research Center)

Automated Layout Recognition
Lynn Golebiowski and Alan Sakakihara (Booz Allen Hamilton)

Automatic Forms Processing in the NIST Forms Database
Carson Cumbee (Department of Defense)

Amplifying Accuracy through Style-Consistency
Prateek Sarkar and Thomas Breuel (Palo Alto Research Center)

A Generative Probabilistic OCR Model
Okan Kolak, Philip Resnik and William Byrne (University of Maryland, College Park and The Johns Hopkins University)

10:40 Break

11:00 Multimedia

Form Analysis with the Nondeterministic Agent
Tom Henderson and Lavanya Swaminathan (University of Utah)

Metrics for Evaluating the Performance of Video Text Recognition Systems
Greg Myers (Stanford Research Institute)

Universal Document Management System for the Mobile Warrior
H. Alam, R. Hartono, Fuad Rahman, Y. Tarnikova, T. Tjahjadi and C. Wilcox (BCL Technologies)

12:00 Lunch

1:00 Government

The Declassification Challenge: Can Technology Make a Difference?
Richard Warshaw (Central Intelligence Agency)

VACE Advanced R&D Program
John Prange (Advanced Research and Development Activity)

2:00 Panel

Government Grand Challenges: What we need and how we get it?
