Selected Publications
- Christopher Dyer, Smaranda Muresan and Philip Resnik (2008). Generalizing Word Lattice Translation. In Proceedings of the Association for Computational Linguistics, ACL 2008.
Abstract: Word lattice decoding has proven useful in spoken language
translation; we argue that it provides a compelling model for
translation of text genres as well. Additionally, we show that prior
work in translating lattices using finite state techniques can be
naturally extended to more expressive synchronous context-free
grammar-based models. In the process, we resolve a significant
complication that lattice representations introduce in reordering
models. Our experiments evaluating the approach demonstrate
substantial gains for Chinese-English and Arabic-English translation.
- Smaranda Muresan and Owen Rambow (2007). Grammar Approximation by Representative Sublanguage: A New Model for Language Learning. In Proceedings of the Association for Computational Linguistics, ACL 2007, Prague, Czech Republic.
Abstract: We propose a new language learning model that learns
a syntactic-semantic grammar from a small number of natural language
strings annotated with their semantics, along with basic assumptions
about natural language syntax. We show that the search space for
grammar induction is a complete grammar lattice, which guarantees
the uniqueness of the learned grammar.
(See also: Smaranda Muresan (2005). Parsing Preserving Techniques in Grammar Induction. Technical Report CUCS-032-05, Columbia University, New York, NY.)
- Smaranda Muresan (2006). Learning Constraint-based Grammars from Representative Examples: Theory and Applications. PhD Dissertation, Columbia University, New York, 2006.
Short Abstract. Theoretical Part: This dissertation defines a new type of constraint-based grammar, Lexicalized Well-Founded Grammars (LWFGs), which establish a direct link between natural language expressions and ontologies, facilitating a direct mapping between text and knowledge (Chapter 3). Semantic composition and semantic interpretation are defined as constraints at the grammar rule level (Chapter 4). I introduce a new model for learning LWFGs from representative data, Grammar Approximation by Representative Sublanguage (Chapter 5). I prove that the search space for grammar induction is a complete grammar lattice, and I provide polynomial algorithms for grammar induction together with proofs of their correctness (Chapter 5).
Applicative Part: I have embedded all these theoretical concepts in a practical, implemented system for grammar learning (Chapter 7). I provide qualitative evaluations covering two issues: coverage of diverse and complex linguistic phenomena (Chapter 6), and terminological knowledge acquisition from natural language definitions in the medical domain, together with natural-language querying of the resulting knowledge base (Chapter 8).