Yuval Marton

Top | Recent Publications | Teaching | Academic Activities | Other activities | Bottom


Department of Linguistics

1401 Marie Mount Hall

University of Maryland

College Park, MD 20742-7505

Tel: (301) 405-7002

Fax: (301) 405-7104

Email: ymarton @t mail.umd DOT edu

 

 

I am a computational linguistics Ph.D. student at UMD. I advanced to candidacy in May 2007.

My current research interests include statistical machine translation (SMT) and paraphrase generation. More specifically, I am interested in infusing SMT with linguistic knowledge by incorporating soft syntactic and/or soft semantic constraints into various corpus-based models. I was also involved in text classification research (authorship attribution and topic/genre classification).

 

Following my interest in neurobiologically plausible cognitive and linguistic models, I took several fascinating neuroscience courses in the Neuroscience and Cognitive Science (NACS) Program. My qualifying paper was about visual word recognition; in it, I argued for a lexical representation that consists of both lower-level visual features and higher-level abstract letter objects, interacting with statistical factors (word frequency) and partly innate factors (left or right visual field perception).

 

I am a member of the CLIP Lab at UMIACS, and I also frequent the CNL Lab.

My current advisors are Philip Resnik and Amy Weinberg.

My previous advisor was Lisa Hellerstein, when I was a computer science graduate student at Polytechnic University, Brooklyn, NY (now part of NYU).


Recent Papers and Publications

 

David Chiang, Yuval Marton and Philip Resnik. “Online Large-Margin Training of Syntactic and Structural Translation Features”. Conference on Empirical Methods in Natural Language Processing (EMNLP 2008). October 25-27, 2008. Waikiki, Honolulu, Hawaii. Accepted. Full paper.

 

Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT.

We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase-based model: first, we simultaneously train a large number of Marton and Resnik’s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 BLEU on a subset of the NIST 2006 Arabic-English evaluation data.
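To make the MIRA idea concrete, here is a minimal sketch of a single-constraint, MIRA-style (passive-aggressive) weight update. It assumes sparse feature vectors stored as dictionaries, one oracle ("good") translation and one model hypothesis per sentence, and a per-sentence `loss` such as 1 minus a sentence-level BLEU score; the names `mira_update`, `dot` and the step cap `C` are hypothetical. The paper's setup is richer (for example, it exploits more of the parse forest and trains in parallel), so this shows only the core update rule, not that system.

```python
from typing import Dict

Vector = Dict[str, float]  # sparse feature vector: feature name -> value


def dot(w: Vector, f: Vector) -> float:
    """Sparse dot product; missing features count as 0."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def mira_update(w: Vector, f_oracle: Vector, f_hyp: Vector,
                loss: float, C: float = 0.01) -> Vector:
    """One single-constraint MIRA step (hypothetical helper).

    Moves w just far enough that the oracle outscores the hypothesis
    by a margin of at least `loss`, with the step size capped at C.
    """
    keys = set(f_oracle) | set(f_hyp)
    # Update direction: oracle features minus hypothesis features.
    delta = {k: f_oracle.get(k, 0.0) - f_hyp.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in delta.values())
    # How far the current margin falls short of the required one.
    violation = loss - dot(w, delta)
    if norm_sq == 0.0 or violation <= 0.0:
        return dict(w)  # constraint already satisfied: no change
    tau = min(C, violation / norm_sq)  # clipped step size
    new_w = dict(w)
    for k, v in delta.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w
```

For example, with `w = {"lm": 1.0, "tm": 0.0}`, oracle features `{"lm": -2.0, "tm": -1.0}`, hypothesis features `{"lm": -1.0, "tm": -3.0}`, `loss=0.3` and `C=1.0`, the step size is 1.3 / 5 = 0.26 and the new weights are approximately `{"lm": 0.74, "tm": 0.52}`, which make the oracle's margin exactly 0.3.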

 

Yuval Marton and Philip Resnik. “Soft Syntactic Constraints for Hierarchical Phrased-Based Translation”. The 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008).

In adding syntax to statistical MT, there is a tradeoff between taking advantage of linguistic analysis, versus allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a context-free translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We obtain substantial improvements in performance for translation from Chinese and Arabic to English.
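A rough sketch of how constituent-level soft constraints can be turned into decoder features: given the source-side parse as labeled spans, a rule applied over a source span fires a "match" feature when the span coincides with a constituent of label X, and a "cross" feature when it straddles an X boundary. The `X=` / `X+` naming and the function names below are illustrative assumptions, not necessarily the paper's exact feature set. In a log-linear decoder each such feature would get a tuned weight, so a crossing is penalized (or a match rewarded) rather than forbidden, which is what keeps the constraint soft.

```python
from collections import Counter
from typing import Iterable, Tuple

Span = Tuple[int, int]              # (start, end), end exclusive
Constituent = Tuple[str, int, int]  # (label, start, end) from the source parse


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    (i, j), (s, e) = a, b
    return (i < s < j < e) or (s < i < e < j)


def soft_syntax_features(span: Span,
                         constituents: Iterable[Constituent]) -> Counter:
    """Count hypothetical match (X=) and cross (X+) features for one rule span."""
    feats: Counter = Counter()
    for label, s, e in constituents:
        if span == (s, e):
            feats[f"{label}="] += 1   # span exactly matches an X constituent
        elif crosses(span, (s, e)):
            feats[f"{label}+"] += 1   # span cuts across an X boundary
    return feats
```

With `cons = [("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)]`, the span (0, 2) yields one `NP=` feature, while the span (1, 3) crosses both the first NP and the VP and yields one `NP+` and one `VP+`.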

 

Yuval Marton. “What Can we Learn about Language Processing and Representation from Word Contour Effects on Letter Order Perception and Word Recognition in Right and Left Visual Fields?” Qualifying paper (Ling895), Department of Linguistics, University of Maryland, May 2007. Manuscript.

When we read a word, we typically read it all at once, not letter by letter. This assumption can be verified under laboratory conditions in which each word is displayed for less than 150 ms, ensuring that subjects have no time to saccade or otherwise move their eyes (following Rayner’s findings [Rynr86]). Given that all letters in a word are perceived at the same time, readers must determine and encode letter order to identify the word correctly. Whitney [Wtny04b] has argued that letter encoding, and in particular letter order encoding, is done using abstract representations. We will show that word recognition is not done solely with abstract letter symbols: low-level visual properties of the written word, specifically its contour (operationalized here as the existence and location of ascending and descending letters), are also used for this task. We argue that if both contour information and abstract letter symbols are used for word recognition, they do not combine in a simple additive manner. We also argue that vision makes a unique contribution to language, namely parallel input processing, beyond mere equivalence to (serial) sound sequences. We will show differences and similarities of performance in the left and right visual fields (LVF and RVF) in Hebrew, under different word contour and word frequency conditions, and contrast the predictions of three theories of the well-known RVF advantage: an innate left-hemispheric advantage for language processing (e.g., [YE85]), acquired retinal/cortical RVF expertise (e.g., [Nzr00]), and computational neural network assumptions ([LW05]).

 

Yuval Marton, Ning Wu, and Lisa Hellerstein. “On Compression-Based Text Classification”. Proceedings of the 27th European Conference on Information Retrieval (ECIR), Spain, March 2005. Abstract. Full paper. An errata note is available.

Compression-based text classification methods are easy to apply, requiring virtually no preprocessing of the data. Most such methods are character-based, and thus have the potential to automatically capture non-word features of a document, such as punctuation, word-stems, and features spanning more than one word.  However, compression-based classification methods have drawbacks (such as slow running time), and not all such methods are equally effective. We present the results of a number of experiments designed to evaluate the effectiveness and behavior of different compression-based text classification methods on English text. Among our experiments are some specifically designed to test whether the ability to capture non-word (including super-word) features causes character-based text compression methods to achieve more accurate classification.
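One simple member of this family of methods can be sketched in a few lines: concatenate each class's training text, append the unlabeled document, and assign the class whose text lets the document compress into the fewest extra bytes. The sketch below uses Python's standard `zlib` (DEFLATE) module as a stand-in compressor, and its function names are hypothetical; it is not necessarily one of the specific methods or compressors evaluated in the paper. Because it works on raw characters, repeated punctuation, word stems and multi-word phrases can all lower the cost, which is the non-word-feature effect the abstract describes.

```python
import zlib
from typing import Dict


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, doc: str) -> int:
    """Extra compressed bytes needed for doc once corpus has been seen."""
    return compressed_size(corpus + " " + doc) - compressed_size(corpus)


def classify(doc: str, training: Dict[str, str]) -> str:
    """Return the label whose training text 'explains' doc most cheaply."""
    if not training:
        raise ValueError("need at least one class")
    # Each candidate class costs two compressions of its whole training text,
    # one reason compression-based classifiers tend to run slowly.
    return min(training, key=lambda label: extra_bytes(training[label], doc))
```

One design caveat: DEFLATE only looks back 32 KB, so with large training corpora this particular variant effectively sees only the tail of each class's text; compressors with larger windows or adaptive models avoid that limit.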


Teaching

 

TA for Computational Linguistics II (Ling647 / CMSC828R), taught by Philip Resnik, Spring 2006.

TA for Introductory Linguistics (Ling200), taught by Tonia Bleam, Spring 2008.


Academic Activities

 

I have taken part (or am still taking part) in the following:

 

NACS Program (the Program in Neuroscience and Cognitive Science at the University of Maryland): I received the NACS Program Certificate in August 2008.

Reviewer, ACM TALIP (ACM Transactions on Asian Language Information Processing), 2006.

Psycholinguistic experiments.

The Machine Translation MURI project, Spring 2006 – Present.

Colloquium Committee, member, Fall 2005 – Spring 2006.

Semantics Search Committee, member, Fall 2005 – Spring 2006.

LSA Institute, MIT, Summer 2005.


Other Activities

 

GSG (Graduate Student Government) Rep, Linguistics, Fall 2005 – Spring 2006.

GSG Student Affairs Committee, member, Fall 2005 – Spring 2006.

Grammar Society group, President, Fall 2005 – Spring 2006.

LGSA (the Linguistics Graduate Students Association) Rep, Spring 2005 – Fall 2005.


The Future

 

Under construction! (Permanently)
