Philip Resnik
News flash (December 2009): My comments on IBM's n.Fluent project appeared in an article in IEEE Computing Now.
News flash (December 2009): The University of Maryland Death Penalty Corpus is now available.
News flash (October 2009): Gibbs Sampling for the Uninitiated
News flash (October 2009): The project with Ben Bederson on translation as a collaborative process has now also received NSF sponsorship. Ben gave a Google tech talk about the project, which is available on YouTube.
News flash (August 2009): Much to my surprise, I was recently listed at #82 on the Future Health 100, a list of "the most creative and influential innovators working in healthcare today" at healthspottr.com. This is in connection with my work with CodeRyte Inc. on using natural language processing to improve medical coding, an expensive and labor-intensive bottleneck in the U.S. healthcare system for which there is a severe shortage of human coders.
News flash (July 2009): Ben Bederson and I have received a Google Research Award sponsoring our research on translation as a collaborative process. The project is blending ideas from machine translation, human-computer interfaces, and distributed human computation ("crowdsourcing") in order to find ways to achieve low-cost, high-quality translation by taking advantage of monolingual humans in a computer-assisted translation protocol.
I do research in computational linguistics, with interests both in the modeling of human linguistic processes (especially lexical semantics, lexical acquisition, and on-line sentence processing) and in the application of natural language processing techniques to practical problems such as machine translation and sentiment analysis. My general research agenda for language technology is to improve the state of the art by finding the right balance between knowledge-free statistical modeling and linguistically informed techniques -- and in so doing, to obtain a better scientific understanding of human language itself.
My recent work has largely been focused on machine translation and multilingual natural language processing, exploiting parallel corpora and linguistically informed modeling in statistical machine translation and in multilingual natural language processing more generally (with a focus on Chinese and Arabic, as well as other less-studied languages). As part of this effort, my postdoc David Chiang (now at USC/ISI) developed Hiero, the first syntax-based system to demonstrate performance comparable to then state-of-the-art statistical phrase-based MT systems (see 2005 NIST MT Evaluation results). I have worked with a number of students to further improve hierarchical phrase-based translation. Some of the innovations include the introduction of lattice decoding (useful in translation of speech recognition output and also for text translation of morphologically complex languages), development of efficient algorithms for using suffix array representations in hierarchical decoding, use of English-to-English translation to create artificial reference translations for use in parameter tuning, the introduction of soft syntactic constraints based on source language structure, and exploitation of lattices (and soon, forests) to represent source language paraphrase and syntactically driven reordering alternatives.
Connected with my machine translation research, Ben Bederson and I are working on an ambitious attempt to achieve low-cost, high-quality translation by taking advantage of monolingual human participants in a computer-assisted translation protocol, in a project we call "Translation as a Collaborative Process". We're blending ideas from machine translation, human-computer interfaces, and distributed human computation ("crowdsourcing"), and tackling the real-world problem of translating books in the International Children's Digital Library. We received a 2009 Google Research Award sponsoring this work, as well as funding from NSF. In September 2009, Ben gave a Google tech talk about the project, which is available on YouTube.
I have also been ramping up on work in sentiment analysis, with a particular interest in the connections among lexical semantics, surface linguistic expression, and underlying internal state. For example, why does my son say "My toy broke" instead of "I broke my toy"? He's using syntax to package up the statement about what happened in a way that de-emphasizes semantic properties such as causation, volition, and change-of-state. (This is an example of using syntax for "spin", just the same way that Ronald Reagan did in 1987 when he sidestepped attributing responsibility for the Iran-contra scandal; remember "Mistakes were made"? Precocious child.) My student Stephan Greene did a fascinating dissertation on this topic, and for a conference-paper-length description see our 2009 NAACL paper. I'm continuing to pursue this research, and current topics of investigation include modeling syntax/semantics/sentiment connections in a Bayesian framework, bootstrapping multilingual sentiment analysis capabilities, and broadening the approach beyond sentiment to other psychological and socio-cultural variables.
During the next several years, I hope to re-engage more fully with my interests in computational psycholinguistics. I'm particularly interested in the possibility that ideas from (statistical) information theory may have a useful role to play in explaining why language works the way it does. (This is an idea I first began exploring in my dissertation [ps, pdf].) I'm also trying to use Bayesian modeling as a way to bring linguists here with cognitive modeling interests together with computational linguists focusing on applications.
See my on-line list of publications for links to papers on the above research topics and more.
Philip Resnik, Associate Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall
University of Maryland
College Park, MD 20742 USA
UMIACS phone: (301) 405-6760
Linguistics phone: (301) 405-8903
Fax: (301) 405-7104
http://umiacs.umd.edu/~resnik
E-mail: resnik [AT] umd _DOT_ edu
UMIACS office: AV Williams 3143
By far the best way to reach me is by e-mail to resnik [AT] umd _DOT_ edu.