This is the schedule of topics for Computational Linguistics II, Spring 2009.
Readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing, unless otherwise specified. The "other" column has optional links pointing either to material you should already know (but might want to review) or to related material you might be interested in.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.
Class | Topic | Readings* | Assignments | Other
---|---|---|---|---
Jan 28 | Course administrivia, semester plan; some statistical NLP fundamentals | Ch 1, 2.1.[1-9] (for review): probability spaces; finite-state and Markov models; expected values; Bayes' Rule | Assignment 1 | Corpus Colossal (The Economist, 20 Jan 2005); Language Log; Resnik and Elkiss (DRAFT); Linguist's Search Engine
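As a quick illustration of the Bayes' Rule review material, here is a minimal sketch; the function name and the numbers are made up for the example.

```python
def posterior(prior, likelihood, marginal):
    """Bayes' Rule: P(h | e) = P(e | h) * P(h) / P(e)."""
    return likelihood * prior / marginal

# Made-up numbers: P(tag) = 0.01, P(word | tag) = 0.9, P(word) = 0.2,
# giving a posterior P(tag | word) of about 0.045.
p = posterior(0.01, 0.9, 0.2)
```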
Feb 4 | Words and lexical association | Ch 5: Zipf's law; collocations; mutual information; hypothesis testing | Assignment 2 | Dunning (1993); Kilgarriff (2005); Gries (2005); Bland and Altman (1995)
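The mutual information statistic for collocations covered this week can be sketched as follows; this is a minimal illustration with made-up counts, not course code.

```python
import math

def pmi(bigram_count, w1_count, w2_count, n):
    """Pointwise mutual information of a word pair:
    log2( P(w1, w2) / (P(w1) * P(w2)) ), with all probabilities
    estimated by maximum likelihood from counts over n tokens."""
    p_xy = bigram_count / n
    p_x = w1_count / n
    p_y = w2_count / n
    return math.log2(p_xy / (p_x * p_y))

# If the pair co-occurs exactly as often as chance predicts, PMI is 0.
independent = pmi(1, 100, 100, 10000)
# A pair that co-occurs 100x more than chance has PMI log2(100) ~ 6.64.
associated = pmi(8, 40, 20, 10000)
```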
Feb 11 | Information theory | Ch 2.2, Ch 6: information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity | Assignment 3, due Wednesday Feb 25 at 1:30pm |
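The core information-theoretic quantities for this week can be written down in a few lines; this is an illustrative sketch over explicit distributions, whereas in language modeling these are estimated from corpus samples.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p log2 p, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p log2 q; always >= H(p), with equality iff p == q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(p, q):
    """Perplexity = 2 ** cross entropy: the effective branching factor."""
    return 2 ** cross_entropy(p, q)

# A fair coin has 1 bit of entropy; a uniform 4-way choice has perplexity 4.
coin = entropy([0.5, 0.5])
four_way = perplexity([0.25] * 4, [0.25] * 4)
```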
Feb 18 | Maximum likelihood estimation and Expectation Maximization | Skim Ch 9-10: maximum likelihood estimation overview; quick review of smoothing; HMM review; deriving the forward-backward algorithm as an instance of EM; Viterbi algorithm | | An Empirical Study of Smoothing Techniques for Language Modeling (Stanley Chen and Joshua Goodman, Technical report TR-10-98, Harvard University, August 1998); revised Chapter 4 from the updated Jurafsky and Martin textbook
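The Viterbi algorithm covered this week can be sketched as below; a minimal dictionary-based version in probability space (a real implementation would work in log space to avoid underflow), with a made-up two-state HMM as the usage example.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state sequence for an HMM via dynamic programming.
    V[t][s] holds (best probability of any path ending in s at time t,
    predecessor state on that path)."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
            V[t][s] = (prob, prev)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy two-state HMM: state H mostly emits 'a', state C mostly emits 'b'.
states = ('H', 'C')
start = {'H': 0.5, 'C': 0.5}
trans = {'H': {'H': 0.8, 'C': 0.2}, 'C': {'H': 0.2, 'C': 0.8}}
emit = {'H': {'a': 0.9, 'b': 0.1}, 'C': {'a': 0.1, 'b': 0.9}}
best_path = viterbi(['a', 'b', 'b'], states, start, trans, emit)
```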
Feb 25 | Probabilistic grammar | Ch 11-12, Abney (1996): memoization and dynamic programming; review of CKY; defining PCFGs; probabilistic CKY (inside probabilities); Viterbi CKY; revisiting EM: the inside-outside algorithm | Assignment 4, due at start of class Wednesday March 11 (please send initial time estimates now) | Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing
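The Viterbi CKY idea for PCFGs can be sketched as follows; a minimal chart parser for a grammar in Chomsky normal form, with a made-up two-rule toy grammar as the example. It computes best derivation probabilities only, without backpointers.

```python
from collections import defaultdict

def pcky(words, lexical, binary):
    """Viterbi CKY for a PCFG in Chomsky normal form.
    lexical: {(A, word): prob} for rules A -> word;
    binary:  {(A, B, C): prob} for rules A -> B C.
    Returns a chart mapping (i, j, A) to the best probability of
    deriving words[i:j] from nonterminal A."""
    n = len(words)
    best = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[i, i + 1, A] = max(best[i, i + 1, A], p)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (A, B, C), p in binary.items():
                    cand = p * best[i, k, B] * best[k, j, C]
                    if cand > best[i, j, A]:
                        best[i, j, A] = cand
    return best

# Toy grammar: S -> NP VP (1.0); NP -> 'John' (1.0); VP -> 'sleeps' (0.5).
lex = {('NP', 'John'): 1.0, ('VP', 'sleeps'): 0.5}
rules = {('S', 'NP', 'VP'): 1.0}
chart = pcky(['John', 'sleeps'], lex, rules)
```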
Mar 4 | Probabilistic parsing | Parsing as inference; the distinction between logic and control; Viterbi CKY; CFG extensions (grandparent/parent node annotation, lexicalization) | |
Mar 11 | Watch and discuss Dan Klein talk on grammar induction (first 30 minutes of this online talk) | | Take-home midterm handed out | Petrov and Klein, Learning and Inference for Hierarchically Split PCFGs
Mar 18 | Spring Break | Have fun! | |
Mar 25 | Supervised classification | Ch 16: supervised learning; k-nearest neighbor classification; naive Bayes; decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics | |
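The naive Bayes classifier from this week's material can be sketched in a few lines; a multinomial model with add-one smoothing, trained and applied on a made-up two-document example.

```python
import math
from collections import Counter

def train_nb(docs):
    """Multinomial naive Bayes with add-one (Laplace) smoothing.
    docs: list of (token_list, label) pairs. Returns a classifier."""
    labels = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in labels}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def classify(tokens):
        def score(c):
            # log P(c) + sum over tokens of log P(w | c), smoothed.
            total = sum(word_counts[c].values()) + len(vocab)
            s = math.log(labels[c] / sum(labels.values()))
            for w in tokens:
                if w in vocab:
                    s += math.log((word_counts[c][w] + 1) / total)
            return s
        return max(labels, key=score)

    return classify

# Made-up two-class training set.
clf = train_nb([(['good', 'great'], 'pos'), (['bad', 'awful'], 'neg')])
```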
Apr 1 | Evaluation in NLP | Evaluation paradigms for NLP; parser evaluation in particular | |
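Parser evaluation is commonly done with PARSEVAL-style labeled bracket scores; the sketch below, with made-up bracket sets as the example, shows the basic precision/recall/F1 computation over (label, start, end) spans.

```python
def parseval(gold_brackets, test_brackets):
    """Labeled bracket precision, recall, and F1.
    Each bracket is a (label, start, end) constituent span."""
    gold, test = set(gold_brackets), set(test_brackets)
    correct = len(gold & test)
    p = correct / len(test)
    r = correct / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Made-up example: the parser gets 2 of 3 gold constituents right
# and proposes one spurious NP.
gold = {('NP', 0, 2), ('VP', 2, 4), ('S', 0, 4)}
test = {('NP', 0, 2), ('S', 0, 4), ('NP', 2, 4)}
precision, recall, f1 = parseval(gold, test)
```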
Apr 8 | Maxent; supervised approaches to word sense disambiguation | Ch 7; Adwait Ratnaparkhi, A Maximum Entropy Model for Part-Of-Speech Tagging (EMNLP 1996); Resnik, "WSD in NLP Applications" (Ch 11 in Edmonds and Agirre (2006)): the maximum entropy principle and maxent models; feature selection | Team project handed out | Other useful readings include Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997), Adam Berger's maxent tutorial, and Noah Smith's notes on loglinear models
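The conditional log-linear (maxent) model form can be sketched as below; this shows only how a trained weight vector turns indicator features into a normalized distribution p(y|x) = exp(w · f(x, y)) / Z(x). The tagging feature and weights in the example are made up, and training (e.g. by iterative scaling or gradient methods) is not shown.

```python
import math

def maxent_distribution(x, classes, weights, feature_fn):
    """Conditional log-linear model: score each class by the summed
    weights of its active features, then normalize with softmax."""
    scores = {y: sum(weights.get(f, 0.0) for f in feature_fn(x, y))
              for y in classes}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# Made-up POS-tagging-style feature: words ending in -ing favor VBG.
weights = {('suffix=ing', 'VBG'): 1.0}
feats = lambda x, y: [('suffix=ing', y)] if x.endswith('ing') else []
p = maxent_distribution('running', ['VBG', 'NN'], weights, feats)
```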
Apr 15 | Unsupervised and semi-supervised WSD | Ch 8.5, 15.{1,2,4}: characterizing the WSD problem; WSD as a supervised classification problem; unsupervised methods and Lesk's algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation; IR basics | |
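The simplified form of Lesk's algorithm can be sketched in one function; pick the sense whose dictionary gloss overlaps most with the target word's context. The glosses and context in the example are made up.

```python
def simplified_lesk(context, sense_glosses):
    """Simplified Lesk: choose the sense whose gloss shares the most
    word types with the context.
    sense_glosses: {sense_id: list of gloss tokens}."""
    context_words = set(context)
    return max(sense_glosses,
               key=lambda s: len(set(sense_glosses[s]) & context_words))

# Made-up glosses for two senses of "bank".
glosses = {
    'bank/finance': ['money', 'deposit', 'institution'],
    'bank/river': ['river', 'slope', 'land'],
}
sense = simplified_lesk(['deposit', 'money', 'check'], glosses)
```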
Apr 22 | Machine translation | Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, August 2008: historical view of MT approaches; noisy channel for SMT; IBM models 1 and 4; HMM distortion model; going beyond word-level models | | Also potentially useful or of interest: Kevin Knight, A Statistical MT Tutorial Workbook; Mihalcea and Pedersen (2003); Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation, in Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299
Apr 29 | Phrase-based statistical MT | Papineni, Roukos, Ward and Zhu (2001), BLEU: A Method for Automatic Evaluation of Machine Translation. Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation | | Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on the PHARAOH decoder
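The BLEU metric from the Papineni et al. reading can be sketched as below; a single-reference, sentence-level illustration (real BLEU aggregates n-gram counts over a whole corpus and supports multiple references) combining clipped n-gram precisions with the brevity penalty.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Illustrative sentence-level BLEU with one reference: geometric
    mean of clipped n-gram precisions for n = 1..max_n, times a
    brevity penalty for candidates shorter than the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precisions.append(math.log(clipped / total))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)

# A candidate identical to its reference scores 1.0.
sent = ['the', 'cat', 'sat', 'on', 'the', 'mat']
perfect = bleu(sent, sent)
```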
May 6 | TBD | | Take-home final handed out |