Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2011.

Unless otherwise specified, readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in.

THIS SCHEDULE IS A WORK IN PROGRESS!
Some topic areas may also take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.

Each entry below gives the class date and topic, followed by the readings*, the assignment (if any), and other optional links.
Jan 26 Course administrivia, semester plan; some statistical NLP fundamentals
Ch 1 and Ch 2.1.1-2.1.9 (for review)
Historical overview; Zipf's law; probability spaces; finite-state and Markov models; Bayes' Rule; Bayesian updating; conjugate priors (a short Bayesian-updating sketch follows this entry)
Assignment 1. Other: Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often technical machine learning material, but just as often of more general interest)
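
As a concrete companion to the Bayesian updating and conjugate prior topics above, here is a minimal illustrative sketch (not from the course materials) of Beta-Bernoulli conjugate updating: a Beta(alpha, beta) prior over a Bernoulli parameter becomes Beta(alpha + h, beta + t) after observing h successes and t failures. The function names, the coin-flip framing, and the prior values are assumptions made just for the example.

    def beta_posterior(alpha, beta, heads, tails):
        """Conjugate update: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior."""
        return alpha + heads, beta + tails

    def beta_mean(alpha, beta):
        """Posterior mean estimate of the Bernoulli parameter."""
        return alpha / (alpha + beta)

    # Start with a weak symmetric prior (an illustrative choice), then update
    # on a small sequence of coin flips (1 = heads, 0 = tails).
    alpha, beta = 2.0, 2.0
    flips = [1, 1, 0, 1, 0, 1, 1, 1]
    alpha, beta = beta_posterior(alpha, beta, sum(flips), len(flips) - sum(flips))
    print("posterior:", (alpha, beta), "mean estimate:", beta_mean(alpha, beta))

Because the posterior stays in the Beta family, updating reduces to adding counts to the prior's pseudo-counts, which is the whole point of conjugacy.
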
Feb 2 Words and lexical association
Ch 5
Collocations; hypothesis testing; mutual information (see the PMI sketch after this entry)
Assignment 2. Other: Dunning (1993);
Goodman, S. (1999), "Toward evidence-based medical statistics. 1: The P value fallacy," Ann Intern Med 130(12): 995–1004, PMID 10383371;
Goodman, S. (1999), "Toward evidence-based medical statistics. 2: The Bayes factor," Ann Intern Med 130(12): 1005–1013, PMID 10383350;
Kilgarriff (2005);
Gries (2005);
Bland and Altman (1995)
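
To make the mutual information idea concrete for collocations, here is a small illustrative sketch of pointwise mutual information over adjacent word pairs. The toy "corpus" is made up, which is exactly the regime where the hypothesis-testing readings above matter: PMI estimated from sparse counts is unreliable.

    import math
    from collections import Counter

    def pmi_table(tokens):
        """Pointwise mutual information for adjacent word pairs:
        PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities
        estimated by maximum likelihood from the corpus counts."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n_uni = sum(unigrams.values())
        n_bi = sum(bigrams.values())
        scores = {}
        for (x, y), c in bigrams.items():
            p_xy = c / n_bi
            p_x = unigrams[x] / n_uni
            p_y = unigrams[y] / n_uni
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
        return scores

    # Toy corpus (illustrative only); real collocation work needs far more data.
    text = "strong tea strong coffee powerful computer strong tea powerful argument".split()
    for pair, score in sorted(pmi_table(text).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 2))
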
Feb 9 Information theory
Ch 2.2, Ch 6
Information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity (see the sketch after this entry)
Assignment 3
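
Here is a quick illustrative sketch of entropy, per-word cross entropy, and perplexity for a unigram model; the toy corpus and the use of add-one smoothing are assumptions made just for the example, not anything prescribed by the readings.

    import math
    from collections import Counter

    def entropy(probs):
        """H(p) = -sum p(x) log2 p(x), in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def cross_entropy(model, test_tokens):
        """Per-word cross entropy of a unigram model on test text:
        approximately -(1/N) sum_i log2 q(w_i)."""
        return -sum(math.log2(model[w]) for w in test_tokens) / len(test_tokens)

    def perplexity(model, test_tokens):
        """Perplexity = 2 ** cross entropy."""
        return 2 ** cross_entropy(model, test_tokens)

    # Toy unigram model estimated from a tiny training text (illustrative only);
    # add-one smoothing keeps unseen test words from zeroing out the probability.
    train = "the cat sat on the mat the dog sat".split()
    vocab = set(train) | {"dog"}
    counts = Counter(train)
    model = {w: (counts[w] + 1) / (len(train) + len(vocab)) for w in vocab}

    test = "the dog sat on the mat".split()
    print("entropy of model:", round(entropy(model.values()), 3), "bits")
    print("cross entropy:", round(cross_entropy(model, test), 3), "bits/word")
    print("perplexity:", round(perplexity(model, test), 3))
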
Feb 16 Maximum likelihood estimation and Expectation Maximization
Skim Ch 9-10, Chapter 6 of Lin and Dyer (forthcoming). Read my EM recipe discussion.
Maximum likelihood estimation overview; quick review of smoothing; EM overview; HMM review; deriving forward-backward algorithm as an instance of EM; Viterbi algorithm review. (A small Viterbi sketch follows this entry.)
Assignment 4. Other: An empirical study of smoothing techniques for language modeling (Stanley Chen and Joshua Goodman, Technical Report TR-10-98, Harvard University, August 1998); the revised Chapter 4 from the updated Jurafsky and Martin textbook.
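
Since the Viterbi algorithm comes up again later in the course, here is a compact illustrative implementation for a discrete HMM. The toy part-of-speech-style model and all of its probabilities are made up for the example; this is a sketch, not a reference implementation.

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Viterbi decoding for a discrete HMM: returns the most probable
        state sequence for the observations and its probability.
        V[t][s] holds the best probability of any path ending in state s at time t."""
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                best_prev, best_score = max(
                    ((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                    key=lambda pair: pair[1],
                )
                V[t][s] = best_score * emit_p[s][obs[t]]
                back[t][s] = best_prev
        # Trace back from the best final state.
        last = max(states, key=lambda s: V[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path)), V[-1][last]

    # Tiny illustrative POS-style HMM (all numbers are made up).
    states = ["DET", "NOUN"]
    start_p = {"DET": 0.7, "NOUN": 0.3}
    trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9}, "NOUN": {"DET": 0.4, "NOUN": 0.6}}
    emit_p = {"DET": {"the": 0.8, "dog": 0.1, "barks": 0.1},
              "NOUN": {"the": 0.1, "dog": 0.5, "barks": 0.4}}
    print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))

The forward-backward algorithm replaces the max in the recursion with a sum (and adds a backward pass), which is the connection to EM discussed in class.
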
Feb 23 Probabilistic grammars and parsing
Ch 11-12, Abney (1996) [alternative link], my EM recipe discussion, and the EM recipe used to derive the inside-outside algorithm
Parsing as inference; distinction between logic and control; memoization and dynamic programming; brief review of CKY, probabilistic CKY (inside probabilities), and Viterbi CKY; revisiting EM: the inside-outside algorithm. CFG extensions (parent/grandparent node annotation, lexicalization); syntactic dependency trees. (A Viterbi CKY sketch follows this entry.)
Extra credit assignment (worth 50% of a homework assignment). Other: Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing
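
Below is a small illustrative Viterbi CKY sketch for a PCFG in Chomsky normal form, computing the probability of the best parse over each span; the inside-probability version discussed in class would sum over analyses rather than maximize. The grammar encoding, the toy rules, and all probabilities are assumptions made for the example.

    from collections import defaultdict

    def viterbi_cky(words, grammar, lexicon, start="S"):
        """Viterbi CKY for a PCFG in Chomsky normal form.
        grammar: {(B, C): [(A, prob), ...]} for binary rules A -> B C
        lexicon: {word: [(A, prob), ...]} for lexical rules A -> word
        chart[i][j][A] = best probability of A spanning words[i:j]."""
        n = len(words)
        chart = defaultdict(lambda: defaultdict(dict))
        for i, w in enumerate(words):
            for A, p in lexicon[w]:
                chart[i][i + 1][A] = p
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B, pb in chart[i][k].items():
                        for C, pc in chart[k][j].items():
                            for A, pr in grammar.get((B, C), []):
                                p = pr * pb * pc
                                if p > chart[i][j].get(A, 0.0):
                                    chart[i][j][A] = p
        return chart[0][n].get(start, 0.0)

    # Toy CNF grammar and lexicon (probabilities are made up for illustration).
    grammar = {("NP", "VP"): [("S", 1.0)], ("DET", "N"): [("NP", 1.0)], ("V", "NP"): [("VP", 1.0)]}
    lexicon = {"the": [("DET", 1.0)], "dog": [("N", 0.6)], "cat": [("N", 0.4)],
               "chased": [("V", 1.0)]}
    print(viterbi_cky("the dog chased the cat".split(), grammar, lexicon))
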
Mar 2 Advanced topic: parsing and psychological plausibility
Guest presenter/facilitator: Kristy Hollingshead
Stolcke (1995), An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities, (through section 4.4) for the Earley algorithm; Resnik (1992), Left-Corner Parsing and Psychological Plausibility
Left-corner parsing; Earley's algorithm (see the recognizer sketch after this entry); using parsing as a diagnostic tool for Alzheimer's (and, to a lesser extent, autism)
Take-home midterm handed out. Other: Roark et al. (2007), Syntactic complexity measures for detecting Mild Cognitive Impairment
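
For readers who want to see the Earley chart mechanics in code, here is a small illustrative recognizer (predict/scan/complete over chart items), written under the simplifying assumption that the grammar has no epsilon rules. The grammar encoding and toy sentences are assumptions for the example; Stolcke (1995) extends the same chart with probabilities and prefix-probability computation.

    def earley_recognize(words, grammar, start="S"):
        """Earley recognizer (no epsilon rules assumed). grammar maps a
        nonterminal to a list of right-hand sides (tuples of symbols);
        any symbol without rules is treated as a terminal.
        A chart item is (lhs, rhs, dot, origin)."""
        chart = [set() for _ in range(len(words) + 1)]
        chart[0] = {("GAMMA", (start,), 0, 0)}          # dummy start item
        for i in range(len(words) + 1):
            agenda = list(chart[i])
            while agenda:
                lhs, rhs, dot, origin = agenda.pop()
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                   # PREDICT
                        for prod in grammar[sym]:
                            new = (sym, tuple(prod), 0, i)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
                    elif i < len(words) and words[i] == sym:   # SCAN
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                    # COMPLETE
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
        return ("GAMMA", (start,), 1, 0) in chart[len(words)]

    # Toy grammar (illustrative): terminals are plain words.
    grammar = {"S": [("NP", "VP")], "NP": [("the", "N")], "VP": [("V", "NP"), ("V",)],
               "N": [("dog",), ("cat",)], "V": [("barks",), ("chased",)]}
    print(earley_recognize("the dog chased the cat".split(), grammar))
    print(earley_recognize("the dog barks".split(), grammar))
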
Mar 9 Supervised classification
Ch 16 (except 16.2)
Supervised learning: k-nearest neighbor classification; naive Bayes (sketched after this entry); decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics.
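
As one concrete instance of the supervised classifiers listed above, here is a minimal multinomial naive Bayes text classifier with add-one smoothing. The toy sentiment data and the class interface are illustrative assumptions, not course code.

    import math
    from collections import Counter, defaultdict

    class NaiveBayes:
        """Multinomial naive Bayes with add-one smoothing over a bag of words."""
        def fit(self, docs, labels):
            self.classes = set(labels)
            self.prior = Counter(labels)
            self.word_counts = defaultdict(Counter)
            self.vocab = set()
            for doc, y in zip(docs, labels):
                for w in doc.split():
                    self.word_counts[y][w] += 1
                    self.vocab.add(w)
            return self

        def predict(self, doc):
            def score(y):
                total = sum(self.word_counts[y].values())
                s = math.log(self.prior[y] / sum(self.prior.values()))
                for w in doc.split():
                    if w in self.vocab:   # ignore words never seen in training
                        s += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
                return s
            return max(self.classes, key=score)

    # Tiny illustrative sentiment data (made up).
    docs = ["great fun film", "wonderful acting great plot", "dull boring plot", "boring waste of time"]
    labels = ["pos", "pos", "neg", "neg"]
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("great plot"), clf.predict("boring film"))
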
Mar 16 Beyond supervised learning
Class imbalance; model and search errors; rescoring; oracle evaluations; self-training, active learning, co-training; using text to predict the real world. (A self-training sketch follows this entry.)
Och et al.'s 'Smorgasbord' paper; Noah Smith and Philip Resnik, Using Text to Predict The Real World.
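
The sketch below illustrates the generic self-training loop: train on labeled data, label the unlabeled pool, promote high-confidence predictions into the training set, and repeat. The base classifier (a one-dimensional nearest-class-mean toy) and its confidence measure are deliberately simplistic assumptions, chosen only to make the loop runnable.

    def self_train(fit, predict_with_conf, labeled, unlabeled, threshold=0.9, max_rounds=5):
        """Generic self-training: fit(labeled) -> model;
        predict_with_conf(model, x) -> (label, confidence in [0, 1])."""
        labeled = list(labeled)
        pool = list(unlabeled)
        for _ in range(max_rounds):
            model = fit(labeled)
            newly_labeled, still_unlabeled = [], []
            for x in pool:
                y, conf = predict_with_conf(model, x)
                (newly_labeled if conf >= threshold else still_unlabeled).append((x, y))
            if not newly_labeled:
                break
            labeled += newly_labeled
            pool = [x for x, _ in still_unlabeled]
        return fit(labeled)

    # Toy base learner: classify a number by the nearer class mean; confidence
    # grows with the margin between the two distances (illustrative only).
    def fit(labeled):
        means = {}
        for y in {y for _, y in labeled}:
            xs = [x for x, yy in labeled if yy == y]
            means[y] = sum(xs) / len(xs)
        return means

    def predict_with_conf(means, x):
        y = min(means, key=lambda c: abs(x - means[c]))
        gap = abs(x - means[y])
        other = min(abs(x - m) for c, m in means.items() if c != y)
        return y, other / (gap + other + 1e-9)

    labeled = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]
    unlabeled = [0.15, 0.3, 0.55, 0.8, 0.95]
    print(self_train(fit, predict_with_conf, labeled, unlabeled, threshold=0.8))
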
Mar 23 Spring Break
Have fun!
Mar 30 Evaluation in NLP
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Evaluation paradigms for NLP; parser evaluation (see the bracket-scoring sketch after this entry)
Team project handed out
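
To make the parser evaluation topic concrete, here is a small illustrative sketch of labeled bracket precision, recall, and F1 (PARSEVAL-style scoring). The nested-list tree encoding and the toy gold/system parses are assumptions made for the example, not a reference implementation of any standard scorer.

    def brackets(tree):
        """Collect labeled spans (label, start, end) from a tree given as
        nested lists: [label, child, child, ...], where leaves are word strings."""
        spans = []
        def walk(node, start):
            if isinstance(node, str):          # a single word spans one position
                return start + 1
            end = start
            for child in node[1:]:
                end = walk(child, end)
            spans.append((node[0], start, end))
            return end
        walk(tree, 0)
        return spans

    def parseval(gold_tree, test_tree):
        """Labeled bracket precision, recall, and F1 (PARSEVAL-style)."""
        gold, test = set(brackets(gold_tree)), set(brackets(test_tree))
        correct = len(gold & test)
        p = correct / len(test)
        r = correct / len(gold)
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Toy gold and system parses for "the dog barks" (illustrative structures).
    gold = ["S", ["NP", "the", "dog"], ["VP", "barks"]]
    test = ["S", ["NP", "the"], ["VP", "dog", "barks"]]
    print(parseval(gold, test))
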
Apr 6 More on supervised learning: maximum entropy models and conditional random fields [Guest lecturer TBA]
Using maximum entropy for text classification (Kamal Nigam, John Lafferty, Andrew McCallum); Shallow Parsing with Conditional Random Fields (Fei Sha and Fernando Pereira)
The maximum entropy principle; maxent classifiers (for predicting a single variable); CRFs (for predicting interacting variables); L2 regularization. (A small maxent sketch follows this entry.)
Optionally, some good introductory material appears in Adam Berger's maxent tutorial, Dan Klein and Chris Manning's Maxent Models, Conditional Estimation, and Optimization, without the Magic, and Noah Smith's notes on loglinear models (which provides explicit details for a lot of the math). Another useful reading, focused on estimating the parameters of maxent models, is A comparison of algorithms for maximum entropy parameter estimation (Rob Malouf). Also, Manning and Schuetze section 16.2 can be read as supplementary material. Of historical interest: Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997).
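
The following sketch illustrates the maxent classifier idea in its simplest binary form (equivalently, logistic regression), trained by gradient ascent on an L2-regularized conditional log-likelihood. The feature templates, toy data, learning rate, and regularization constant are all assumptions made for illustration; none of it is drawn from the readings' implementations.

    import math

    def train_maxent(data, features, l2=0.1, lr=0.5, epochs=200):
        """Binary maxent (logistic regression) trained by gradient ascent.
        features(x) returns a dict of feature name -> value;
        data is a list of (x, y) with y in {0, 1}."""
        w = {}
        for _ in range(epochs):
            grad = {k: -l2 * v for k, v in w.items()}   # gradient of the L2 penalty
            for x, y in data:
                f = features(x)
                score = sum(w.get(k, 0.0) * v for k, v in f.items())
                p = 1.0 / (1.0 + math.exp(-score))      # model probability of y = 1
                for k, v in f.items():
                    grad[k] = grad.get(k, 0.0) + (y - p) * v
            for k, g in grad.items():
                w[k] = w.get(k, 0.0) + lr * g / len(data)
        return w

    def predict(w, f):
        score = sum(w.get(k, 0.0) * v for k, v in f.items())
        return 1.0 / (1.0 + math.exp(-score))

    # Toy task (illustrative): classify "sentences" as questions (1) or not (0).
    def features(sentence):
        toks = sentence.split()
        return {"ends_with_?": 1.0 if toks[-1] == "?" else 0.0,
                "starts_wh": 1.0 if toks[0] in {"who", "what", "where"} else 0.0,
                "bias": 1.0}

    data = [("what is this ?", 1), ("where is it ?", 1), ("it is here", 0), ("the dog barks", 0)]
    w = train_maxent(data, features)
    print(w)
    print(predict(w, features("who said that ?")))

A CRF generalizes this picture by defining the same kind of log-linear score over whole label sequences rather than a single output variable.
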
Apr 13 Unsupervised methods and topic modeling [topic tentative]
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated (new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning
[tentative] Graphical model representations of generative models; MLE, MAP, and Bayesian inference; Markov chain Monte Carlo (MCMC) and Gibbs sampling; Latent Dirichlet Allocation (LDA). (A Gibbs sampling sketch for LDA follows this entry.)
Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation
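
As a concrete companion to the MCMC/Gibbs sampling and LDA topics, here is a compact, illustrative collapsed Gibbs sampler for LDA. The toy documents, hyperparameter values, and function name are assumptions made for the example; see the readings above for the actual derivations.

    import random
    from collections import defaultdict

    def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
        """Collapsed Gibbs sampling for LDA: resample each token's topic from
        P(z=k | rest) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)."""
        rng = random.Random(seed)
        V = len({w for d in docs for w in d})
        n_dk = [[0] * K for _ in docs]                # topic counts per document
        n_kw = [defaultdict(int) for _ in range(K)]   # word counts per topic
        n_k = [0] * K                                 # total tokens per topic
        z = []                                        # topic assignment per token
        for d, doc in enumerate(docs):
            z.append([])
            for w in doc:
                k = rng.randrange(K)
                z[d].append(k)
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]                       # remove this token's current assignment
                    n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                    weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                               for j in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][i] = k                       # add it back under the sampled topic
                    n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        return n_kw

    # Two toy "topics" worth of documents (illustrative data).
    docs = [["goal", "team", "match", "goal"], ["team", "match", "coach"],
            ["stock", "market", "shares"], ["market", "shares", "stock", "bank"]]
    topics = lda_gibbs(docs, K=2)
    for k, counts in enumerate(topics):
        print("topic", k, sorted(counts.items(), key=lambda kv: -kv[1])[:3])
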
Apr 20 Word sense disambiguation
Ch 8.5; Ch 15.1, 15.2, 15.4
Semantic similarity; relatedness; synonymy; polysemy; homonymy; entailment; ontology-based similarity measures; vector representations and similarity measures; sketch of LSA. Characterizing the WSD problem; WSD as a supervised classification problem. Lesk algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation. (A simplified Lesk sketch follows this entry.)
Optional: Adam Kilgarriff (1997), I don't believe in word senses, Computers and the Humanities 31(2), pp. 91-113; Philip Resnik (2006), WSD in NLP Applications (Google Books)
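
Here is a small illustrative sketch of the simplified Lesk algorithm mentioned above: choose the sense whose dictionary gloss shares the most content words with the context. The two-sense inventory, hand-written glosses, and stopword list are toy assumptions; a real system would draw glosses and examples from WordNet.

    def simplified_lesk(word, context, sense_inventory,
                        stopwords=frozenset({"the", "a", "of", "in", "to", "and"})):
        """Simplified Lesk: pick the sense whose gloss shares the most
        (non-stopword) tokens with the sentence context."""
        context_words = {w.lower() for w in context.split()} - stopwords
        best_sense, best_overlap = None, -1
        for sense, gloss in sense_inventory[word].items():
            gloss_words = {w.lower() for w in gloss.split()} - stopwords
            overlap = len(context_words & gloss_words)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    # Toy sense inventory with made-up glosses (illustrative only).
    inventory = {"bank": {
        "bank#finance": "a financial institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    }}
    print(simplified_lesk("bank", "he deposited the money in the bank", inventory))
    print(simplified_lesk("bank", "they fished from the bank of the river", inventory))
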
Apr 27 Machine translation
Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008.

Historical view of MT approaches; noisy channel for SMT; IBM Models 1 and 4; HMM distortion model; going beyond word-level models (an IBM Model 1 sketch follows this entry)

Also potentially useful or of interest: Kevin Knight, A Statistical MT Tutorial Workbook;
Mihalcea and Pedersen (2003);
Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation. In Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299.
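
To make IBM Model 1 concrete, the sketch below runs EM to estimate word-translation probabilities t(f | e) from a toy parallel corpus. The data, the NULL-token convention, and the function name are illustrative assumptions; this follows the standard Model 1 EM recipe rather than any particular reading's code.

    from collections import defaultdict

    def ibm_model1(bitext, iterations=10):
        """EM for IBM Model 1 word-translation probabilities t(f | e).
        bitext: list of (foreign_sentence, english_sentence) token lists.
        A NULL token on the English side allows unaligned foreign words."""
        f_vocab = {f for fs, _ in bitext for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))       # uniform initialization
        for _ in range(iterations):
            count = defaultdict(float)
            total = defaultdict(float)
            for fs, es in bitext:
                es = ["NULL"] + es
                for f in fs:
                    z = sum(t[(f, e)] for e in es)        # normalizer for this foreign word
                    for e in es:
                        c = t[(f, e)] / z                 # expected alignment count (E-step)
                        count[(f, e)] += c
                        total[e] += c
            for (f, e), c in count.items():               # M-step: renormalize
                t[(f, e)] = c / total[e]
        return t

    # Toy parallel corpus (illustrative): Spanish-English word pairs.
    bitext = [("la casa".split(), "the house".split()),
              ("la casa verde".split(), "the green house".split()),
              ("casa".split(), "house".split())]
    t = ibm_model1(bitext)
    for pair in [("casa", "house"), ("la", "the"), ("verde", "green")]:
        print(pair, round(t[pair], 3))

Even on this tiny corpus the expected counts concentrate on the right word pairs after a few iterations, which is the intuition behind using Model 1 to bootstrap the fancier models.
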
May 4 [tentative] Phrase-based statistical MT. (This material may be folded into the previous class to make room for a different topic.)

Papineni, Roukos, Ward and Zhu. 2001. BLEU: A Method for Automatic Evaluation of Machine Translation

Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation. (A BLEU sketch follows this entry.)

Take-home final handed out. Other: Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on the PHARAOH decoder
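
Finally, a small illustrative sketch of sentence-level BLEU against a single reference, just to make the Papineni et al. formula concrete. Real BLEU is computed at the corpus level over multiple references; the crude smoothing of zero n-gram counts below is an assumption added only so the toy example returns a nonzero score.

    import math
    from collections import Counter

    def bleu(candidate, reference, max_n=4):
        """Sentence-level BLEU against one reference: geometric mean of
        modified n-gram precisions (n = 1..max_n) times a brevity penalty."""
        precisions = []
        for n in range(1, max_n + 1):
            cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
            ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
            overlap = sum(min(c, ref[g]) for g, c in cand.items())
            total = max(sum(cand.values()), 1)
            precisions.append(max(overlap, 1e-9) / total)   # crude smoothing of zeros
        log_prec = sum(math.log(p) for p in precisions) / max_n
        bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
        return bp * math.exp(log_prec)

    candidate = "the cat is on the mat".split()
    reference = "there is a cat on the mat".split()
    print(round(bleu(candidate, reference), 4))

With no matching 4-gram, the smoothed score here is near zero, which is exactly why corpus-level BLEU (or proper smoothing) is preferred in practice.
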

*Readings are from Manning and Schuetze unless otherwise specified. Do the reading before the class where it is listed!
