Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2011.

Unless otherwise specified, readings are from Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in.

THIS SCHEDULE IS A WORK IN PROGRESS!
Some topic areas may also take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.

Each entry below gives the class date and topic, followed by the readings*, the assignment (if any), and other optional links.
Jan 26 Course administrivia, semester plan; some statistical NLP fundamentals
Ch 1 and Ch 2.1.1-2.1.9 (for review)
Historical overview; Zipf's law; probability spaces; finite-state and Markov models; Bayes' Rule; Bayesian updating; conjugate priors (a short Bayesian-updating sketch follows this entry)
Assignment 1. Other: Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often technical machine learning material, but just as often of more general interest)
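
As a concrete companion to the Bayesian updating and conjugate prior topics above, here is a minimal illustrative sketch (not from the course materials) of Beta-Bernoulli conjugate updating: a Beta(alpha, beta) prior over a Bernoulli parameter becomes Beta(alpha + h, beta + t) after observing h successes and t failures. The function names, the coin-flip framing, and the prior values are assumptions made just for the example.

    def beta_posterior(alpha, beta, heads, tails):
        """Conjugate update: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior."""
        return alpha + heads, beta + tails

    def beta_mean(alpha, beta):
        """Posterior mean estimate of the Bernoulli parameter."""
        return alpha / (alpha + beta)

    # Start with a weak symmetric prior (an illustrative choice), then update
    # on a small sequence of coin flips (1 = heads, 0 = tails).
    alpha, beta = 2.0, 2.0
    flips = [1, 1, 0, 1, 0, 1, 1, 1]
    alpha, beta = beta_posterior(alpha, beta, sum(flips), len(flips) - sum(flips))
    print("posterior:", (alpha, beta), "mean estimate:", beta_mean(alpha, beta))

Because the posterior stays in the Beta family, updating reduces to adding counts to the prior's pseudo-counts, which is the whole point of conjugacy.
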
Feb 2 Words and lexical association
Ch 5
Collocations; hypothesis testing; mutual information (see the PMI sketch after this entry)
Assignment 2. Other: Dunning (1993);
Goodman, S. (1999), "Toward evidence-based medical statistics. 1: The P value fallacy," Ann Intern Med 130(12): 995–1004, PMID 10383371;
Goodman, S. (1999), "Toward evidence-based medical statistics. 2: The Bayes factor," Ann Intern Med 130(12): 1005–1013, PMID 10383350;
Kilgarriff (2005);
Gries (2005);
Bland and Altman (1995)
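
To make the mutual information idea concrete for collocations, here is a small illustrative sketch of pointwise mutual information over adjacent word pairs. The toy "corpus" is made up, which is exactly the regime where the hypothesis-testing readings above matter: PMI estimated from sparse counts is unreliable.

    import math
    from collections import Counter

    def pmi_table(tokens):
        """Pointwise mutual information for adjacent word pairs:
        PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities
        estimated by maximum likelihood from the corpus counts."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n_uni = sum(unigrams.values())
        n_bi = sum(bigrams.values())
        scores = {}
        for (x, y), c in bigrams.items():
            p_xy = c / n_bi
            p_x = unigrams[x] / n_uni
            p_y = unigrams[y] / n_uni
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
        return scores

    # Toy corpus (illustrative only); real collocation work needs far more data.
    text = "strong tea strong coffee powerful computer strong tea powerful argument".split()
    for pair, score in sorted(pmi_table(text).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 2))
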
Feb 9 Information theory
Ch 2.2, Ch 6
Information theory essentials; entropy, relative entropy, mutual information; noisy channel model; cross entropy and perplexity (see the sketch after this entry)
Assignment 3
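
Here is a quick illustrative sketch of entropy, per-word cross entropy, and perplexity for a unigram model; the toy corpus and the use of add-one smoothing are assumptions made just for the example, not anything prescribed by the readings.

    import math
    from collections import Counter

    def entropy(probs):
        """H(p) = -sum p(x) log2 p(x), in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def cross_entropy(model, test_tokens):
        """Per-word cross entropy of a unigram model on test text:
        approximately -(1/N) sum_i log2 q(w_i)."""
        return -sum(math.log2(model[w]) for w in test_tokens) / len(test_tokens)

    def perplexity(model, test_tokens):
        """Perplexity = 2 ** cross entropy."""
        return 2 ** cross_entropy(model, test_tokens)

    # Toy unigram model estimated from a tiny training text (illustrative only);
    # add-one smoothing keeps unseen test words from zeroing out the probability.
    train = "the cat sat on the mat the dog sat".split()
    vocab = set(train) | {"dog"}
    counts = Counter(train)
    model = {w: (counts[w] + 1) / (len(train) + len(vocab)) for w in vocab}

    test = "the dog sat on the mat".split()
    print("entropy of model:", round(entropy(model.values()), 3), "bits")
    print("cross entropy:", round(cross_entropy(model, test), 3), "bits/word")
    print("perplexity:", round(perplexity(model, test), 3))
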
Feb 16 Maximum likelihood estimation and Expectation Maximization
Skim Ch 9-10, Chapter 6 of Lin and Dyer (forthcoming). Read my EM recipe discussion.
Maximum likelihood estimation overview; quick review of smoothing; EM overview; HMM review; deriving forward-backward algorithm as an instance of EM; Viterbi algorithm review. (A small Viterbi sketch follows this entry.)
Assignment 4. Other: An empirical study of smoothing techniques for language modeling (Stanley Chen and Joshua Goodman, Technical Report TR-10-98, Harvard University, August 1998); the revised Chapter 4 from the updated Jurafsky and Martin textbook.
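
Since the Viterbi algorithm comes up again later in the course, here is a compact illustrative implementation for a discrete HMM. The toy part-of-speech-style model and all of its probabilities are made up for the example; this is a sketch, not a reference implementation.

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Viterbi decoding for a discrete HMM: returns the most probable
        state sequence for the observations and its probability.
        V[t][s] holds the best probability of any path ending in state s at time t."""
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                best_prev, best_score = max(
                    ((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                    key=lambda pair: pair[1],
                )
                V[t][s] = best_score * emit_p[s][obs[t]]
                back[t][s] = best_prev
        # Trace back from the best final state.
        last = max(states, key=lambda s: V[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path)), V[-1][last]

    # Tiny illustrative POS-style HMM (all numbers are made up).
    states = ["DET", "NOUN"]
    start_p = {"DET": 0.7, "NOUN": 0.3}
    trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9}, "NOUN": {"DET": 0.4, "NOUN": 0.6}}
    emit_p = {"DET": {"the": 0.8, "dog": 0.1, "barks": 0.1},
              "NOUN": {"the": 0.1, "dog": 0.5, "barks": 0.4}}
    print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))

The forward-backward algorithm replaces the max in the recursion with a sum (and adds a backward pass), which is the connection to EM discussed in class.
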
Feb 23 Probabilistic grammars and parsing
Ch 11-12, Abney (1996) [alternative link], my EM recipe discussion, and the EM recipe used to derive the inside-outside algorithm
Parsing as inference; distinction between logic and control; memoization and dynamic programming; brief review of CKY, probabilistic CKY (inside probabilities), and Viterbi CKY; revisiting EM: the inside-outside algorithm. CFG extensions (parent/grandparent node annotation, lexicalization); syntactic dependency trees. (A Viterbi CKY sketch follows this entry.)
Extra credit assignment (worth 50% of a homework assignment). Other: Jason Eisner's great parsing song; Pereira (2000); Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars; McClosky, Charniak, and Johnson (2006), Effective Self-Training for Parsing
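
Below is a small illustrative Viterbi CKY sketch for a PCFG in Chomsky normal form, computing the probability of the best parse over each span; the inside-probability version discussed in class would sum over analyses rather than maximize. The grammar encoding, the toy rules, and all probabilities are assumptions made for the example.

    from collections import defaultdict

    def viterbi_cky(words, grammar, lexicon, start="S"):
        """Viterbi CKY for a PCFG in Chomsky normal form.
        grammar: {(B, C): [(A, prob), ...]} for binary rules A -> B C
        lexicon: {word: [(A, prob), ...]} for lexical rules A -> word
        chart[i][j][A] = best probability of A spanning words[i:j]."""
        n = len(words)
        chart = defaultdict(lambda: defaultdict(dict))
        for i, w in enumerate(words):
            for A, p in lexicon[w]:
                chart[i][i + 1][A] = p
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for B, pb in chart[i][k].items():
                        for C, pc in chart[k][j].items():
                            for A, pr in grammar.get((B, C), []):
                                p = pr * pb * pc
                                if p > chart[i][j].get(A, 0.0):
                                    chart[i][j][A] = p
        return chart[0][n].get(start, 0.0)

    # Toy CNF grammar and lexicon (probabilities are made up for illustration).
    grammar = {("NP", "VP"): [("S", 1.0)], ("DET", "N"): [("NP", 1.0)], ("V", "NP"): [("VP", 1.0)]}
    lexicon = {"the": [("DET", 1.0)], "dog": [("N", 0.6)], "cat": [("N", 0.4)],
               "chased": [("V", 1.0)]}
    print(viterbi_cky("the dog chased the cat".split(), grammar, lexicon))
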
Mar 2 Advanced topic: parsing and psychological plausibility
Guest presenter/facilitator: Kristy Hollingshead
Stolcke (1995), An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities, (through section 4.4) for the Earley algorithm; Resnik (1992), Left-Corner Parsing and Psychological Plausibility
Left-corner parsing; Earley's algorithm (see the recognizer sketch after this entry); using parsing as a diagnostic tool for Alzheimer's (and, to a lesser extent, autism)
Take-home midterm handed out. Other: Roark et al. (2007), Syntactic complexity measures for detecting Mild Cognitive Impairment
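
For readers who want to see the Earley chart mechanics in code, here is a small illustrative recognizer (predict/scan/complete over chart items), written under the simplifying assumption that the grammar has no epsilon rules. The grammar encoding and toy sentences are assumptions for the example; Stolcke (1995) extends the same chart with probabilities and prefix-probability computation.

    def earley_recognize(words, grammar, start="S"):
        """Earley recognizer (no epsilon rules assumed). grammar maps a
        nonterminal to a list of right-hand sides (tuples of symbols);
        any symbol without rules is treated as a terminal.
        A chart item is (lhs, rhs, dot, origin)."""
        chart = [set() for _ in range(len(words) + 1)]
        chart[0] = {("GAMMA", (start,), 0, 0)}          # dummy start item
        for i in range(len(words) + 1):
            agenda = list(chart[i])
            while agenda:
                lhs, rhs, dot, origin = agenda.pop()
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:                   # PREDICT
                        for prod in grammar[sym]:
                            new = (sym, tuple(prod), 0, i)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
                    elif i < len(words) and words[i] == sym:   # SCAN
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                    # COMPLETE
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
        return ("GAMMA", (start,), 1, 0) in chart[len(words)]

    # Toy grammar (illustrative): terminals are plain words.
    grammar = {"S": [("NP", "VP")], "NP": [("the", "N")], "VP": [("V", "NP"), ("V",)],
               "N": [("dog",), ("cat",)], "V": [("barks",), ("chased",)]}
    print(earley_recognize("the dog chased the cat".split(), grammar))
    print(earley_recognize("the dog barks".split(), grammar))
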
Mar 9 Supervised classification
Ch 16 (except 16.2)
Supervised learning: k-nearest neighbor classification; naive Bayes (sketched after this entry); decision lists; decision trees; transformation-based learning (Sec 10.4); linear classifiers; the kernel trick; perceptrons; SVM basics.
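
As one concrete instance of the supervised classifiers listed above, here is a minimal multinomial naive Bayes text classifier with add-one smoothing. The toy sentiment data and the class interface are illustrative assumptions, not course code.

    import math
    from collections import Counter, defaultdict

    class NaiveBayes:
        """Multinomial naive Bayes with add-one smoothing over a bag of words."""
        def fit(self, docs, labels):
            self.classes = set(labels)
            self.prior = Counter(labels)
            self.word_counts = defaultdict(Counter)
            self.vocab = set()
            for doc, y in zip(docs, labels):
                for w in doc.split():
                    self.word_counts[y][w] += 1
                    self.vocab.add(w)
            return self

        def predict(self, doc):
            def score(y):
                total = sum(self.word_counts[y].values())
                s = math.log(self.prior[y] / sum(self.prior.values()))
                for w in doc.split():
                    if w in self.vocab:   # ignore words never seen in training
                        s += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
                return s
            return max(self.classes, key=score)

    # Tiny illustrative sentiment data (made up).
    docs = ["great fun film", "wonderful acting great plot", "dull boring plot", "boring waste of time"]
    labels = ["pos", "pos", "neg", "neg"]
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("great plot"), clf.predict("boring film"))
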
Mar 16 Beyond supervised learning
Class imbalance; model and search errors; rescoring; oracle evaluations; self-training, active learning, co-training; using text to predict the real world. (A self-training sketch follows this entry.)
Och et al.'s 'Smorgasbord' paper; Noah Smith and Philip Resnik, Using Text to Predict The Real World.
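
The sketch below illustrates the generic self-training loop: train on labeled data, label the unlabeled pool, promote high-confidence predictions into the training set, and repeat. The base classifier (a one-dimensional nearest-class-mean toy) and its confidence measure are deliberately simplistic assumptions, chosen only to make the loop runnable.

    def self_train(fit, predict_with_conf, labeled, unlabeled, threshold=0.9, max_rounds=5):
        """Generic self-training: fit(labeled) -> model;
        predict_with_conf(model, x) -> (label, confidence in [0, 1])."""
        labeled = list(labeled)
        pool = list(unlabeled)
        for _ in range(max_rounds):
            model = fit(labeled)
            newly_labeled, still_unlabeled = [], []
            for x in pool:
                y, conf = predict_with_conf(model, x)
                (newly_labeled if conf >= threshold else still_unlabeled).append((x, y))
            if not newly_labeled:
                break
            labeled += newly_labeled
            pool = [x for x, _ in still_unlabeled]
        return fit(labeled)

    # Toy base learner: classify a number by the nearer class mean; confidence
    # grows with the margin between the two distances (illustrative only).
    def fit(labeled):
        means = {}
        for y in {y for _, y in labeled}:
            xs = [x for x, yy in labeled if yy == y]
            means[y] = sum(xs) / len(xs)
        return means

    def predict_with_conf(means, x):
        y = min(means, key=lambda c: abs(x - means[c]))
        gap = abs(x - means[y])
        other = min(abs(x - m) for c, m in means.items() if c != y)
        return y, other / (gap + other + 1e-9)

    labeled = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]
    unlabeled = [0.15, 0.3, 0.55, 0.8, 0.95]
    print(self_train(fit, predict_with_conf, labeled, unlabeled, threshold=0.8))
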
Mar 23 Spring Break
Have fun!
Mar 30 Evaluation in NLP
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Evaluation paradigms for NLP; parser evaluation (see the bracket-scoring sketch after this entry)
Team project handed out
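
To make the parser evaluation topic concrete, here is a small illustrative sketch of labeled bracket precision, recall, and F1 (PARSEVAL-style scoring). The nested-list tree encoding and the toy gold/system parses are assumptions made for the example, not a reference implementation of any standard scorer.

    def brackets(tree):
        """Collect labeled spans (label, start, end) from a tree given as
        nested lists: [label, child, child, ...], where leaves are word strings."""
        spans = []
        def walk(node, start):
            if isinstance(node, str):          # a single word spans one position
                return start + 1
            end = start
            for child in node[1:]:
                end = walk(child, end)
            spans.append((node[0], start, end))
            return end
        walk(tree, 0)
        return spans

    def parseval(gold_tree, test_tree):
        """Labeled bracket precision, recall, and F1 (PARSEVAL-style)."""
        gold, test = set(brackets(gold_tree)), set(brackets(test_tree))
        correct = len(gold & test)
        p = correct / len(test)
        r = correct / len(gold)
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Toy gold and system parses for "the dog barks" (illustrative structures).
    gold = ["S", ["NP", "the", "dog"], ["VP", "barks"]]
    test = ["S", ["NP", "the"], ["VP", "dog", "barks"]]
    print(parseval(gold, test))
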
Apr 6 More on supervised learning: maximum entropy models and conditional random fields [Guest lecturer TBA]
Using maximum entropy for text classification (Kamal Nigam, John Lafferty, Andrew McCallum); Shallow Parsing with Conditional Random Fields (Fei Sha and Fernando Pereira)
The maximum entropy principle; maxent classifiers (for predicting a single variable); CRFs (for predicting interacting variables); L2 regularization. (A small maxent sketch follows this entry.)
Optionally, some good introductory material appears in Adam Berger's maxent tutorial, Dan Klein and Chris Manning's Maxent Models, Conditional Estimation, and Optimization, without the Magic, and Noah Smith's notes on loglinear models (which provides explicit details for a lot of the math). Another useful reading, focused on estimating the parameters of maxent models, is A comparison of algorithms for maximum entropy parameter estimation (Rob Malouf). Also, Manning and Schuetze section 16.2 can be read as supplementary material. Of historical interest: Adwait Ratnaparkhi's A Simple Introduction to Maximum Entropy Models for Natural Language Processing (1997).
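
The following sketch illustrates the maxent classifier idea in its simplest binary form (equivalently, logistic regression), trained by gradient ascent on an L2-regularized conditional log-likelihood. The feature templates, toy data, learning rate, and regularization constant are all assumptions made for illustration; none of it is drawn from the readings' implementations.

    import math

    def train_maxent(data, features, l2=0.1, lr=0.5, epochs=200):
        """Binary maxent (logistic regression) trained by gradient ascent.
        features(x) returns a dict of feature name -> value;
        data is a list of (x, y) with y in {0, 1}."""
        w = {}
        for _ in range(epochs):
            grad = {k: -l2 * v for k, v in w.items()}   # gradient of the L2 penalty
            for x, y in data:
                f = features(x)
                score = sum(w.get(k, 0.0) * v for k, v in f.items())
                p = 1.0 / (1.0 + math.exp(-score))      # model probability of y = 1
                for k, v in f.items():
                    grad[k] = grad.get(k, 0.0) + (y - p) * v
            for k, g in grad.items():
                w[k] = w.get(k, 0.0) + lr * g / len(data)
        return w

    def predict(w, f):
        score = sum(w.get(k, 0.0) * v for k, v in f.items())
        return 1.0 / (1.0 + math.exp(-score))

    # Toy task (illustrative): classify "sentences" as questions (1) or not (0).
    def features(sentence):
        toks = sentence.split()
        return {"ends_with_?": 1.0 if toks[-1] == "?" else 0.0,
                "starts_wh": 1.0 if toks[0] in {"who", "what", "where"} else 0.0,
                "bias": 1.0}

    data = [("what is this ?", 1), ("where is it ?", 1), ("it is here", 0), ("the dog barks", 0)]
    w = train_maxent(data, features)
    print(w)
    print(predict(w, features("who said that ?")))

A CRF generalizes this picture by defining the same kind of log-linear score over whole label sequences rather than a single output variable.
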
Apr 13 Unsupervised methods and topic modeling [topic tentative]
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated (new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Latent Semantic Analysis: A Road to Meaning
[tentative] Graphical model representations of generative models; MLE, MAP, and Bayesian inference; Markov chain Monte Carlo (MCMC) and Gibbs sampling; Latent Dirichlet Allocation (LDA). (A Gibbs sampling sketch for LDA follows this entry.)
Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation
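
As a concrete companion to the MCMC/Gibbs sampling and LDA topics, here is a compact, illustrative collapsed Gibbs sampler for LDA. The toy documents, hyperparameter values, and function name are assumptions made for the example; see the readings above for the actual derivations.

    import random
    from collections import defaultdict

    def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
        """Collapsed Gibbs sampling for LDA: resample each token's topic from
        P(z=k | rest) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)."""
        rng = random.Random(seed)
        V = len({w for d in docs for w in d})
        n_dk = [[0] * K for _ in docs]                # topic counts per document
        n_kw = [defaultdict(int) for _ in range(K)]   # word counts per topic
        n_k = [0] * K                                 # total tokens per topic
        z = []                                        # topic assignment per token
        for d, doc in enumerate(docs):
            z.append([])
            for w in doc:
                k = rng.randrange(K)
                z[d].append(k)
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]                       # remove this token's current assignment
                    n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                    weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                               for j in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][i] = k                       # add it back under the sampled topic
                    n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        return n_kw

    # Two toy "topics" worth of documents (illustrative data).
    docs = [["goal", "team", "match", "goal"], ["team", "match", "coach"],
            ["stock", "market", "shares"], ["market", "shares", "stock", "bank"]]
    topics = lda_gibbs(docs, K=2)
    for k, counts in enumerate(topics):
        print("topic", k, sorted(counts.items(), key=lambda kv: -kv[1])[:3])
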
Apr 20 Word sense disambiguation
Ch 8.5; Ch 15.1, 15.2, 15.4
Semantic similarity; relatedness; synonymy; polysemy; homonymy; entailment; ontology-based similarity measures; vector representations and similarity measures; sketch of LSA. Characterizing the WSD problem; WSD as a supervised classification problem. Lesk algorithm; semi-supervised learning and Yarowsky's algorithm; WSD in applications; WSD evaluation. (A simplified Lesk sketch follows this entry.)
Optional: Adam Kilgarriff (1997), I don't believe in word senses, Computers and the Humanities 31(2), pp. 91-113; Philip Resnik (2006), WSD in NLP Applications (Google Books)
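
Here is a small illustrative sketch of the simplified Lesk algorithm mentioned above: choose the sense whose dictionary gloss shares the most content words with the context. The two-sense inventory, hand-written glosses, and stopword list are toy assumptions; a real system would draw glosses and examples from WordNet.

    def simplified_lesk(word, context, sense_inventory,
                        stopwords=frozenset({"the", "a", "of", "in", "to", "and"})):
        """Simplified Lesk: pick the sense whose gloss shares the most
        (non-stopword) tokens with the sentence context."""
        context_words = {w.lower() for w in context.split()} - stopwords
        best_sense, best_overlap = None, -1
        for sense, gloss in sense_inventory[word].items():
            gloss_words = {w.lower() for w in gloss.split()} - stopwords
            overlap = len(context_words & gloss_words)
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    # Toy sense inventory with made-up glosses (illustrative only).
    inventory = {"bank": {
        "bank#finance": "a financial institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    }}
    print(simplified_lesk("bank", "he deposited the money in the bank", inventory))
    print(simplified_lesk("bank", "they fished from the bank of the river", inventory))
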
Apr 27 Machine translation
Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008.

Historical view of MT approaches; noisy channel for SMT; IBM Models 1 and 4; HMM distortion model; going beyond word-level models (an IBM Model 1 sketch follows this entry)

Also potentially useful or of interest: Kevin Knight, A Statistical MT Tutorial Workbook;
Mihalcea and Pedersen (2003);
Philip Resnik, Exploiting Hidden Meanings: Using Bilingual Text for Monolingual Annotation. In Alexander Gelbukh (ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing, Springer, 2004, pp. 283-299.
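
To make IBM Model 1 concrete, the sketch below runs EM to estimate word-translation probabilities t(f | e) from a toy parallel corpus. The data, the NULL-token convention, and the function name are illustrative assumptions; this follows the standard Model 1 EM recipe rather than any particular reading's code.

    from collections import defaultdict

    def ibm_model1(bitext, iterations=10):
        """EM for IBM Model 1 word-translation probabilities t(f | e).
        bitext: list of (foreign_sentence, english_sentence) token lists.
        A NULL token on the English side allows unaligned foreign words."""
        f_vocab = {f for fs, _ in bitext for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))       # uniform initialization
        for _ in range(iterations):
            count = defaultdict(float)
            total = defaultdict(float)
            for fs, es in bitext:
                es = ["NULL"] + es
                for f in fs:
                    z = sum(t[(f, e)] for e in es)        # normalizer for this foreign word
                    for e in es:
                        c = t[(f, e)] / z                 # expected alignment count (E-step)
                        count[(f, e)] += c
                        total[e] += c
            for (f, e), c in count.items():               # M-step: renormalize
                t[(f, e)] = c / total[e]
        return t

    # Toy parallel corpus (illustrative): Spanish-English word pairs.
    bitext = [("la casa".split(), "the house".split()),
              ("la casa verde".split(), "the green house".split()),
              ("casa".split(), "house".split())]
    t = ibm_model1(bitext)
    for pair in [("casa", "house"), ("la", "the"), ("verde", "green")]:
        print(pair, round(t[pair], 3))

Even on this tiny corpus the expected counts concentrate on the right word pairs after a few iterations, which is the intuition behind using Model 1 to bootstrap the fancier models.
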
May 4 [tentative] Phrase-based statistical MT. (This material may be folded into the previous class to make room for a different topic.)

Papineni, Roukos, Ward and Zhu. 2001. BLEU: A Method for Automatic Evaluation of Machine Translation

Components of a phrase-based system: language modeling, translation modeling; sentence alignment, word alignment, phrase extraction, parameter tuning, decoding, rescoring, evaluation. (A BLEU sketch follows this entry.)

Take-home final handed out. Other: Koehn, PHARAOH: A Beam Search Decoder for Phrase-Based Statistical Machine Translation; Koehn (2004) presentation on the PHARAOH decoder
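
Finally, a small illustrative sketch of sentence-level BLEU against a single reference, just to make the Papineni et al. formula concrete. Real BLEU is computed at the corpus level over multiple references; the crude smoothing of zero n-gram counts below is an assumption added only so the toy example returns a nonzero score.

    import math
    from collections import Counter

    def bleu(candidate, reference, max_n=4):
        """Sentence-level BLEU against one reference: geometric mean of
        modified n-gram precisions (n = 1..max_n) times a brevity penalty."""
        precisions = []
        for n in range(1, max_n + 1):
            cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
            ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
            overlap = sum(min(c, ref[g]) for g, c in cand.items())
            total = max(sum(cand.values()), 1)
            precisions.append(max(overlap, 1e-9) / total)   # crude smoothing of zeros
        log_prec = sum(math.log(p) for p in precisions) / max_n
        bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
        return bp * math.exp(log_prec)

    candidate = "the cat is on the mat".split()
    reference = "there is a cat on the mat".split()
    print(round(bleu(candidate, reference), 4))

With no matching 4-gram, the smoothed score here is near zero, which is exactly why corpus-level BLEU (or proper smoothing) is preferred in practice.
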

*Readings are from Manning and Schuetze unless otherwise specified. Do the reading before the class where it is listed!
