Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2013.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

Class | Topic | Readings | Assignments | Other
Jan 23 Course administrivia, semester plan; some statistical NLP fundamentals
M&S Ch 1 and 2.1.1-2.1.9 (for review)
Assignment 1. Language Log (the linguistics blog), Hal Daumé's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest)
Jan 30 Words and lexical association
M&S Ch 5
Assignment 2. Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
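To make the point about low-frequency inflation concrete, here is a minimal sketch (invented counts, not from any real corpus) of pointwise mutual information for a bigram observed exactly once:

    from math import log2

    # Hypothetical corpus counts, chosen only for illustration.
    N = 1_000_000        # total number of bigram tokens
    c_xy = 1             # the bigram (x, y) occurs exactly once
    c_x, c_y = 2, 2      # each of its words is itself rare

    # Pointwise mutual information: log2( P(x,y) / (P(x) * P(y)) )
    p_xy, p_x, p_y = c_xy / N, c_x / N, c_y / N
    pmi = log2(p_xy / (p_x * p_y))
    print(f"PMI = {pmi:.1f} bits")   # roughly 18 bits, from a single observation

A single co-occurrence of two rare words yields an enormous association score, which is exactly the situation where Dunning's log-likelihood ratio and Moore's treatment of rare events are the more trustworthy tools.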

Two papers by Goodman are valuable for understanding the p-value and its limitations (although there are almost certainly good recent discussions, given the rise in attention to these issues): Goodman, S. (1999), "Toward evidence-based medical statistics. 1: The P value fallacy," Ann Intern Med 130(12): 995-1004, PMID 10383371; and Goodman, S. (1999), "Toward evidence-based medical statistics. 2: The Bayes factor," Ann Intern Med 130(12): 1005-1013, PMID 10383350. Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology in language research.

Feb 6 Information theory
M&S Ch 2.2, M&S Ch 6
Assignment 3. Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into all of the concepts from this lecture with greater rigor but a lot of clarity.
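If you want to see the most basic quantities in a few lines of code, here is a toy sketch (the distributions are invented, purely to show the arithmetic) of entropy and cross entropy over a three-word vocabulary; cross entropy is where next week picks up:

    from math import log2

    # Invented unigram distributions over a tiny vocabulary.
    p = {"the": 0.6, "cat": 0.3, "sat": 0.1}    # "true" distribution
    q = {"the": 0.5, "cat": 0.25, "sat": 0.25}  # a model's distribution

    entropy = -sum(p[w] * log2(p[w]) for w in p)        # H(p)
    cross_entropy = -sum(p[w] * log2(q[w]) for w in p)  # H(p, q)
    print(f"H(p) = {entropy:.3f} bits, H(p, q) = {cross_entropy:.3f} bits")

Cross entropy is never smaller than the true entropy, and the gap between them is the KL divergence, which is one way to see why a better model of the data drives cross entropy down.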
Feb 13 Cross entropy; Maximum likelihood estimation and Expectation Maximization
Skim M&S Ch 9-10, Chapter 6 of Lin and Dyer. Read my EM recipe discussion.
Assignment 4. Some papers I mentioned in class are Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density; and Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.
Feb 20 EM continued; probabilistic grammars and parsing
Read M&S Ch 11. Skim Ch 12 with an emphasis on 12.1.4.
Assignment 5. Strongly recommended: re-read my EM recipe discussion from last week.

Two helpful (but optional) companions to this week's readings: the Bilmes (1998) EM tutorial, especially useful if you want to see the formal way to derive EM update equations using Q functions and optimization in the M step using Lagrange multipliers, and Chris Dyer's nice notes on using my EM recipe to derive the inside-outside algorithm for PCFGs. If you just haven't gotten enough of the formal underpinnings for EM, another nice intro discussion is in Sections 1-3 of Collins (1997).
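If it helps to see the E and M steps in the smallest possible setting, here is a sketch of the familiar two-coin mixture example (invented data, and not the notation of the EM recipe discussion): the E-step computes each sequence's posterior responsibility under the current parameters, and the M-step re-estimates the coin biases from the resulting expected counts.

    # Two coins with unknown biases; each (heads, tails) pair was generated
    # by one coin, chosen uniformly at random, but we don't observe which.
    data = [(9, 1), (4, 6), (8, 2), (5, 5), (7, 3)]
    theta_a, theta_b = 0.6, 0.5   # arbitrary starting guesses

    for _ in range(20):
        exp_a = [0.0, 0.0]   # expected (heads, tails) attributed to coin A
        exp_b = [0.0, 0.0]
        for h, t in data:
            # E-step: posterior probability this sequence came from coin A
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            r = like_a / (like_a + like_b)
            exp_a[0] += r * h
            exp_a[1] += r * t
            exp_b[0] += (1 - r) * h
            exp_b[1] += (1 - r) * t
        # M-step: maximum-likelihood re-estimates from the expected counts
        theta_a = exp_a[0] / (exp_a[0] + exp_a[1])
        theta_b = exp_b[0] / (exp_b[0] + exp_b[1])

    print(f"theta_A ~ {theta_a:.2f}, theta_B ~ {theta_b:.2f}")

The same alternation, expected counts under the current model followed by renormalization, is what the inside-outside algorithm does for PCFGs, with the inside and outside probabilities supplying the expected rule counts.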

Optional, but well worth reading sometime this semester: Abney (1996) on statistical methods and linguistics.

Note for the future: M&S Section 12.1.8 will be relevant when we talk about evaluation later in the semester.

Feb 27 Probabilistic grammars and parsing
Same as last week. Assignment 6. Optional, but I'm likely to discuss at least parts of these this week, so worth skimming over if you have time: Matsuzaki et al. (2005), Probabilistic CFG with latent annotations, and Petrov et al. (2006), Learning Accurate, Compact, and Interpretable Tree Annotation.
Mar 6 Cancelled -- snow day!
Mar 13 Supervised classification
M&S Ch 16 except 16.2.1; Hearst et al. (1998), Support Vector Machines
Assignment 7. I picked Hearst et al. (1998) as the SVM reading because it's the clearest, shortest possible introduction. There are many other good things to read at svms.org, including a "best tutorials" section, broken out by introductory, intermediate, and advanced, under Tutorials. Feel free to go with one of the other tutorials (the ones I've seen used most often are Burges (1998) and Smola et al. (1999)) instead of Hearst if you want a meatier introduction.

Optional: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work.
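For reference, the conditional log-linear ("maxent") form that both the Ratnaparkhi and Smith readings work with, with feature functions f_i(x, y) and weights lambda_i, is

    \[
      p(y \mid x) \;=\; \frac{\exp\bigl(\sum_i \lambda_i f_i(x, y)\bigr)}
                             {\sum_{y'} \exp\bigl(\sum_i \lambda_i f_i(x, y')\bigr)}
    \]

where the denominator is just the normalizer Z(x) that makes the scores sum to one over the possible labels.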

Mar 20 Spring Break
Have fun!
Mar 27 Structured learning
Guest lecture: Wu Ke
Lecture notes. Take-home midterm handed out. Optional papers originating key ideas:
  • The original CRF paper by John Lafferty et al.
  • Fei Sha and Fernando Pereira's paper on chunking with CRFs.
  • Michael Collins' paper that introduced the structured perceptron.
Other optional material:
  • Charles Sutton and Andrew McCallum's CRF tutorial in Foundations and Trends in Machine Learning.
  • Ben Taskar's dissertation on large margin training of structured prediction models.
  • Another paper by Ioannis Tsochantaridis et al. on the same topic with a different approach.
  • A nice benchmark comparison of several algorithms for training CRFs and structured perceptrons.
Apr 3 Evaluation in NLP
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Assignment 8
Apr 10 Bayesian inference and modeling
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated (new version to be linked shortly); M. Steyvers and T. Griffiths (2007), Probabilistic Topic Models, in Latent Semantic Analysis: A Road to Meaning
Final exam project begins. Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation
Apr 17 Gibbs sampling and LDA
Same as last week. None.
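As a pointer to what the sampler in these readings is doing: the collapsed Gibbs update for LDA resamples each token's topic assignment z_i from its conditional given everything else. Written from memory (so check it against the readings), with alpha and beta the Dirichlet hyperparameters, V the vocabulary size, and all counts excluding token i, the update is

    \[
      P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
      \bigl(n^{-i}_{d_i, k} + \alpha\bigr)\,
      \frac{n^{-i}_{k, w_i} + \beta}{n^{-i}_{k, \cdot} + V\beta}
    \]

where n_{d_i,k} counts tokens in document d_i currently assigned to topic k and n_{k,w_i} counts assignments of word type w_i to topic k across the corpus.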
Apr 24 Machine translation
M&S Ch 13 and Adam Lopez, Statistical Machine Translation, in ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008.

None. Also potentially useful or of interest:
May 1 Machine translation continued
(First hour or so is a guest lecture by Hal Daumé)
Reading from Hal distributed on Piazza.

None. Hal Daumé III and Jagadeesh Jagarlamudi, Domain Adaptation for Machine Translation by Mining Unseen Words, ACL 2011.
May 8 Guest lecture: Doug Oard
J. Wang and D. Oard, Matching Meaning for Cross-Language Information Retrieval. Final exam project is due next Wednesday, May 15, at 1pm.
