Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2014.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

Class / Topic / Readings / Assignments / Other
Jan 29 Course administrivia, semester plan; some statistical NLP fundamentals
M&S Ch 1, 2.1.[1-9] (for review)
Assignment 1 Language Log (the linguistics blog), Hal Daumé's NLP blog (excellent blog, often technical machine learning stuff, but just as often more general interest)
Feb 5 Words and lexical association
M&S Ch 5
Assignment 2 Dunning (1993) is a classic, and valuable to read if you're using mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is less widely cited but is a very valuable discussion of how to judge the significance of rare events.
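For concreteness, here is a tiny illustrative sketch (the counts are made up, and this is not from the readings) of why pointwise mutual information blows up for low-frequency pairs:

    import math

    N = 1_000_000   # total bigrams in a hypothetical corpus (made-up number)

    def pmi(c_xy, c_x, c_y, N):
        """Pointwise mutual information: log2( p(x,y) / (p(x) p(y)) )."""
        return math.log2((c_xy / N) / ((c_x / N) * (c_y / N)))

    # A genuinely associated, frequent pair ...
    print(pmi(c_xy=500, c_x=2000, c_y=3000, N=N))   # ~6.4 bits

    # ... versus two rare words that co-occurred exactly once, possibly by chance.
    print(pmi(c_xy=1, c_x=1, c_y=1, N=N))           # ~19.9 bits -- inflated

The singleton pair gets by far the highest score even though a single co-occurrence is very weak evidence, which is exactly the problem Dunning's likelihood-ratio test is designed to address.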

A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a more recent and very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013 and the really great accompanying video.

Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.

Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22, and in Ch 7 of the NLTK book.

Feb 12 Information theory
M&S Ch 2.2, M&S Ch 6

Optional: Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density

Assignment 3 Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into all of the concepts from this lecture with greater rigor but also a lot of clarity.
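As a quick illustrative sketch (not from the readings), the central quantities for this lecture, entropy, cross-entropy, and KL divergence, can be computed directly from a discrete distribution:

    import math

    def entropy(p):
        """H(p) = -sum p(x) log2 p(x), in bits."""
        return -sum(px * math.log2(px) for px in p if px > 0)

    def cross_entropy(p, q):
        """H(p, q) = -sum p(x) log2 q(x): the cost of coding data from p using model q."""
        return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

    def kl_divergence(p, q):
        """D(p || q) = H(p, q) - H(p); always >= 0, and zero iff p == q."""
        return cross_entropy(p, q) - entropy(p)

    p = [0.5, 0.25, 0.125, 0.125]   # "true" distribution (made up)
    q = [0.25, 0.25, 0.25, 0.25]    # model distribution
    print(entropy(p))          # 1.75 bits
    print(cross_entropy(p, q)) # 2.0 bits
    print(kl_divergence(p, q)) # 0.25 bits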

Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.

Feb 19 Maximum likelihood estimation and Expectation Maximization
Skim M&S Ch 9-10 and Chapter 6 of Lin and Dyer. Read my EM recipe discussion. (A small worked EM sketch appears below.)
Assignment 4
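As promised above, here is a small worked sketch of the EM recipe, using a mixture of two biased coins with made-up data (the identity of the coin behind each sequence of flips is the hidden variable); this is an illustration, not part of the assigned reading:

    import math

    # Each item: (heads, flips) for one sequence generated by one of two coins; which coin is hidden.
    data = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]   # made-up observations

    theta_a, theta_b = 0.6, 0.5   # initial guesses for each coin's heads probability

    def loglik(h, n, p):
        """Log-likelihood of h heads in n flips (binomial coefficient omitted; it cancels)."""
        return h * math.log(p) + (n - h) * math.log(1 - p)

    for _ in range(20):
        # E-step: responsibility of coin A for each sequence (coins assumed equally likely a priori).
        exp_a = [0.0, 0.0]   # expected heads, tails credited to coin A
        exp_b = [0.0, 0.0]
        for h, n in data:
            wa = math.exp(loglik(h, n, theta_a))
            wb = math.exp(loglik(h, n, theta_b))
            ra = wa / (wa + wb)
            exp_a[0] += ra * h;        exp_a[1] += ra * (n - h)
            exp_b[0] += (1 - ra) * h;  exp_b[1] += (1 - ra) * (n - h)
        # M-step: re-estimate each coin's bias from its expected counts.
        theta_a = exp_a[0] / (exp_a[0] + exp_a[1])
        theta_b = exp_b[0] / (exp_b[0] + exp_b[1])

    print(theta_a, theta_b)   # the two estimates separate toward the two clusters of sequences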
Feb 26 Bayesian inference and modeling

Overview of final exam project

Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated. No need to work through all the equations in Section 2 in detail, but read carefully enough to understand the concepts.

Read M. Steyvers and T. Griffiths (2007), Probabilistic Topic Models (in Latent Semantic Analysis: A Road to Meaning) and/or review the CL1 topic modeling lecture (notes, video).

Do one of EC1, EC2, or EC3 from Assignment 4. (Worth 50% of a usual homework, and not due until 4:30pm Friday March 7) For a very nice and brief summary of LDA, including a really clear explanation of the corresponding Gibbs sampler (with pseudocode!), see Section 5 of Gregor Heinrich, Parameter estimation for text analysis.
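To give a flavor of what Heinrich's Section 5 describes (a bare-bones illustrative sketch with a toy corpus, not his pseudocode), a collapsed Gibbs sampler for LDA resamples each token's topic from its full conditional distribution:

    import random

    # Toy corpus: each document is a list of word ids (made up for illustration).
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 4, 5, 3]]
    V, K = 6, 2                  # vocabulary size, number of topics
    alpha, beta = 0.5, 0.1       # symmetric Dirichlet hyperparameters

    # Random initial topic assignments, plus the count tables the sampler maintains.
    z = [[random.randrange(K) for _ in doc] for doc in docs]
    n_dk = [[0] * K for _ in docs]          # doc-topic counts
    n_kw = [[0] * V for _ in range(K)]      # topic-word counts
    n_k = [0] * K                           # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

    for _ in range(200):                    # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # Full conditional: p(z_i = t | rest) proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                r = random.random() * sum(weights)
                k_new, acc = 0, 0.0
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k_new = t
                        break
                # Add the token back under its newly sampled topic.
                z[d][i] = k_new
                n_dk[d][k_new] += 1; n_kw[k_new][w] += 1; n_k[k_new] += 1

    print(n_kw)   # topic-word counts; normalize these to estimate the topic distributions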

I will touch on supervised topic models, particularly in the context of the project; I recommend reading Blei and McAuliffe, Supervised Topic Models (though note that we will not be talking about variational EM). Also relevant is Nguyen, Boyd-Graber, and Resnik, Lexical and Hierarchical Topic Regression.

If you're interested in going back to the source for LDA, see Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation.

Mar 5 Supervised classification
M&S Ch 16 except 16.2.1; Hearst et al. 1998 Support Vector Machines (cleaner copy here)
I picked Hearst et al. (1998) as the SVM reading because it's the clearest, shortest possible introduction. There are many other good things to read at svms.org, including a "best tutorials" section, broken out by introductory, intermediate, and advanced, under Tutorials. Feel free to go with one of the other tutorials (the ones I've seen used most often are Burges (1998) and Smola et al. (1999)) instead of Hearst if you want a meatier introduction.
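If you'd like to see a linear SVM in action before reading, here is a minimal sketch using scikit-learn (the toy texts and labels are made up, and scikit-learn is used here only as an illustration, not a course requirement):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    # Tiny made-up training set: sentiment-style labels for short texts.
    texts = ["great movie , loved it", "wonderful acting", "terrible plot", "boring and awful"]
    labels = [1, 1, 0, 0]

    vec = CountVectorizer()            # bag-of-words feature extraction
    X = vec.fit_transform(texts)
    clf = LinearSVC()                  # large-margin linear classifier
    clf.fit(X, labels)

    print(clf.predict(vec.transform(["loved the acting", "awful movie"])))  # expect [1 0]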

Optional: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work.
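As a reminder of the form of the model (a hand-rolled sketch with made-up features and weights, not Ratnaparkhi's tagger), a maxent / log-linear model just exponentiates a weighted feature score for each class and normalizes:

    import math

    # Made-up binary features f(x, y) and weights, indexed by (feature, class).
    weights = {("word=bank", "NOUN"): 1.2, ("word=bank", "VERB"): 0.3,
               ("prev=the", "NOUN"): 0.9,  ("prev=the", "VERB"): -0.4}

    def p_y_given_x(active_features, classes):
        """p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x) -- the log-linear form."""
        scores = {y: math.exp(sum(weights.get((f, y), 0.0) for f in active_features))
                  for y in classes}
        Z = sum(scores.values())
        return {y: s / Z for y, s in scores.items()}

    print(p_y_given_x(["word=bank", "prev=the"], ["NOUN", "VERB"]))
    # With these weights, NOUN gets most of the probability mass.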

Mar 12 Deep learning
Read sections 1, 3, and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Other sources we are likely to discuss include Lillian Lee, Measures of Distributional Similarity; Hinrich Schuetze, Word Space; and Mikolov et al., Linguistic Regularities in Continuous Space Word Representations. Recommended: the nice overview of representation learning in sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives, and the background on the skip-gram approach in word2vec found in Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. Background on Mikolov et al.'s Linguistic Regularities paper is in Mikolov et al., Recurrent neural network based language model.
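To illustrate the vector-offset idea behind the Linguistic Regularities paper (the 3-dimensional vectors below are made up purely for illustration; real embeddings would come from word2vec or a similar model):

    import math

    # Toy word vectors (made up) just to show the king - man + woman analogy mechanic.
    vecs = {"king":  [0.8, 0.6, 0.1],
            "man":   [0.7, 0.1, 0.0],
            "woman": [0.7, 0.1, 0.9],
            "queen": [0.8, 0.6, 1.0],
            "apple": [0.0, 0.9, 0.2]}

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    # king - man + woman: which remaining word's vector is closest to the result?
    target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
    best = max((w for w in vecs if w not in ("king", "man", "woman")),
               key=lambda w: cosine(target, vecs[w]))
    print(best)   # with these made-up vectors, "queen"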
Mar 19 Spring Break
Have fun!
Mar 26 Evaluation in NLP
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox, and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Take-home midterm
Apr 2 Structured prediction
Ke Wu, Discriminative Sequence Labeling

Noah Smith's Structured prediction for NLP tutorial slides (ICML'09)

For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation.
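To make the semiring idea concrete (a toy sketch, not taken from Goodman's paper): Viterbi and the forward algorithm are the same dynamic program run over different semirings, so swapping max for sum turns the best-path score into the total score over all paths:

    # Toy HMM-style scores (made up): 2 states, 3 time steps.
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]                 # trans[i][j] = score of moving i -> j
    emit  = [[0.5, 0.1], [0.2, 0.6], [0.3, 0.3]]     # emit[t][s] = score of state s at time t

    def chart(plus):
        """Generic forward pass; 'plus' is the semiring addition (max for Viterbi, sum for forward)."""
        alpha = [start[s] * emit[0][s] for s in range(2)]
        for t in range(1, len(emit)):
            alpha = [plus(alpha[i] * trans[i][j] for i in range(2)) * emit[t][j]
                     for j in range(2)]
        return plus(alpha)

    print(chart(max))   # Viterbi: score of the single best state sequence
    print(chart(sum))   # Forward: total score summed over all state sequences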
Apr 9 More on structured prediction: CRF, structured perceptron, structural SVM
Noah Smith, Linguistic Structure Prediction, esp. Sections 3.5.2-3.7. (The book is available online for UMD and many other university IP addresses.)
Apr 16 Guest lecture: Doug Oard on information retrieval
Douglas W. Oard, Jerome White, Jiaul Paik, Rashmi Sankepally, and Aren Jansen, "The FIRE 2013 Question Answering for the Spoken Web Task", Fifth Forum for Information Retrieval Evaluation, 8 pages, New Delhi, India, 2013.
Apr 23 Machine translation
M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, 49 pages, August 2008.

Also potentially useful or of interest:
Apr 30 Machine translation continued

May 7 Projects discussion
