Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2015.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

See CL Colloquium Talks for possible extra credit each week.

Class | Topic | Readings | Assignments | Other
Jan 29 Course administrivia, semester plan; some statistical NLP fundamentals
M&S Ch 1, 2.1.[1-9] (for review)
Assignment 1

(See CL Colloquium Talks for possible extra credit)

Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often technical machine learning material but just as often of more general interest; be sure to read the comment threads too, since they're frequently just as good)
Feb 4 Words and lexical association
M&S Ch 5
Assignment 2

This assignment is due Friday, February 13 at 5pm.

(See CL Colloquium Talks for possible extra credit)

Dunning (1993) is a classic and valuable to read if you're trying to use mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is a less widely cited but very valuable discussion about how to judge the significance of rare events.
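
To make the low-frequency issue concrete, here is a small toy sketch (mine, not from the readings; all counts are invented) that computes pointwise mutual information and Dunning's log-likelihood ratio (G^2) for a single bigram:

    import math

    # Invented toy counts, purely for illustration.
    n = 100000   # total bigram tokens in a hypothetical corpus
    c1 = 30      # occurrences of word1
    c2 = 25      # occurrences of word2
    c12 = 8      # co-occurrences of (word1, word2)

    # Pointwise mutual information: log2 of P(w1,w2) / (P(w1) P(w2)).
    # With rare words, even a handful of co-occurrences yields a large PMI.
    pmi = math.log2((c12 / n) / ((c1 / n) * (c2 / n)))

    def log_likelihood_ratio(c12, c1, c2, n):
        # Dunning's G^2 statistic over the 2x2 contingency table of outcomes.
        observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
        rows, cols = [c1, n - c1], [c2, n - c2]
        expected = [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]
        return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    print(f"PMI = {pmi:.2f}   G^2 = {log_likelihood_ratio(c12, c1, c2, n):.2f}")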

A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013, and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False--But a Little Replication Goes a Long Way.

Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.

Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22, and in Ch 7 of the NLTK book.
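
If you want to try named entity recognition hands-on, here is a minimal sketch using NLTK's off-the-shelf chunker (not an assigned exercise; it assumes nltk is installed and the standard tokenizer, tagger, and chunker data packages have been downloaded):

    import nltk

    # Assumes the standard NLTK data packages (tokenizer, POS tagger, NE chunker)
    # have already been fetched with nltk.download().
    sentence = "Barack Obama visited the University of Maryland."
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)   # named entities appear as subtrees (PERSON, GPE, ...)
    print(tree)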

Feb 11 Information theory
M&S Ch 2.2, M&S Ch 6

Optional: Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density

Assignment 3

(See CL Colloquium Talks for possible extra credit)

Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into all of the concepts from this lecture with greater rigor but a lot of clarity.
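
For a concrete handle on the core quantities from the lecture, here is a tiny sketch (my own illustration; the distributions are made up) computing entropy, cross-entropy, and KL divergence:

    import math

    # Two made-up distributions over the same four outcomes.
    p = [0.5, 0.25, 0.125, 0.125]   # "true" distribution
    q = [0.25, 0.25, 0.25, 0.25]    # model distribution

    entropy = -sum(pi * math.log2(pi) for pi in p)                    # H(p)
    cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # H(p, q)
    kl = cross_entropy - entropy                                      # D(p || q)

    print(f"H(p) = {entropy:.3f} bits")           # 1.750
    print(f"H(p, q) = {cross_entropy:.3f} bits")  # 2.000
    print(f"D(p || q) = {kl:.3f} bits")           # 0.250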

Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.

Feb 18 Maximum likelihood estimation and Expectation Maximization
Skim M&S Ch 9-10 and Ch 6 of Lin and Dyer. Read my EM recipe discussion. (A small toy illustration of EM appears below.)
Assignment 4

(See CL Colloquium Talks for possible extra credit)
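
Here is the toy EM illustration mentioned above: estimating the biases of two coins from sequences of flips when we never observe which coin produced which sequence (data and starting values are invented, and the two coins are assumed equally likely a priori):

    # Toy EM for a mixture of two biased coins.
    heads = [9, 8, 2, 1, 7]     # heads observed in each 10-flip sequence
    flips = 10
    theta = [0.6, 0.4]          # initial guesses for P(heads) of coins A and B

    def likelihood(h, p):
        # Binomial likelihood up to a constant (the constant cancels in the E-step).
        return (p ** h) * ((1 - p) ** (flips - h))

    for _ in range(20):
        # E-step: posterior responsibility of each coin for each sequence.
        resp = []
        for h in heads:
            la, lb = likelihood(h, theta[0]), likelihood(h, theta[1])
            resp.append((la / (la + lb), lb / (la + lb)))
        # M-step: re-estimate each coin's bias from expected counts.
        theta = [sum(r[k] * h for r, h in zip(resp, heads)) /
                 sum(r[k] * flips for r in resp)
                 for k in (0, 1)]

    print(f"Estimated biases: {theta[0]:.3f}, {theta[1]:.3f}")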


Feb 25 More on EM and HMMs

PCFG review

M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)

Assignment 5: Do one of EC1, EC2, or EC3 from Assignment 4. (Worth 50% of a usual homework)

(See CL Colloquium Talks for possible extra credit)

March 4 Parsing, Generalizing CFG

M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)

(See CL Colloquium Talks for possible extra credit)

A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing, with a significant advance by Joshua Goodman, Semiring Parsing.
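
To connect with the parsing-as-deduction view, here is a minimal CKY recognizer sketch (my own toy grammar, in Chomsky normal form); the chart cells play the role of derived items in the deductive system:

    from collections import defaultdict

    # A tiny made-up grammar in Chomsky normal form.
    binary = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
    lexical = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

    def cky_recognize(words):
        n = len(words)
        chart = defaultdict(set)              # chart[(i, j)]: nonterminals spanning words[i:j]
        for i, w in enumerate(words):         # axioms: one item per lexical entry
            chart[(i, i + 1)].add(lexical[w])
        for width in range(2, n + 1):         # inference: combine adjacent spans
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for b in chart[(i, k)]:
                        for c in chart[(k, j)]:
                            if (b, c) in binary:
                                chart[(i, j)].add(binary[(b, c)])
        return "S" in chart[(0, n)]           # goal item: an S spanning the whole input

    print(cky_recognize("the dog saw the cat".split()))   # True
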
March 11 Bayesian graphical modeling, Gibbs sampling
Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated. No need to work through all the equations in Section 2 in detail, but read carefully enough to understand the concepts.

Read M. Steyvers and T. Griffiths (2007), Probabilistic Topic Models (in Latent Semantic Analysis: A Road to Meaning) and/or review the CL1 topic modeling lecture (notes, video).

(See CL Colloquium Talks for possible extra credit)

For a very nice and brief summary of LDA, including a really clear explanation of the corresponding Gibbs sampler (with pseudocode!), see Section 5 of Gregor Heinrich, Parameter estimation for text analysis.
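
As a rough preview of what that pseudocode boils down to, here is an illustrative sketch of the collapsed Gibbs sampling update for LDA (a simplification with a made-up toy corpus; variable names are mine, not Heinrich's):

    import random

    random.seed(0)

    # Made-up toy corpus: each document is a list of word ids.
    docs = [[0, 1, 0, 2], [2, 3, 2, 4], [0, 1, 4, 0]]
    V, K = 5, 2                  # vocabulary size, number of topics
    alpha, beta = 0.5, 0.1       # symmetric Dirichlet hyperparameters

    # Count tables and random initial topic assignments.
    n_dk = [[0] * K for _ in docs]        # topic counts per document
    n_kw = [[0] * V for _ in range(K)]    # word counts per topic
    n_k = [0] * K                         # total tokens per topic
    z = [[random.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

    for _ in range(200):                  # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove this token's current assignment
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                # p(z = j | everything else) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = random.choices(range(K), weights=weights)[0]
                z[d][i] = k               # add the token back under the sampled topic
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

    print(z)   # final topic assignment for every token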

I may touch on supervised topic models; Blei and McAuliffe, Supervised Topic Models (though note that we will not be talking about variational EM). Also relevant is Nguyen, Boyd-Graber, and Resnik, Lexical and Hierarchical Topic Regression.

If you're interested in going back to the source for LDA, see Blei, Ng, and Jordan (2003), Latent Dirichlet Allocation.

Mar 18 Spring Break
Have fun!
Mar 25 Supervised classification and evaluation
Supervised classification: M&S Ch 16 except 16.2.1; Hearst et al. (1998), Support Vector Machines.

Evaluation: Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.

Take-home midterm

(See CL Colloquium Talks for possible extra credit)

Hearst et al. (1998) is a nice SVM reading because it's the clearest, shortest possible introduction. There are many other good things to read at svms.org, including a "best tutorials" section under Tutorials, broken out into introductory, intermediate, and advanced. Feel free to go with one of the other tutorials (the ones I've seen used most often are Burges (1998) and Smola et al. (1999)) instead of Hearst if you want a meatier introduction.
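
If you want to see the classification and evaluation pieces together in code, here is a minimal sketch using scikit-learn (assuming it is installed; the toy data are invented) that trains a linear SVM on bag-of-words features and reports precision, recall, and F1:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.metrics import classification_report

    # Made-up toy data, purely for illustration.
    train_texts = ["great movie loved it", "terrible plot awful acting",
                   "wonderful film", "boring and awful"]
    train_labels = ["pos", "neg", "pos", "neg"]
    test_texts = ["loved the acting", "awful boring film"]
    test_labels = ["pos", "neg"]

    # Bag-of-words features and a linear-kernel SVM.
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    clf = LinearSVC()
    clf.fit(X_train, train_labels)

    # Precision, recall, and F1 per class: the evaluation side of this week.
    print(classification_report(test_labels, clf.predict(X_test)))
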
April 1 Deep learning

Final project introduction

Read sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. (May add a reading on RBMs.) Final project handed out.

(See CL Colloquium Talks for possible extra credit)

Recommended: the nice overview of representation learning in sections 1-4 of Bengio et al. Representation Learning: A Review and New Perspectives, and the background on the skip-gram approach in word2vec found in Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. Background on Mikolov et al.'s Linguistic Regularities paper is in Mikolov et al. Recurrent neural network based language model.

Other useful background reading: Lillian Lee, Measures of Distributional Similarity, Hinrich Schuetze, Word Space, Mikolov et al., Linguistic Regularities in Continuous Space Word Representations.
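
To make the vector-offset idea from the Linguistic Regularities paper concrete, here is a toy sketch with made-up three-dimensional vectors (real skip-gram vectors have hundreds of dimensions); the vector closest to king - man + woman should be queen:

    import math

    # Made-up 3-dimensional "word vectors", purely for illustration.
    vec = {
        "king":  [0.8, 0.7, 0.1],
        "queen": [0.8, 0.1, 0.7],
        "man":   [0.3, 0.9, 0.1],
        "woman": [0.3, 0.2, 0.8],
        "dog":   [0.9, 0.9, 0.9],
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    # Vector-offset analogy: king - man + woman should land near queen.
    target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
    best = max((word for word in vec if word not in ("king", "man", "woman")),
               key=lambda word: cosine(vec[word], target))
    print(best)   # queen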

Apr 8 Guest lecture: Raul Guerra on Data Stream Mining in NLP
Read Svitlana Volkova, "Data Stream Mining: A Review of Learning Methods and Frameworks".
Assignment 6 (worth 50% of an assignment): Turn in a brief (1-2 page), well-structured, clearly written summary of today's lecture.

(See CL Colloquium Talks for possible extra credit)

Amit Goyal, Streaming and Sketch Algorithms for Large Data NLP.
April 15 Structured prediction
Noah Smith, Linguistic Structure Prediction, esp. Sections 3.5.2-3.7. (The book is available online for UMD and many other university IP addresses.) See also Noah Smith's Structured prediction for NLP tutorial slides (ICML'09).

(See CL Colloquium Talks for possible extra credit)

Ke Wu, Discriminative Sequence Labeling.

Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi 1996 began the popularization of maxent in NLP. Noah Smith's (2004) Log-Linear Models is a nice alternative introduction expressed in a vocabulary that is more consistent with current work.
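
To illustrate the form of the models both papers discuss, here is a toy sketch (features and weights invented for illustration) that computes p(tag | context) as a normalized exponential of weighted feature counts, i.e. a single log-linear classification step:

    import math

    # One scoring step of a log-linear (maxent) tagger: score each candidate
    # tag by exp(w . f(context, tag)) and normalize.
    tags = ["NN", "VB"]
    weights = {
        ("suffix=ing", "VB"): 1.2,
        ("suffix=ing", "NN"): 0.3,
        ("prev=the", "NN"): 1.5,
        ("prev=the", "VB"): -0.8,
    }

    def p_tag_given_context(active_features):
        scores = {t: math.exp(sum(weights.get((f, t), 0.0) for f in active_features))
                  for t in tags}
        z = sum(scores.values())          # partition function
        return {t: s / z for t, s in scores.items()}

    # Features for the token "running" in "the running ..."
    print(p_tag_given_context(["suffix=ing", "prev=the"]))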

For relevant background on semirings see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation.


Apr 22 More on structured prediction
See readings from last week

(See CL Colloquium Talks for possible extra credit)

Apr 29 Machine translation
M&S Ch 13 and Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, 49 pages, August 2008.

(See CL Colloquium Talks for possible extra credit)

May 6 TBD, most likely more on MT
Readings TBD

(See CL Colloquium Talks for possible extra credit)