Schedule of Topics


This is the schedule of topics for Computational Linguistics II, Spring 2018.

In readings, "M&S" refers to Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing. The "other" column has optional links pointing either to material you should already know (but might want to review), or to related material you might be interested in. Make sure to do your reading before the class where it is listed!

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the online class discussions for "official" dates.

See CL Colloquium Talks for possible extra credit each week.

Class | Topic | Readings | Assignments | Other
Jan 24 Course organization, semester plan; knowledge-driven and data-driven NLP
M&S Ch 1, 2.1.[1-9] (for review)
Assignment 1

Language Log (the linguistics blog); Hal Daumé's NLP blog (an excellent blog, often technical machine learning material but just as often of more general interest; be sure to read the comment threads too, which are frequently just as good)
Jan 31 Lexical association measures and hypothesis testing
M&S Ch 5
Assignment 2

Dunning (1993) is a classic, and valuable to read if you're using mutual information or chi-squared and getting inflated values for low-frequency observations. Moore (2004) is less widely cited but a very valuable discussion of how to judge the significance of rare events.
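
To make the inflation problem concrete, here is a tiny illustrative sketch (all corpus counts are made up): for a pair of words that each occur exactly once, PMI is automatically at its maximum and Pearson's chi-squared explodes, while Dunning's log-likelihood ratio statistic stays far more moderate.

    import math

    # Hypothetical counts (made up for illustration): a bigram whose two words
    # each occur exactly once in a million-token corpus, and only together.
    N = 1_000_000
    c_xy, c_x, c_y = 1, 1, 1

    # Pointwise mutual information: maximal (log2 N, about 20 bits) purely because the pair is rare.
    pmi = math.log2((c_xy / N) / ((c_x / N) * (c_y / N)))

    # 2x2 contingency table and expected counts under independence.
    observed = [c_xy, c_x - c_xy, c_y - c_xy, N - c_x - c_y + c_xy]
    expected = [c_x * c_y / N, c_x * (N - c_y) / N, (N - c_x) * c_y / N, (N - c_x) * (N - c_y) / N]

    # Pearson's chi-squared blows up (on the order of 10^6) because one expected cell is tiny;
    # Dunning's log-likelihood ratio G^2 stays far more moderate on the same table.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    print(f"PMI = {pmi:.1f} bits, X^2 = {chi2:.0f}, G^2 = {g2:.1f}")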

A really important paper by Ioannidis about problems with statistical hypothesis testing is Why Most Published Research Findings Are False; for a very readable discussion see Trouble at the Lab, The Economist, Oct 19, 2013, and the really great accompanying video. (Show that one to your friends and family!) For an interesting response, see Most Published Research Findings Are False--But a Little Replication Goes a Long Way.

Kilgarriff (2005) is a fun and contrarian read regarding the use of hypothesis testing methodology specifically in language research.

Named entities represent another form of lexical association. Named entity recognition is introduced in Jurafsky and Martin, Ch 22, and in Ch 7 of the NLTK book.
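
If you want to try NER hands-on before reading, here is a minimal sketch using NLTK's off-the-shelf chunker (the example sentence is invented, and the listed models need a one-time download):

    import nltk

    # One-time model downloads (uncomment on first run):
    # for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    #     nltk.download(pkg)

    sentence = "Christopher Manning teaches at Stanford University in California."
    tokens = nltk.word_tokenize(sentence)   # tokenize
    tagged = nltk.pos_tag(tokens)           # part-of-speech tags
    tree = nltk.ne_chunk(tagged)            # chunk into named entities (PERSON, ORGANIZATION, GPE, ...)
    print(tree)                             # entities appear as labeled subtrees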

Feb 7 Information theory
M&S Ch 2.2, M&S Ch 6

Piantadosi et al. (2011), Word lengths are optimized for efficient communication; Jaeger (2010), Redundancy and reduction: Speakers manage syntactic information density

Assignment 3

Cover and Thomas (1991) is a great, highly readable introduction to information theory. The first few chapters go into many concepts from this lecture with greater rigor, but also a lot of clarity.
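
As a quick refresher on the quantities this lecture builds on, here is a tiny sketch (the distributions are made up) computing entropy, cross-entropy, and KL divergence for a four-word vocabulary:

    import math

    # Two made-up distributions over a four-word vocabulary.
    p = {"the": 0.5, "cat": 0.25, "sat": 0.15, "mat": 0.10}   # "true" distribution
    q = {"the": 0.25, "cat": 0.25, "sat": 0.25, "mat": 0.25}  # model distribution (uniform)

    entropy = -sum(pw * math.log2(pw) for pw in p.values())            # H(p)
    cross_entropy = -sum(pw * math.log2(q[w]) for w, pw in p.items())  # H(p, q)
    kl = cross_entropy - entropy                                       # D(p || q), always >= 0

    print(f"H(p) = {entropy:.3f} bits, H(p,q) = {cross_entropy:.3f} bits, D(p||q) = {kl:.3f} bits")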

Maurits et al. (2010), Why are some word orders more common than others? A uniform information density account. See also the syllabus for a 2009 seminar taught by Dan Jurafsky and Michael Ramscar, Information-Theoretic Models of Language and Cognition, which looks as if it was awesome.

Roger Levy provides a formal proof that uniform information density minimizes the "difficulty" of interpreting utterances. The proof assumes that, for any given word i in an utterance, the difficulty of processing it is some power k of its surprisal with k > 1.
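
In outline (my paraphrase, not Levy's exact formulation), the argument is a convexity step: holding the total information of the utterance fixed, a superlinear per-word cost makes the even split cheapest.

    % My paraphrase of the convexity argument; k > 1 as in Levy's setup.
    % Surprisals s_1, \dots, s_n; total information S = \sum_i s_i is fixed by the message.
    D \;=\; \sum_{i=1}^{n} s_i^{k}
      \;\ge\; n\,\Bigl(\frac{1}{n}\sum_{i=1}^{n} s_i\Bigr)^{k}
      \;=\; n^{1-k}\,S^{k}
      \qquad\text{(Jensen's inequality: $x \mapsto x^{k}$ is convex for $k>1$)},
    % with equality iff s_1 = \cdots = s_n = S/n, i.e., exactly when information density is uniform.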

Feb 14 HMMs and Expectation Maximization
Skim M&S Ch 9-10 and Chapter 6 of Lin and Dyer; read my EM recipe discussion.
Assignment 4

Recommended reading (and code to look at!): Dirk Hovy's Interactive tutorial on the Forward-Backward Expectation Maximization algorithm. Note that although his iPython notebook is designed to be interactive, you can also simply read it.
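
If you'd like to see the shape of the E-step in code, here is a minimal, unoptimized forward-backward sketch for a discrete HMM (the parameter values are made up; it follows the standard textbook recursions rather than any particular tutorial's code, and omits the rescaling you would want for long sequences):

    import numpy as np

    def forward_backward(obs, pi, A, B):
        """One E-step for a discrete HMM. obs: list of symbol indices; pi: initial
        state probs (S,); A: transition matrix (S, S); B: emission matrix (S, V)."""
        S, T = len(pi), len(obs)
        alpha = np.zeros((T, S))
        beta = np.zeros((T, S))

        # Forward pass: alpha[t, s] = P(o_1..o_t, state_t = s)
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

        # Backward pass: beta[t, s] = P(o_{t+1}..o_T | state_t = s)
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood   # state posteriors P(state_t = s | obs); these feed the M-step
        return gamma, likelihood

    # Made-up toy model: 2 states, 2 output symbols, one short observation sequence.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    gamma, lik = forward_backward([0, 1, 0], pi, A, B)
    print(gamma, lik)
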
Feb 21 Guest lecture (Han-Chin Shing): Reduced-dimensionality representations for words
Efficient Estimation of Word Representations in Vector Space (with a focus on the network architecture of CBOW and SkipGram); Distributed Representations of Words and Phrases and their Compositionality (with a focus on hierarchical softmax and negative sampling); Deep Learning with PyTorch: A 60 Minute Blitz (a really good tutorial for PyTorch)
Assignment 5
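
To make the SkipGram-with-negative-sampling objective concrete, here is a minimal PyTorch sketch of just the model and loss (all sizes and the toy batch are made up; a real implementation also needs corpus preparation, subsampling, and a unigram-based negative-sampling distribution):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SkipGramNS(nn.Module):
        """Skip-gram with negative sampling: score(center, context) = dot product."""
        def __init__(self, vocab_size, dim=100):
            super().__init__()
            self.in_embed = nn.Embedding(vocab_size, dim)    # center-word vectors
            self.out_embed = nn.Embedding(vocab_size, dim)   # context-word vectors

        def forward(self, center, context, negatives):
            # center: (batch,), context: (batch,), negatives: (batch, k) word indices
            v = self.in_embed(center)                         # (batch, dim)
            u_pos = self.out_embed(context)                   # (batch, dim)
            u_neg = self.out_embed(negatives)                 # (batch, k, dim)

            pos_score = (v * u_pos).sum(dim=1)                        # (batch,)
            neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)   # (batch, k)

            # Maximize log sigma(u_pos . v) + sum_j log sigma(-u_neg_j . v)
            loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1))
            return loss.mean()

    # Made-up toy batch: vocabulary of 1000 words, 5 negative samples per pair.
    model = SkipGramNS(vocab_size=1000, dim=50)
    center = torch.randint(0, 1000, (8,))
    context = torch.randint(0, 1000, (8,))
    negatives = torch.randint(0, 1000, (8, 5))
    print(model(center, context, negatives).item())
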
Feb 28 Reduced-dimensionality representations for documents: Gibbs sampling and topic models
Read Philip Resnik and Eric Hardisty, Gibbs Sampling for the Uninitiated; watch Jordan Boyd-Graber's 2013 CL1 topic modeling lecture (20 minutes; slides/notes available here). A small sketch of the collapsed sampler appears after this entry.
Assignment 6

Recommended reading: Steyvers and Griffiths (2007), Latent Semantic Analysis: A Road to Meaning.
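
For readers who want to see what a collapsed Gibbs sampler for LDA actually does, here is a deliberately minimal sketch (the update is the standard collapsed conditional, not code from the readings; no convergence checks or hyperparameter optimization):

    import numpy as np

    def lda_gibbs(docs, V, K=10, alpha=0.1, beta=0.01, iters=200, seed=0):
        """Minimal collapsed Gibbs sampler for LDA.
        docs: list of lists of word ids in [0, V); returns assignments and count tables."""
        rng = np.random.default_rng(seed)
        ndk = np.zeros((len(docs), K))      # document-topic counts
        nkw = np.zeros((K, V))              # topic-word counts
        nk = np.zeros(K)                    # topic totals
        z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
        for d, doc in enumerate(docs):      # counts from the random initialization
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]             # take this word's topic out of the counts
                    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                    # Resample: P(z = k | rest) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                    k = int(rng.choice(K, p=p / p.sum()))
                    z[d][i] = k             # put it back under the newly sampled topic
                    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        return z, ndk, nkw

    # Made-up toy corpus over a 6-word vocabulary.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 4], [0, 2, 5, 3]]
    z, ndk, nkw = lda_gibbs(docs, V=6, K=2, iters=50)
    print(ndk)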

March 7 Context-free parsing
M&S Ch 11 (esp. pp. 381-388) and Ch 12 (esp. pp. 408-423, 448-455)
Assignment 7

A really nice article introducing parsing as inference is Shieber et al., Principles and Implementation of Deductive Parsing (see Sections 1-3), with a significant advance by Joshua Goodman, Semiring Parsing.
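
As a concrete companion to the chart-parsing material, here is a minimal CKY recognizer sketch for a grammar in Chomsky normal form (the toy grammar is made up; a full parser would also store backpointers to recover trees):

    from collections import defaultdict

    def cky_recognize(words, lexical, binary, start="S"):
        """Minimal CKY recognizer for a CNF grammar.
        lexical: dict word -> set of nonterminals; binary: dict (B, C) -> set of parent nonterminals."""
        n = len(words)
        chart = defaultdict(set)                  # chart[(i, j)] = nonterminals spanning words[i:j]
        for i, w in enumerate(words):
            chart[(i, i + 1)] = set(lexical.get(w, ()))
        for span in range(2, n + 1):              # widen spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):         # split point
                    for B in chart[(i, k)]:
                        for C in chart[(k, j)]:
                            chart[(i, j)] |= set(binary.get((B, C), ()))
        return start in chart[(0, n)]

    # Toy grammar (made up): S -> NP VP, VP -> V NP
    lexical = {"dogs": {"NP"}, "cats": {"NP"}, "chase": {"V"}}
    binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
    print(cky_recognize("dogs chase cats".split(), lexical, binary))   # True
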
Mar 14 Evaluation
Lin and Resnik, Evaluation of NLP Systems, Ch 11 of Alex Clark, Chris Fox and Shalom Lappin, eds., Blackwell Computational Linguistics and Natural Language Processing Handbook.
Cohen and Howe, How Evaluation Guides AI Research
See Pereira, Formal grammar and information theory: together again? for a discussion of probabilistic grammar and the argument made by Chomsky involving the sentences Colorless green ideas sleep furiously and Furiously sleep ideas green colorless.
Mar 21 Spring Break
Have fun!
Mar 28 Guest lecture (Joe Barrow): Sequence and seq2seq models
No required readings
Take-home midterm handed out, due 11:59pm Sunday April 1.
April 4 Deep learning and linguistic structure: a broader perspective

Yoav Goldberg, A Primer on Neural Network Models for Natural Language Processing. Read with an emphasis on Sections 1-4 and Section 5 (embeddings); also look over Sections 10-11 (RNNs) and 12 (recursive NNs) and the Wikipedia page on autoencoders (a tiny autoencoder sketch appears below).
Project handed out. Project plans due in one week.
Recommended: Sections 1, 3 and 4 of Yoshua Bengio, Learning Deep Architectures for AI. Also recommended: the nice overview of representation learning in Sections 1-4 of Bengio et al., Representation Learning: A Review and New Perspectives.

Other useful background reading for a broader perspective: Lillian Lee, Measures of Distributional Similarity; Hinrich Schuetze, Word Space; Mikolov et al., Linguistic Regularities in Continuous Space Word Representations.
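
Since the autoencoder idea comes up in the readings above, here is a tiny PyTorch sketch (all dimensions and names are made up) of the compress-then-reconstruct pattern behind reduced-dimensionality representations:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # A tiny autoencoder: squeeze a bag-of-words vector through a small code and reconstruct it.
    class Autoencoder(nn.Module):
        def __init__(self, vocab_size=2000, code_size=50):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(vocab_size, code_size), nn.ReLU())
            self.decoder = nn.Linear(code_size, vocab_size)

        def forward(self, x):
            code = self.encoder(x)           # reduced-dimensionality representation
            return self.decoder(code), code

    model = Autoencoder()
    x = torch.rand(4, 2000)                  # fake mini-batch of document vectors
    recon, code = model(x)
    loss = F.mse_loss(recon, x)              # reconstruction objective
    loss.backward()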

April 11 Machine translation
Koehn, Statistical Machine Translation; Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation; M&S Ch 13; Adam Lopez, Statistical Machine Translation, ACM Computing Surveys 40(3), Article 8, pages 1-49, August 2008; Wu et al. (2016), Google's neural machine translation system: Bridging the gap between human and machine translation.
Work on your project from here on out!
April 18 Text analysis in computational social science
April 25 Text analysis in computational social science, continued
May 2 Structured prediction
Reference material: Noah Smith, Structured Prediction for Natural Language Processing; Ke Wu, Discriminative Sequence Labeling.

Some useful historical background: Ratnaparkhi (1996), A Maximum Entropy Model for Part of Speech Tagging, or, if you want a little more detail, Ratnaparkhi (1997), A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Ratnaparkhi (1996) began the popularization of maxent in NLP.

For relevant background on semirings, see Joshua Goodman, Semiring Parsing. For an example of very interesting recent work in this area, particularly from an automata-theoretic angle, see Chris Dyer, A Formal Model of Ambiguity and its Applications in Machine Translation, as well as cdec.
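
To connect the sequence-labeling and semiring threads, here is a small illustrative sketch (my own toy example, not code from the readings): one chain dynamic program whose meaning changes when you swap in a different semiring, in the spirit of Goodman's Semiring Parsing.

    # A generic chain dynamic program parameterized by a semiring (plus, times, zero, one).
    # Swapping the semiring changes what the same recursion computes.
    VITERBI = (max, lambda a, b: a + b, float("-inf"), 0.0)        # best path log-score
    FORWARD = (lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)   # total probability of all paths

    def chain_dp(emit, trans, semiring):
        """emit: list over positions of {label: score}; trans: {(prev_label, label): score}."""
        plus, times, zero, one = semiring
        labels = list(emit[0])
        prev = {y: times(one, emit[0][y]) for y in labels}
        for scores in emit[1:]:
            cur = {}
            for y in labels:
                total = zero
                for y0 in labels:
                    total = plus(total, times(prev[y0], times(trans[(y0, y)], scores[y])))
                cur[y] = total
            prev = cur
        total = zero
        for y in labels:
            total = plus(total, prev[y])
        return total

    # Made-up two-label example; use log-scores with VITERBI (probabilities with FORWARD).
    emit_log = [{"N": -0.2, "V": -1.6}, {"N": -1.2, "V": -0.4}]
    trans_log = {("N", "N"): -1.0, ("N", "V"): -0.4, ("V", "N"): -0.7, ("V", "V"): -1.2}
    print(chain_dp(emit_log, trans_log, VITERBI))   # log-score of the best label sequence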

Also potentially of interest:

May 9 Tentative: Natural language "understanding"
Bill MacCartney's excellent tutorial on semantic parsing
Project is due 11:59pm ET May 18