Information Theory Questions


Please turn in a hardcopy for this assignment. I don't need to see your code. (Although if it's really clean, please e-mail it to me, or send me a URL pointing to it, so I can use it as a solution!) Notice that the point values for the problems add up to 85; grades will be normalized to be out of 100%.

As an aside for the mathematically inclined, here is a pointer to several derivations of the definition of entropy based on axioms about properties that a measure of uncertainty should have. See also a concise summary of properties of entropy. The nice presentation/proof I mentioned in class is in A.I. Khinchin, Mathematical Foundations of Information Theory, Dover, New York, 1957.

A note about notation. If X is a random variable whose distribution is (p1,p2,...,pk) then we usually write H(X) for H(p1,p2,...,pk). I will occasionally write H(p) instead of H(X), where p abbreviates p1,p2,...,pk.


  1. [5pts] (Manning and Schuetze exercise 2.12) Show that the KL divergence is not symmetric by finding an example of two distributions p and q for which D(p||q) is not equal to D(q||p). Give the two distributions and show the explicit computation of the KL divergences. Simple toy distributions (e.g. over {heads,tails}) are fine.

  2. [15pts] Prove that cross entropy H(p,q) = H(p) + D(p||q).

  3. [30pts] (Based on Manning and Schuetze exercise 2.10) Compute three probability distributions over unigrams: one for Conan Doyle's Adventures of Sherlock Holmes (Adventures), one for Conan Doyle's A Study in Scarlet (Study), and one for the book of Genesis (Genesis) -- all available via the assignment on collocations. Treating the Adventures distribution as p, compute the KL divergence against q (Study) and q' (Genesis).

    Note: you'll need to do something about zero counts. Smooth using Laplace ("add one", Manning and Schuetze Section 6.2.2) to avoid zero. You can assume that the total vocabulary is the union of all the unigrams in the three texts.

    Hint for less experienced programmers (or more experienced programmers who don't feel like doing it themselves): you can use one of the files created when you ran Stats in the Collocations assignment to get the unigram frequencies.

    Questions to answer on paper: