Do Exercise 6.1 in Jurafsky and Martin. That is, show the
formula for the maximum likelihood estimate of probability in a
trigram model (Should take 5-10 minutes)
A maximum likelihood estimate (MLE) of probability is computed by
taking the raw count and dividing by the total, so for a general
N-gram model the relevant maximum likelihood estimates are:
(1) P(w[n-N+1],...,w[n-1],n) = Count(w[n-N+1],...,w[n-1],w[n])/Total
(2) P(w[n-N+1],...,w[n-1]) = Count(w[n-N+1],...,w[n-1])/Total
What we want is the conditional probability
P(w[n-N+1],...,w[n-1],n)
(3) P(w[n] | w[n-N+1],...,w[n-1]) = -------------------------
P(w[n-N+1],...,w[n-1])
Substituting (1) and (2) into the numerator and denominator of (3),
the Total in the numerator and denominator cancel out, so we get the
general formula for the MLE of probability in an N-gram model:
Count(w[n-N+1],...,w[n-1],w[n])
(4) P(w[n] | w[n-N+1],...,w[n-1]) = ---------------------------------
Count(w[n-N+1],...,w[n-1])
Note that Jurafsky and Martin give you (4), so you
didn't actually need to go through steps (1)-(3) by yourself! They're
included here so you can fully understand what's going on.
Given (4), you need only show how this general formula applies for a
trigram model, where N = 3:
Count(w[n-2],w[n-1],w[n])
P(w[n] | w[n-2],...,w[n-1]) = ---------------------------------
Count(w[n-2],w[n-1])
Note that throughout I have used the notation w[i] to indicate that w
has subscript i.