Assignment 2

FSTs for morphology

This problem concerns Figure 3.17, the transducer for e-insertion in English plural nouns, in the new version of chapter 3. Read the description of the function of state q5, beginning with "State q5 is used to ensure that the e is always inserted whenever the environment is appropriate...".

Although state q5 is designed to make sure some strings are rejected, it is possible at least in principle for an input string to yield a path through the transducer that goes through q5 en route to an accepting state.

(a) Give an example of any input string -- whether or not it corresponds to a real word in English -- that would lead to a path through q5 on the way to an accepting state. Also show the full state sequence and the output string.

(b) Give or invent a reasonably plausible sounding English word that would be produced via a path that goe through q5 on the way to an accepting state. Show the input, the full state sequence gone through, and the output string. If it's an invented word, give a plausible meaning for it.

Dynamic Programming and Minimum Edit Distance

Note that the minimum edit distance algorithm in Figure 3.25 has a bug: the line
  distance[0,m] <- j
should read
  distance[0,j] <- j
Obviously there was a bug there somewhere, since as written you'd be overwriting the same cell m+1 times! Other than that I'm reasonably confident there are no other bugs, but if you find any please let me know.

For consistency with the example in the book, let's assume insertions and deletions have a cost of 1, and that substitutions have a cost of 2. (Except, obviously, substituting the same symbol for itself, which would of course have a cost of zero.) Show a matrix of the form in Figure 3.26 for the edits transforming source string #pepsi into target string #sprite, and report the minimum edit distance between those two strings.

You may do this by coding up the algorithm, or by working it through manually. If you code up the algorithm, I'd appreciate it if you'd turn in your code so I have a collection of implementations.