Implementing a Left-Corner Chart Parser

This is a double assignment, i.e. worth the same credit as two assignments. It is due Friday, December 1, 2006 at 5pm.

In this assignment, you will implement a left-corner chart parser. You'll do this by taking what you learned about chart parsing from Jurafsky and Martin (esp. their coverage of the Earley algorithm in Chapter 12), and integrating that with the parsing schema for left-corner chart parsers described by Robert Moore, "Improved Left-Corner Chart Parsing for Large Context-Free Grammars" [local link] (in Bunt, Carroll, and Satta, eds., New Developments in Parsing Technology, Kluwer, 2004, pp. 185-201; interestingly, it looks like the innovations in this paper are patented).

Steps in this Assignment

Required. Read the Moore paper up through the discussion of steps 4a and 5a on the 5th page. (Read the rest if you're interested, but here we're focused on the basic version rather than the improvements that are the main topic of the paper.)
Required. Answer: Which steps in the Earley algorithm do Moore's steps 1-3 correspond to?
Optional. Implement the Earley algorithm, and test it using the same grammar and input that Jurafsky and Martin use in their "A Complete Example" (Figure 12.14). That way you can make sure you're building the same chart they are in order to verify the correctness of your implementation. (This optional step can form the basis for significant partial credit. Turn in your code electronically, including a README describing how to run it, and in the hardcopy you turn in, include a printout of the chart contents corresponding to Figure 12.14.)
Required. Implement a recognizer that operates according to the parsing scheme specified by Moore in steps {1,2,3,4a,5a}. Turn in the code of the parser (submitted electronically), including a README describing how to run it, as well as a printout showing the contents of the chart for the sentence "book that flight" (following the format of Jurafsky and Martin's Figure 12.14).
Required. In your hardcopy, provide a brief, clear description of the difference (or differences) between Moore's left-corner parsing scheme and Earley's algorithm. Illustrate using one or more differences between chart states (edges) introduced by Earley's parser and those introduced by the left-corner chart parser, given the same input -- i.e. an edge or edges introduced by one parser but not the other.

Thoughts and Guidance

Especially for those with less programming experience. The most straightforward approach to this assignment would be to use the same basic control structure as the Earley parser, and adapt it to operationalize the parsing scheme Moore gives for left-corner chart parsing. So if I were doing the assignment, I'd probably start by implementing the Earley algorithm pretty much directly out of the book. In fact, I would probably start by implementing the main control structure with only stubs for predictor, completer, and scanner. Next I'd implement predictor alone, and make sure chart[0] gets all the correct states/edges added to it; then scanner, again checking to make sure new states/edges show up as desired; and then completer. This way I could compare with Figure 12.14 at every step to verify that what I'm doing is correct. Finally I'd modify the algorithm to follow the left-corner parsing schema, probably working out the chart states/edges for a specific example by hand and making sure that the implementation produces what I'd expect for that example.
An alternative approach, especially for those with more experience, would be to take the agenda-driven approach that Jurafsky and Martin describe in Section 12.4.3. This would enable useful next steps like adding a probability model to guide the parser through the search space. I'd recommend this mainly for students who have already implemented a chart parser such as the Earley algorithm.
Note that Moore expresses scanner in terms of terminal symbols rather than in terms of part-of-speech categories, unlike Jurafsky and Martin. Either way is ok to use in your implementation. Just make clear which you did in your hardcopy when you present the chart.
Note that you do not need to implement any of the efficiency improvements Moore discusses later in the paper. If you have any uncertainty about what you do or do not need to implement, ask.
Also note that the assignment is really to implement a recognizer rather than a parser, i.e. you do not need to do the extra bookkeeping needed to recover parse trees.
A final thought: this is the last homework assignment. So if you have not yet used your "free" 48-hour extension, now's a last opportunity to do so.

Extra Credit (10%)

Add the bookkeeping needed to read off all the parses from the chart at the end. In your hardcopy, include a brief, clear description of the necessary modifications. Run your parser on a globally ambiguous sentence, and include a printout of the parse trees in your hardcopy -- automatic plain text output, not something fancy created using a tree drawing program, though "prettyprinting" indentation would be nice. Here's an example of a prettyprinted parse tree (from Marcus et al., 1993):

( (S
   (NP Battle-tested industrial managers
       here)
    always
   (VP buck
       up
       (NP nervous newcomers)
       (PP with
           (NP the tale
               (PP of
                   (NP (NP the
                           (ADJP first
                                 (PP of
                                     (NP their countrymen)))
                           (S (NP *)
                              to
                              (VP visit
                                  (NP Mexico))))
                       ,
                       (NP (NP a boatload
                               (PP of
                                   (NP (NP warriors)
                                       (VP-1 blown
                                           ashore
                                           (ADVP (NP 375 years)
                                                 ago)))))
                           (VP-1 *pseudo-attach*))))))))
  .)

Please truncate the parse tree output at a maximum of a few printed pages to save paper -- I don't need to see every single tree, just a few!