Implementing a Left-Corner Chart Parser
This is a double assignment, i.e. worth the same
credit as two assignments. It is due Friday, December 1, 2006 at 5pm.
In this assignment, you will implement a left-corner chart parser.
You'll do this by taking what you learned about chart parsing from
Jurafsky and Martin (esp. their coverage of the Earley algorithm in Chapter
12), and integrating that with the parsing schema for left-corner
chart parsers described by Robert Moore, "Improved
Left-Corner Chart Parsing for Large Context-Free Grammars"
[local link] (in
Bunt, Carroll, and Satta, eds., New Developments in Parsing
Technology, Kluwer, 2004, pp. 185-201; interestingly, it looks like
the innovations in this paper are patented).
Steps in this Assignment
- Required. Read the Moore paper up through the
discussion of steps 4a and 5a on the 5th page. (Read the rest if
you're interested, but here we're focused on the basic version
rather than the improvements that are the main topic of the paper.)
- Required. Answer: Which steps in the Earley
algorithm do Moore's steps 1-3 correspond to?
- Optional. Implement the Earley algorithm, and
test it using the same grammar and input that Jurafsky and Martin
use in their "A Complete Example" (Figure 12.14). That way you
can make sure you're building the same chart they are in order to
verify the correctness of your implementation. (This optional
step can form the basis for significant partial credit. Turn in
your code electronically, including a README describing how to
run it, and in the hardcopy you turn in, include a printout of
the chart contents corresponding to Figure 12.14.)
- Required. Implement a recognizer that operates
according to the parsing scheme specified by Moore in steps
{1,2,3,4a,5a}. Turn in the code of the parser (submitted
electronically), including a README describing how to run it, as
well as a printout showing the contents of the chart for the
sentence "book that flight" (following the format of Jurafsky and
Martin's Figure 12.14).
- Required. In your hardcopy, provide a brief,
clear description of the difference (or differences) between
Moore's left-corner parsing scheme and Earley's algorithm.
Illustrate using one or more differences between chart states
(edges) introduced by Earley's parser and those introduced by the
left-corner chart parser, given the same input -- i.e. an edge or
edges introduced by one parser but not the other.
Thoughts and Guidance
- Especially for those with less programming
experience. The most straightforward approach to this
assignment would be to use the same basic control structure as the
Earley parser, and adapt it to operationalize the parsing scheme
Moore gives for left-corner chart parsing. So if I were doing the
assignment, I'd probably start by implementing the Earley algorithm
pretty much directly out of the book. In fact, I would probably
start by implementing the main control structure with only stubs for
predictor, completer, and scanner. Next
I'd implement predictor alone, and make sure chart[0] gets
all the correct states/edges added to it; then scanner,
again checking to make sure new states/edges show up as desired; and
then completer. This way I could compare with Figure 12.14
at every step to verify that what I'm doing is correct. Finally I'd
modify the algorithm to follow the left-corner parsing schema,
probably working out the chart states/edges for a specific example
by hand and making sure that the implementation produces what I'd
expect for that example.
- An alternative approach, especially for those with more experience,
would be to take the agenda-driven approach that Jurafsky and Martin
describe in Section 12.4.3. This would enable useful next steps
like adding a probability model to guide the parser through the
search space. I'd recommend this mainly for students who have
already implemented a chart parser such as the Earley algorithm.
- Note that Moore expresses scanner in terms of terminal
symbols rather than in terms of part-of-speech categories, unlike
Jurafsky and Martin. Either way is ok to use in your implementation.
Just make clear which you did in your hardcopy when you present the chart.
- Note that you do not need to implement
any of the efficiency improvements Moore discusses later in the paper.
If you have any uncertainty about what you do or do not need to implement, ask.
- Also note that the assignment is really to implement a
recognizer rather than a parser, i.e. you do not need to
do the extra bookkeeping needed to recover parse trees.
- A final thought: this is the last homework assignment. So if you have
not yet used your "free" 48-hour extension, now's a last opportunity
to do so.
Extra Credit (10%)
Add the bookkeeping needed to read off all the parses from the chart
at the end. In your hardcopy, include a brief, clear description of
the necessary modifications. Run your parser on a globally ambiguous
sentence, and include a printout of the parse trees in your hardcopy
-- automatic plain text output, not something fancy created using a
tree drawing program, though "prettyprinting" indentation would be
nice. Here's an example of a prettyprinted parse tree (from Marcus
et al., 1993):
( (S
(NP Battle-tested industrial managers
here)
always
(VP buck
up
(NP nervous newcomers)
(PP with
(NP the tale
(PP of
(NP (NP the
(ADJP first
(PP of
(NP their countrymen)))
(S (NP *)
to
(VP visit
(NP Mexico))))
,
(NP (NP a boatload
(PP of
(NP (NP warriors)
(VP-1 blown
ashore
(ADVP (NP 375 years)
ago)))))
(VP-1 *pseudo-attach*))))))))
.)
Please truncate the parse tree output at a maximum of a few printed
pages to save paper -- I don't need to see every single tree, just a few!