Assignment 2c for Ling 645/CMSC 723, Fall 1997

A Finite-state Automaton for the English Auxiliary System

Due Wednesday, 17 September 1997.

INTRODUCTION

English sentences typically contain a sequence of auxiliary verbs
followed by a main verb, as in

  I     can                see the house
  I     will have          seen the house
  I     was                watching the movie
  I     should have been   watching the movie

There is a rich structure in these auxiliary sequences, and in this
assignment you will model it using a NONDETERMINISTIC finite-state
automaton.  This will not involve doing anything on the computer (for
now!), just drawing out a finite-state machine that accepts valid
sequences of auxiliary verbs and rejects invalid ones.

Recall that unlike a deterministic FSA, in a NONDETERMINISTIC FSA you
may have

  (1) epsilon transitions, which allow the machine to go
      from one state to another without absorbing any input, and

  (2) multiple arcs out of a state having the same input symbol on
  them.


THE PROBLEM

Here are the auxiliary words in English that your machine must deal
with (and therefore must be in its alphabet):

   am are be been can could had has have is may might must 
   shall should was were will would

To keep things simple, we will assume just a single verb, "to write".
So the symbols in your machine's alphabet must also include the forms
of that verb:

   write writes writing wrote written

If I've counted right, that's an alphabet of 24 symbols, total.

Starting from your machine's initial state, it should accept any
sequence of symbols that is a valid sequence of auxiliaries in English
concluded by an appropriate form of the verb "to write", as could
appear in a an active (not passive) sentence.  Some examples:

  is writing                (which could appear in "John is writing")
  are writing               (which could appear in "They are writing")
  was writing               ...etc.
  has written
  might write
  has been writing
  could have written
  should have been writing

The machine should of course reject invalid sequences:

  *have might write
  *is can write
  *is having wrote

Note that although you don't have to handle subjects of the sentence
in your machine, the auxiliaries+verb sequences you accept could be part
of a sentence with a singular or plural subject, in the first, second,
or third person.  I'm leaving the subject out so you don't have to
worry about agreement, e.g. starting from its initial state your
machine should accept both "has written" and "have written", even
though only the second of those is valid if the subject of the
sentence is "They".

I did not give you ALL the valid sequences of auxiliaries+verb in
English, above: part of your job is to figure out that information
either by introspection (if you are a native speaker) or by asking
native speakers (if you are not).  (Recall that you are welcome to
work together with other students, although each person should turn in
a complete assignment separately, and everyone is responsible for
EVERYTHING when you take the exams.)



WHAT YOU SHOULD TURN IN

Turn in, on paper, a finite-state automaton as described above, using
the graphical notation we saw in class.

Also turn in an EXHAUSTIVE list of the sequences that are accepted,
and a REPRESENTATIVE list of the sequences that are rejected by the
machine.  At a minimum, include the sequences given above.  Your list
can abbreviate if you want, e.g.

    (has,had) been writing
    (can,might,should,will,would) write
    etc.



Return to the course home page.