WSD Using Supervised Learning

This assignment gives you the opportunity to tackle word sense disambiguation (WSD) as a supervised classification problem using WEKA. It should be a useful warm-up for the project. The WEKA software should run on all the platforms people are using. You can use any programming language you like for data preparation. Please feel free to use the class e-mail list to discuss problems you find, or even to team up, as long as everyone contributes equitably to the success of the assignment.
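Since data preparation is up to you, here is a minimal sketch of one way to turn sense-tagged contexts into WEKA's ARFF input format. It is only an illustration under stated assumptions: the function name `to_arff`, the binary bag-of-words features, and the `has_` attribute prefix are my own choices, not a required design, and the sketch assumes context words are plain alphanumeric tokens (other tokens would need quoting in ARFF).

```python
def to_arff(instances, relation="wsd"):
    """Build an ARFF string from (context_words, sense_label) pairs.

    Hypothetical helper: one binary attribute per context word,
    plus a nominal class attribute named 'sense'.
    Assumes words and sense labels are plain alphanumeric tokens.
    """
    vocab = sorted({w for words, _ in instances for w in words})
    senses = sorted({s for _, s in instances})

    lines = [f"@relation {relation}", ""]
    for w in vocab:
        lines.append(f"@attribute has_{w} {{0,1}}")
    # The class attribute goes last, which is where WEKA's tools
    # conventionally look for it unless told otherwise.
    lines.append(f"@attribute sense {{{','.join(senses)}}}")
    lines += ["", "@data"]

    for words, sense in instances:
        present = set(words)
        row = ["1" if w in present else "0" for w in vocab]
        lines.append(",".join(row + [sense]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up toy contexts for the noun "web", for illustration only.
    toy = [
        (["spider", "spun"], "cobweb"),
        (["browser", "page"], "internet"),
    ]
    print(to_arff(toy))
```

Writing the returned string to a `.arff` file should give you something WEKA can load; richer features (collocations, part-of-speech tags, and so on) would slot in as additional attributes.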

Extra Credit: Exploring Word Senses

5% extra credit apiece, for a total of 15% available.
  1. Without consulting a dictionary, enumerate word senses for the noun web. Use either a flat organization or a hierarchy, as you feel most appropriate.

  2. Consult any English dictionary other than WordNet, and consult WordNet. Compare and contrast the different characterizations of the senses for web. Include specific reference to how different resources (including your answer to the previous question) treat homonymy and polysemy.

  3. In Manning and Schuetze, Section 7.3.4, the constraint known as "one sense per discourse" (Yarowsky 1995) is stated as: "The sense of a target word is highly consistent within any given document". (You knew that already from having read the chapter, though, right??) Do exercise 7.9, which asks you to explore the validity of that constraint, particularly with regard to different types of ambiguity such as homonymy versus polysemy. Instead of constructing examples, I encourage you to do Web searching and find examples.
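If you collect examples from the Web for the third question and tag each occurrence of the target word by hand, a small script can summarize how well "one sense per discourse" holds in your sample. The sketch below is one possible way to do that, not a prescribed method: the function name `discourse_consistency` and the two summary measures are my own assumptions.

```python
from collections import Counter, defaultdict


def discourse_consistency(labeled):
    """Summarize sense consistency within documents.

    labeled: iterable of (doc_id, sense) pairs, one per hand-tagged
    occurrence of the target word (hypothetical input format).

    Returns (single_sense_fraction, mean_majority_share) computed over
    documents with at least two occurrences, or None if there are none.
    """
    by_doc = defaultdict(Counter)
    for doc_id, sense in labeled:
        by_doc[doc_id][sense] += 1

    # Documents with one occurrence are trivially consistent, so skip them.
    multi = [c for c in by_doc.values() if sum(c.values()) >= 2]
    if not multi:
        return None

    # Fraction of documents in which every occurrence has the same sense.
    single = sum(1 for c in multi if len(c) == 1) / len(multi)
    # Average share of occurrences that carry the document's majority sense.
    share = sum(max(c.values()) / sum(c.values()) for c in multi) / len(multi)
    return single, share
```

Running it separately on a homonymous word and on a polysemous one would let you compare the two kinds of ambiguity directly, though a handful of hand-picked documents is of course only suggestive.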