Predicting English: An Intro to Natural Language Processing

Note: A more detailed description of this lab is available for your reference at http://nlp.cs.jhu.edu/~dasmith/ws03lab.

Getting Started

  1. Log in to one of the workshop machines; these should be running SunOS 5.7 or 5.8.
  2. Add the bin directory to your path so that you can access the tools:
    If you use tcsh: setenv PATH /export/ws03_mt_2/scratch/ws03lab/bin:$PATH
    If you use bash: export PATH=/export/ws03_mt_2/scratch/ws03lab/bin:$PATH
    Please use tcsh or bash for this exercise, not csh. We're pretty sure that the default on your new account is tcsh.
  3. So that the other people on your team can edit the files you create, run
    umask 000
  4. If Noah hasn't already given your team a secretly named directory, wave your hands and flag him down. This is where you should do all your work on this lab. Go to your team directory under /export/ws03_mt_2/scratch/ws03lab/. Then one person on your team should copy in the files you need to get started:
    cp ../start/* ./
    The script test.csh will do this for you.
  5. Create a file called members that contains the complete email addresses of everyone on your team, one per line. This lets us send you sentences for grammaticality judgements later in the lab.
  6. Use an editor such as Emacs to create and edit grammar (.gr) files. A good place to start is Top.gr, which holds the weights for S1 and S2 as well as the initial S1 grammar. We recommend that you leave Vocab.gr unchanged and instead put any new Tag -> word rules, in addition to the ones we've given you, in a separate file. Use parse to see how well your grammar parses the training data from the other teams, and use randsent to check whether the sentences it generates look like English.
  7. When time is up, you will be asked to give grammaticality judgements on some sentences generated by the various teams' grammars. You won't know where each sentence came from (it could be yours!), so be honest. When we stop, be sure to cat all of your grammar files together into one file called GRAMMAR.gr so that we can evaluate your model. Don't forget to include files like S2.gr and Vocab.gr! Then, to make sure everything will run in the evaluation, run:
    randsent -n 10 GRAMMAR.gr
    parse examples.sen GRAMMAR.gr
  8. We will spend the final part of the lab in discussion while the cross-entropy scores are computed.
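
Step 3 works because umask names the permission bits to clear on newly created files; a mask of 000 clears nothing, so new files come out readable and writable by everyone, teammates included. The demonstration below is a sketch that assumes a shell with mktemp -d (which the SunOS 5.7/5.8 machines may not have); it runs in a throwaway directory so nothing lands in your team directory. Keep in mind that umask affects only files created afterwards, and only in the shell where you ran it.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

umask 000         # clear no permission bits on anything created from now on
touch demo.gr     # new files are requested with mode 666, so they stay rw-rw-rw-
mkdir demo.d      # new directories are requested with mode 777: rwxrwxrwx
ls -ld demo.gr demo.d
```

With the more common umask 022, the same two commands would give rw-r--r-- and rwxr-xr-x, which your teammates could read but not modify.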
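
Step 7 can be scripted so that it is safe to repeat every time you change a grammar file. The bash functions below are a sketch; build_grammar and check_grammar are hypothetical names invented for this example, not tools supplied with the lab. The sketch assumes all of your grammar files sit in your team directory with a .gr extension, and it calls randsent and parse only in the two ways shown in step 7.

```shell
#!/bin/bash
# Hypothetical helper for step 7; build_grammar and check_grammar are
# names made up for this sketch, not tools supplied with the lab.

# Rebuild GRAMMAR.gr from every other *.gr file in the current directory.
build_grammar() {
  local out=GRAMMAR.gr tmp=GRAMMAR.gr.tmp f required

  # S2.gr and Vocab.gr are the files most easily forgotten.
  for required in S2.gr Vocab.gr; do
    if [ ! -f "$required" ]; then
      echo "build_grammar: $required is missing" >&2
      return 1
    fi
  done

  : > "$tmp"    # the .tmp suffix keeps this file out of the *.gr glob
  for f in *.gr; do
    # Never feed the output file back in as one of its own inputs.
    if [ "$f" = "$out" ]; then
      continue
    fi
    cat "$f" >> "$tmp"
    # Add a newline if the file lacks a final one, so its last rule is
    # not glued onto the first rule of the next file.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$tmp"
    fi
  done
  mv -f "$tmp" "$out"
}

# Run the two evaluation checks from step 7 against GRAMMAR.gr.
check_grammar() {
  local tool
  for tool in randsent parse; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "check_grammar: $tool is not on your PATH (see step 2)" >&2
      return 1
    fi
  done
  randsent -n 10 GRAMMAR.gr && parse examples.sen GRAMMAR.gr
}
```

To use it, source the file from your team directory and run build_grammar && check_grammar. Skipping GRAMMAR.gr explicitly matters because, on a second run, a plain cat *.gr > GRAMMAR.gr would name its own output file among its inputs.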
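
Step 8 does not say how the cross-entropy is computed. Assuming the usual per-word definition (an assumption; the lab's exact recipe, for example how probabilities from the S1 and S2 grammars are combined, is not given here), if the test set has m sentences s_1, ..., s_m containing N words in total and your grammar assigns sentence s_i probability p(s_i), the score in bits per word is shown below, followed by a small worked example with made-up probabilities.

```latex
H = -\frac{1}{N} \sum_{i=1}^{m} \log_2 p(s_i)

% worked example: m = 2, N = 10, p(s_1) = 2^{-20}, p(s_2) = 2^{-30}
H = \frac{20 + 30}{10} = 5 \text{ bits per word}
```

Lower is better. Under this definition, a single test sentence that your grammar cannot generate at all gets p = 0 and makes H infinite.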

Noah Smith and Jason Eisner; created for the 2003 NAACL Summer Workshop on Language Engineering
Last modified: Mon Jun 30 10:48:37 EDT 2003
http://nlp.cs.jhu.edu/~dasmith/ws03lab/doit.html