How to Annotate in GATE

In GATE, load the document: sample-documents/examples-untagged.xml

While it's possible to load a plain text file or a web document URL and to just begin annotating, some minimal preprocessing can make the annotation process a bit easier (see XXXX for how we preprocess a document in GATE).

  • Some Default Annotations

    During preprocessing, a number of annotations are added to the document.

    • Open the Annotation Sets and Annotation List frames.

    • Select the 'agent' annotations for viewing. Two zero-length agent annotations, for the agents with id=implicit and id=w (the writer), will be listed. Because they are zero-length, they are not visible in the document text.

    • Select the 'objective-speech' and 'sentence' annotations for viewing. At the beginning of each sentence, a zero-length, implicit 'objective-speech' annotation has been added for the writer. During annotation, the 'objective-speech' annotation can easily be changed to a 'direct-subjective' annotation if the annotator feels that is the correct annotation.

    • There are also 'split' annotations added by GATE's sentence splitter.

    Note that the 'direct-subjective' or 'expressive-subjectivity' annotation types are not yet listed in the Annotation Sets frame. This is because there are currently no annotations of these types in the document. When the first direct-subjective or expressive-subjectivity annotation is added to the document, the annotation type will then be added to the Annotation Sets frame.
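
    To make the idea of a zero-length annotation concrete, here is a minimal sketch in Python. The Annotation class is a hypothetical stand-in for illustration only, not GATE's own data model; it assumes an annotation is a typed span of character offsets plus a feature map. A zero-length annotation has identical start and end offsets, so it covers no text and has nothing to highlight.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation: a typed span with features."""
    type: str
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends (exclusive)
    features: dict = field(default_factory=dict)

    def is_zero_length(self) -> bool:
        # A zero-length annotation starts and ends at the same offset.
        return self.start == self.end


text = "China said on Tuesday ..."

# The implicit writer agent: anchored at offset 0 but covering no text.
writer = Annotation("agent", 0, 0, {"id": "w"})
# An ordinary agent annotation covering the word "China".
china = Annotation("agent", 0, 5, {"id": "china", "nested-source": "w,china"})

print(writer.is_zero_length(), repr(text[writer.start:writer.end]))  # True ''
print(china.is_zero_length(), repr(text[china.start:china.end]))     # False 'China'
```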

  • Creating an Annotation

    1. First, in the Annotation Sets frame, click on MPQA. This selects the MPQA annotation set and ensures that any annotations you create are properly listed under this set.

    2. In the document text, highlight the span of text that you want to annotate. Make sure that you do NOT accidentally include any spaces at the beginning or end of the span of text you are annotating.

      EXAMPLE: "China" in the sentence,
      "China said on Tuesday a U.S. State Department report that accused Beijing of suppressing religious freedom was full of lies and urged Washington not to hold double standard in the war on terrorism."

    3. The Annotation Editor Dialog window will pop up. (It may take a few seconds.) In that window, select the annotation type. In our example, we want to select 'agent'. The Annotation Editor Dialog window will change to show available features (attributes) for the 'agent' type. Also, the new annotation will be listed in the Annotation List frame, and the color of the highlighted span will change to the 'agent' color and begin flashing.

    4. Start filling in the attributes for the annotation frame you're working on. For our example, type in "china" in the id field and "w,china" in the nested-source field.
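
    The rule in step 2 about stray spaces can be stated in terms of character offsets. The sketch below is only an illustration and is not part of GATE: trim_span is a hypothetical helper that assumes a span is a (start, end) pair of offsets into the document text, with the end offset exclusive.

```python
from typing import Tuple


def trim_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink the span [start, end) so it neither begins nor ends with whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


sentence = "China said on Tuesday a U.S. State Department report ..."

# A sloppy selection that accidentally includes the space after "China".
sloppy = (0, 6)
clean = trim_span(sentence, *sloppy)

print(repr(sentence[sloppy[0]:sloppy[1]]))  # 'China '
print(clean, repr(sentence[clean[0]:clean[1]]))  # (0, 5) 'China'
```

    A span that is already clean comes back unchanged, so a check of this kind could be run over every span without side effects.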

  • Saving a Document

    Save your document reasonably often as you annotate. GATE has no auto-save feature!

    1. Right-click on the document name under Language Resources, or right-click on the appropriate tab in the list of open documents at the top of the middle frame.
    2. Select: Save As XML.
    3. Type in the file name that you want to give it. Example: hr7-taw.xml. Make sure that it was saved with an .xml extension.
    4. When you are completely done with your annotations and have saved the document for the last time, you may want to try closing the document in GATE (right-click -> Close) and reopening it to check that all of your annotations were saved properly.
    5. Please rename your completed, final annotated document. Unless instructed otherwise, use the original document name, extended by your login (or initials) and the word "final":

    Finally, click on the Messages tab to see if there was an error saving the file.
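
    As a complement to the checks above, a saved file can also be sanity-checked outside GATE. The sketch below is an assumption-laden illustration: check_saved_file is a hypothetical helper that only verifies that the file name ends in .xml and that the file parses as well-formed XML. It does not verify that every annotation was saved, so it does not replace reopening the document in GATE.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_saved_file(path: str) -> list:
    """Return a list of problems found with a saved document (empty if none)."""
    problems = []
    if not path.lower().endswith(".xml"):
        problems.append("file name does not end in .xml")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    try:
        ET.parse(path)  # raises ParseError if the XML is not well-formed
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems


# Demonstration on throwaway files; the content is a placeholder, not real GATE output.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "hr7-taw.xml")
    with open(good, "w", encoding="utf-8") as fh:
        fh.write("<doc>China said on Tuesday ...</doc>")
    good_result = check_saved_file(good)
    missing_result = check_saved_file(os.path.join(tmp, "hr7-taw"))

print(good_result)     # []
print(missing_result)  # ['file name does not end in .xml', 'file does not exist']
```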

  • Other Recommendations

    In the context of the Pitt group, one thing to be careful about when performing annotations in GATE is that you might click into the text area and introduce characters or white space.

    This causes problems when other people annotate the same document in parallel and someone wants to compare the two sets of annotations automatically. It can also cause problems if the extra material is introduced after the gate_default file, which stores the tokenization for the XML document, has been created, and that file is not updated.

    The upshot is: be really, really careful not to modify the original text!
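
    To see why even one stray character matters, consider the following minimal sketch. It is an illustration only: it assumes annotations are stored as character offsets into the text, and first_difference is a hypothetical helper, not a GATE function. A single extra space shifts every later offset, so the same offsets no longer select the same words in the two copies.

```python
def first_difference(a: str, b: str):
    """Return the offset of the first differing character, or None if the texts are identical."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        # One text is a prefix of the other; they diverge where the shorter one ends.
        return min(len(a), len(b))
    return None


original = "China said on Tuesday a U.S. State Department report ..."
# Same text with one space accidentally typed after "China".
edited = "China  said on Tuesday a U.S. State Department report ..."

print(first_difference(original, original))  # None
print(first_difference(original, edited))    # 6

# Offsets 14..21 cover "Tuesday" in the original, but not in the edited copy.
print(repr(original[14:21]))  # 'Tuesday'
print(repr(edited[14:21]))    # ' Tuesda'
```

    Running a comparison like this on the plain text of two annotators' copies before comparing their annotations would catch accidental edits early.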

annotation instructions by J. Ruppenhofer