MPQA Annotation Scheme Details

  • Annotation Sets and Types

    Annotations in a document are organized by GATE into sets. The MPQA set contains the annotation types in the MPQA schema. This is the annotation set you will be working with. There also may be some Default annotations, if you have used GATE to tokenize or otherwise process a document. Similarly, if the document was an html or xml document to begin with, there will be a set of Original markups.

    Below, the MPQA annotation types are listed with brief descriptions of their possible features.

    agent

    • id - A unique identifier assigned by the annotator to the first meaningful and descriptive reference to an agent. This id is case sensitive. When annotating a later mention of the same referent as an agent, there is no need to re-specify the id.

    • nested-source - Added to an agent annotation when the agent reference is the source of a private state/speech event. It is a list of agent ids beginning with the writer and ending with the id for the immediate agent being referenced. For instance, w,smith,miller would capture the idea that the writer quotes somebody called Smith who in turn quotes somebody called Miller.

    • agent-uncertain - Use when you are uncertain as to whether or not the agent is the correct source of a private state/speech event.

      Possible values: somewhat-uncertain, very-uncertain
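
    The nested-source value described above (for instance w,smith,miller) can be read as a chain of agent ids. The following is a minimal sketch in Python, not part of GATE or the MPQA tooling; the function names are hypothetical, and the default writer id "w" is an assumption taken from the example above.

```python
def parse_nested_source(value, writer_id="w"):
    """Split a nested-source value into its agent ids, preserving case.

    Assumption: ids are comma-separated and the chain starts with the
    writer id ("w" in the example w,smith,miller).
    """
    ids = [part.strip() for part in value.split(",")]
    if any(not part for part in ids):
        raise ValueError(f"empty agent id in nested-source {value!r}")
    if ids[0] != writer_id:
        raise ValueError(f"nested-source {value!r} does not begin with the writer")
    return ids


def immediate_source(value, writer_id="w"):
    """Return the last id in the chain, i.e. the immediate agent."""
    return parse_nested_source(value, writer_id)[-1]
```

    For w,smith,miller this yields the chain w, smith, miller, with miller as the immediate agent.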

    expressive-subjectivity

    • intensity - The strength of the expressive-subjective element.

      Possible values: low, medium, high, extreme

    • polarity - Attribute for marking the polarity of the expression, in context, according to the nested-source.

      Possible values: negative, positive, both, neutral, uncertain-negative, uncertain-positive, uncertain-both, uncertain-neutral

    • es-uncertain - Use when you are uncertain as to whether or not the word or phrase you are annotating is an expressive-subjective element.

      Possible values: somewhat-uncertain, very-uncertain

    • nested-source - Agent that is the source of the private state indirectly indicated by the expressive-subjective element. It is a list of agent ids beginning with the writer and ending with the id for the immediate agent that is the source.

    • nested-source-uncertain - Use when you are uncertain as to whether or not the agent is the correct source for the private state indirectly indicated by the expressive-subjective element.

      Possible values: somewhat-uncertain, very-uncertain

    direct-subjective

    • nested-source - Agent that is the source of the private state/speech event. It is a list of agent ids beginning with the writer and ending with the id for the immediate agent being referenced.

    • implicit - Add this feature when you annotate a (zero) span of text for an implicit speech or thought. For example, there may be quoted speech without a "said" where the speaker is implicit from the previous sentence. In this case, make the first quote or word at the beginning a direct-subjective and use this feature.

    • expression-intensity - Strength of the private state being expressed by the direct-subjective expression. To give you an idea, 'said' is neutral, 'thinks' is low, 'criticized' or 'fears' is medium, and something like 'blasted' in the verbal sense is probably high.

      Possible values: neutral, low, medium, high, extreme

    • polarity - Attribute for marking the polarity of the direct-subjective expression, in context.

      Possible values: negative, positive, both, neutral, uncertain-negative, uncertain-positive, uncertain-both, uncertain-neutral

    • intensity - The overall strength of the private state being expressed. Think of this as combining the expression-intensity of the direct-subjective expression with the strength of the private state conveyed by the associated expressive-subjective elements.

      Possible values: neutral, low, medium, high, extreme

    • attitude-link - A list of the ids of all attitudes that are associated with the private state expressed by the direct-subjective.

    • insubstantial - Use when the private state/speech event is not significant or not particular, based on the criteria for significant and particular in the annotation instructions. Type in all criteria that it fails to pass: c1 and/or c2 and/or c3.

    • subjective-uncertain - Use when you are uncertain, in context, whether the word or phrase should instead be treated as an objective-speech-event.

      Possible values: somewhat-uncertain, very-uncertain

    • annotation-uncertain - Use when you are uncertain if, in context, the word or phrase expresses a direct private state/speech event at all.

      Possible values: somewhat-uncertain, very-uncertain

    objective-speech-event

    • nested-source - Agent that is the source of the speech event. It is a list of agent ids beginning with the writer and ending with the id for the immediate agent being referenced.

    • implicit - Add this feature when you annotate a (zero) span of text for an implicit speech event. For example, there may be quoted speech without a "said" where the speaker is implicit from the previous sentence. In this case, make the first quote or word at the beginning an objective-speech-event and use this feature.

    • insubstantial - Use when the speech event is not significant or not particular, based on the criteria for significant and particular in the annotation instructions. Type in all criteria that it fails to pass: c1 and/or c2 and/or c3.

    • objective-uncertain - Use when you are unsure whether the speech event would be better treated as a direct-subjective.

    • annotation-uncertain - Use when you are unsure whether the word or phrase really refers to a speech event at all.

    attitude

    • id - A unique id for this attitude (assigned by the annotator).

    • attitude-type - The specific attitude subtype that you recognize.

      Possible values: agree-neg, agree-pos, arguing-neg, arguing-pos, intention-neg, intention-pos, other-attitude, sentiment-neg, sentiment-pos, speculation

    • attitude-uncertain - Use when you are uncertain about the presence of an attitude, or when you are not sure what the subtype of the attitude is.

    • intensity - This feature captures the strength of the attitude expressed.

    • target-link - A list of ids of the target spans that are associated with this attitude.

    • contrast - Set to yes if the attitude conveyed arises as part of a contrast between two situations. (Exploratory)

    • inferred - Set to yes if the attitude you're marking is only an inference (e.g., sentiment-neg towards the target Chavez in the oft-cited example "People are happy that Chavez fell"). (Deprecated?)

    • repetition - Use if the attitude is conveyed through the use of repetition. (Exploratory)

    • sarcastic - Use if the attitude that is being conveyed is sarcastic. (Exploratory)

    target

    • id - A unique id for this target, assigned by the annotator. Note that even if the very same target recurs as a target elsewhere, it gets a new unique target id every time.

    • target-uncertain - Use if you are unsure that the selected word or phrase really is the target of the attitude to which you linked it.
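
    The closed value sets listed above lend themselves to a simple automated check. The sketch below is only an illustration in Python: representing an annotation as a type name plus a feature dictionary is an assumption, not GATE's own API, and features for which no value set is listed above are deliberately left unchecked.

```python
# Value sets copied from the feature descriptions above.
UNCERTAINTY = {"somewhat-uncertain", "very-uncertain"}
POLARITY = {
    "negative", "positive", "both", "neutral",
    "uncertain-negative", "uncertain-positive",
    "uncertain-both", "uncertain-neutral",
}
INTENSITY_WITH_NEUTRAL = {"neutral", "low", "medium", "high", "extreme"}

ALLOWED_VALUES = {
    ("agent", "agent-uncertain"): UNCERTAINTY,
    ("expressive-subjectivity", "intensity"): {"low", "medium", "high", "extreme"},
    ("expressive-subjectivity", "polarity"): POLARITY,
    ("expressive-subjectivity", "es-uncertain"): UNCERTAINTY,
    ("expressive-subjectivity", "nested-source-uncertain"): UNCERTAINTY,
    ("direct-subjective", "expression-intensity"): INTENSITY_WITH_NEUTRAL,
    ("direct-subjective", "polarity"): POLARITY,
    ("direct-subjective", "intensity"): INTENSITY_WITH_NEUTRAL,
    ("direct-subjective", "subjective-uncertain"): UNCERTAINTY,
    ("direct-subjective", "annotation-uncertain"): UNCERTAINTY,
    ("attitude", "attitude-type"): {
        "agree-neg", "agree-pos", "arguing-neg", "arguing-pos",
        "intention-neg", "intention-pos", "other-attitude",
        "sentiment-neg", "sentiment-pos", "speculation",
    },
}


def invalid_features(ann_type, features):
    """Return (feature, value) pairs whose value is outside the listed set.

    Hypothetical helper: `features` is a plain dict of feature name to value.
    """
    problems = []
    for name, value in features.items():
        allowed = ALLOWED_VALUES.get((ann_type, name))
        if allowed is not None and value not in allowed:
            problems.append((name, value))
    return problems
```

    Note that neutral is accepted for the intensity of a direct-subjective but not for the intensity of an expressive-subjectivity annotation, following the lists above.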

  • Rules for Annotating Agent Spans

    1. Every unique agent referred to in the text should be assigned ONLY ONE identifier. In other words, out of all agent spans in a document that refer to the U.S. human rights report, only one will carry the id feature.
    2. Note that this policy is different from that for targets: if the same entity occurs as a target multiple times in a text, it will be assigned a unique id on each occasion.
    3. Agent ids are case sensitive! If you give an agent an id=AbCdEf then you must type AbCdEf as the id for that agent every time you reference that agent in a nested-source, nested-target, etc.
    4. The id feature should be assigned to the first descriptive reference to the agent. Finding this reference is usually clear-cut but in some cases it's harder because the information that helps one to identify the agent referent is more distributed. Consider this example:
      1. So much for President Bush's effort to repair his legacy on global warming — at least when it comes to one German official with a flair for sloganeering.
      2. In a statement released today, Environment Minister Sigmar Gabriel described Mr. Bush's speech on Wednesday as "disappointing."

      In the second sentence, where the DSE "described" occurs, the relevant agent phrase is "Environment Minister Sigmar Gabriel". The question is whether one should consider the previous reference to "one German official with a flair for sloganeering" as an earlier descriptive reference. Here it seems acceptable to treat only the second mention, where the person is identified by office and name, as fully descriptive. The mention in the first sentence would thus not have to be marked as an agent.

    5. When annotating a span of text that references an agent, label the entire noun phrase that is part of the reference. Thus, in the previous example, mark "Environment Minister Sigmar Gabriel" rather than only "Sigmar Gabriel".
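
    Rules 1 and 3 above can be checked mechanically once the annotations are exported. The following Python sketch assumes each annotation is a dict with a "type" and a "features" dict; that layout, the function name, and the writer id "w" are illustrative assumptions rather than anything defined by GATE.

```python
from collections import Counter


def check_agent_ids(annotations, writer_id="w"):
    """Report duplicate agent ids and nested-source ids that match no agent.

    Ids are compared case-sensitively; a reference that differs from a
    known id only in case is reported separately, since that is an easy
    typing mistake.
    """
    problems = []
    agent_ids = [
        a["features"]["id"]
        for a in annotations
        if a["type"] == "agent" and "id" in a["features"]
    ]
    for agent_id, count in Counter(agent_ids).items():
        if count > 1:
            problems.append(f"id {agent_id!r} is assigned on {count} agent spans")

    known = set(agent_ids) | {writer_id}
    by_lowercase = {agent_id.lower(): agent_id for agent_id in known}
    for a in annotations:
        chain = a["features"].get("nested-source")
        if not chain:
            continue
        for ref in (part.strip() for part in chain.split(",")):
            if ref in known:
                continue
            near_miss = by_lowercase.get(ref.lower())
            if near_miss is not None:
                problems.append(f"{ref!r} differs only in case from {near_miss!r}")
            else:
                problems.append(f"{ref!r} is not a known agent id")
    return problems
```

    An empty result means every nested-source id matches an assigned agent id exactly and no id was assigned twice.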
  • A Strategy for Annotating in GATE

    Before you begin annotating, it helps to select the agent, direct-subjective, expressive-subjectivity, objective-speech, and split annotations for viewing. It also helps to sort the annotations by their starting byte, so that you can keep track of them more easily.

    The basic recommendation is to proceed sentence by sentence and to perform the steps below for each sentence. Of course, they are meant only as a recommendation and you should do whatever works best for you.

    1. First, look at annotations that pertain to the writer of the document as a whole.
      1. Find and annotate all expressive-subjectivity for the writer. Edit the annotations, setting the nested-source, and judging the intensity and polarity of the expression.
      2. Use the writer's expressive-subjectivity annotations from the previous step to help determine if the writer is expressing a private state in the sentence. (Hint: Unless you marked only very weak, uncertain expressive-subjectivity annotations, the answer is yes: the writer is expressing a private state.) If the writer is expressing a private state, change the objective-speech annotation provided for the sentence to a direct-subjective annotation. Annotate the (overall) intensity of the private state on the direct-subjective annotation. Because the direct-subjective annotation is implicit, you do not need to specify a value for the expression-intensity or polarity features.
      3. Apply attitude and target labels as per Theresa's instructions. Make sure that for each attitude you specify at least:
        • id
        • attitude-type
        • intensity
        • target-link
      4. Label the appropriate target span and give it an id. If you are unsure that the span really functions as the target of the attitude to which you are linking the target in question, then set the target-uncertain feature.
      5. After completing each attitude annotation, enter its id into the attitude-link field of the relevant direct-subjective annotation.
    2. Turn to the more deeply embedded nested-sources in the sentence. These will typically be mentioned overtly but might be implicit.
      1. Identify in the sentence all other direct mentions of private state and speech events that meet the criteria for annotation, i.e., identify all expressions that you will mark as direct-subjective or speech-event annotations.
      2. For every private state/speech event that you identified in the previous step, annotate:
        1. the span of text that evokes the private state/speech event
        2. the span of text that refers to the agent that is the source
        3. any spans of text that are expressive-subjectivity attributed to the source of the private state/speech event
      3. Edit the agent annotations that you just added, creating ids as needed, and setting the nested-source feature. Make sure that if the agent appears as a source for the first time, you also find the first descriptive reference to that agent in the text and mark it, giving it its initial id.
      4. Edit the expressive-subjectivity annotations that you just added, setting the nested-source, intensity, and polarity features.
      5. If any direct-subjective or objective-speech annotations are implicit, set the implicit feature to "true" for those annotations. A common situation in which an embedded source and their private state expression go unexpressed is when a sentence continues a quote, as in the second sentence of this example:

        (1) "That is a pessimistic assessment, but it may be realistic," he wrote in an email. (2) "Look, for example, at the E.U. where, ... total E.U. emissions are now, once again, inching back up."
      6. Specify the nested-source for all direct-subjective and objective-speech annotations.
      7. For direct-subjective annotations, mark the expression-intensity, polarity, and (overall) intensity features. (Omit expression-intensity and polarity if the DSE is implicit.)
      8. If the source of a direct-subjective annotation is expressing any kind of attitude, you need to start adding attitude labels and, where appropriate, target labels, and link them appropriately.
    3. Things to keep in mind:
      1. Usually, when you have expressive-subjectivity marked on a sentence you will also have an attitude.
      2. A single private state expression may have multiple attitude annotations associated with it.
      3. Many annotation types allow you to mark uncertainty. If you really are uncertain, use the appropriate fields.
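
    The linking steps above (required attitude features, target-link, attitude-link) can be double-checked after annotating. This is a rough Python sketch under stated assumptions: annotations are dicts with a "type" and a "features" dict, and link features hold comma-separated ids by analogy with nested-source. Neither the data layout nor the helper names come from GATE.

```python
# Features that each attitude should specify at least (step 1.3 above).
REQUIRED_ATTITUDE_FEATURES = ("id", "attitude-type", "intensity", "target-link")


def split_ids(value):
    """Split a comma-separated id list, ignoring blanks (assumed separator)."""
    return [part.strip() for part in value.split(",") if part.strip()]


def check_attitude_links(annotations):
    """Report attitudes with missing features or dangling links."""
    problems = []
    target_ids = {
        a["features"]["id"]
        for a in annotations
        if a["type"] == "target" and "id" in a["features"]
    }
    # Collect every attitude id mentioned on a direct-subjective annotation.
    linked_attitudes = set()
    for a in annotations:
        if a["type"] == "direct-subjective":
            linked_attitudes.update(split_ids(a["features"].get("attitude-link", "")))

    for a in annotations:
        if a["type"] != "attitude":
            continue
        features = a["features"]
        label = features.get("id", "<no id>")
        for name in REQUIRED_ATTITUDE_FEATURES:
            if not features.get(name):
                problems.append(f"attitude {label}: missing {name}")
        for target_id in split_ids(features.get("target-link", "")):
            if target_id not in target_ids:
                problems.append(f"attitude {label}: target-link {target_id} has no target")
        if "id" in features and features["id"] not in linked_attitudes:
            problems.append(f"attitude {label}: not listed in any attitude-link")
    return problems
```

    An empty list means every attitude carries the four required features, points only at existing targets, and is referenced from at least one direct-subjective annotation.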


annotation instructions by J. Ruppenhofer