The POS annotation guidelines for the training and testing data for the
parser experiments differ in various details from the guidelines for the
published historical corpora.
"Split" tags of the form ADJ21 ADJ22 have been eliminated.
Compound tags like ADJ+N for "gentleman" have been largely
eliminated. The compound tags that remain are as follows; if
necessary, they could be replaced as indicated.
Tag     Example (with syntactic projection)  Possible replacement          N
ADV@P   (PP (ADV@P therein))                 (ADVP (ADV therein))        393
P@N     (PP (P@N o'clock))                   (PP (P o') (NP (N clock)))  105
P@PRO   (PP (P@PRO for't))                   (PP (P for) (NP (PRO 't)))   31
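The replacements in the table can be sketched as a rewrite over bracketed tree strings. This is only an illustration, not the procedure used for the corpus: the function name `replace_compound_tags`, the `SPLITS` lexicon, and the assumption that each compound tag is the sole daughter of its PP are all hypothetical. Splitting a fused word such as o'clock needs a word-by-word lexicon, and only the forms shown in the table are listed here.

```python
import re

# Hypothetical split lexicon: how each fused word form divides into two words.
# Only the examples from the table are listed; real data would need more entries.
SPLITS = {
    "o'clock": ("o'", "clock"),
    "for't": ("for", "'t"),
}

# Tag of the second element for each compound tag that becomes P + NP.
SECOND_TAG = {"P@N": "N", "P@PRO": "PRO"}


def replace_compound_tags(tree: str) -> str:
    """Rewrite compound-tag PPs in a bracketed tree string (sketch)."""
    # (PP (ADV@P w)) -> (ADVP (ADV w))
    tree = re.sub(r"\(PP \(ADV@P ([^\s()]+)\)\)", r"(ADVP (ADV \1))", tree)

    def split_pp(match: re.Match) -> str:
        tag, word = match.group(1), match.group(2)
        if word not in SPLITS:
            return match.group(0)  # unknown fused form: leave untouched
        first, second = SPLITS[word]
        return f"(PP (P {first}) (NP ({SECOND_TAG[tag]} {second})))"

    # (PP (P@N w)) or (PP (P@PRO w)) -> (PP (P w1) (NP (N|PRO w2)))
    return re.sub(r"\(PP \((P@N|P@PRO) ([^\s()]+)\)\)", split_pp, tree)


print(replace_compound_tags("(PP (P@N o'clock))"))
```

Fused forms missing from the lexicon are left unchanged, so they can be reviewed by hand rather than split incorrectly.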
There is a strict division between terminals and non-terminals; for
example, in (VBD_RECURSIVE (VBD ...) (CONJ_WORD ...) (VBD ...)),
VBD_RECURSIVE is a non-terminal, while VBD and CONJ_WORD are POS tags.
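One way to check this division on a given tree is to collect the labels that sit directly over a word and the labels that dominate other nodes, and confirm the two sets do not overlap. The sketch below assumes standard bracketed trees; the function name `collect_labels` and the words in the sample tree are invented for illustration.

```python
import re


def collect_labels(tree: str):
    """Return (terminal_tags, nonterminal_labels) found in a bracketed tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    terminals, nonterminals = set(), set()
    for i, tok in enumerate(tokens):
        if tok != "(" or i + 1 >= len(tokens):
            continue
        label = tokens[i + 1]
        if label in ("(", ")"):
            continue  # unlabeled bracket, nothing to record
        # A label followed by "(" dominates other nodes (non-terminal);
        # otherwise it sits directly over a word (POS tag).
        following = tokens[i + 2] if i + 2 < len(tokens) else ")"
        if following == "(":
            nonterminals.add(label)
        else:
            terminals.add(label)
    return terminals, nonterminals


# Sample tree with made-up words, shaped like the example in the text.
sample = "(VBD_RECURSIVE (VBD came) (CONJ_WORD and) (VBD went))"
pos_tags, phrase_labels = collect_labels(sample)
print(sorted(pos_tags), sorted(phrase_labels), pos_tags.isdisjoint(phrase_labels))
```

Run over a whole treebank, a non-empty intersection of the two sets would point to a label used both as a POS tag and as a phrase label.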
In general, POS tags don't contain a hyphen. Any remaining hyphens
can be deleted by the parser, as they correspond to the hyphens on
non-terminals, representing information at the level of phrase structure
(movement, resumptive possessive pronouns, and the like).
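To see which POS tags in a tree still carry a hyphen, a small scan over the preterminals is enough. This is a sketch only: the function name `hyphenated_pos_tags` and the tag `N-XX` in the sample are hypothetical, and the code reports the tags without deciding how the hyphens should be removed.

```python
import re

# A preterminal is "(TAG word)": a label followed by a word, not by "(".
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+[^\s()]+\)")


def hyphenated_pos_tags(tree: str) -> list[str]:
    """List the POS tags (not phrase labels) that contain a hyphen."""
    return [tag for tag in PRETERMINAL.findall(tree) if "-" in tag]


# Hypothetical sample: NP-SBJ is a phrase label, N-XX a made-up POS tag.
print(hyphenated_pos_tags("(NP-SBJ (D the) (N-XX dog))"))
```

Hyphens on phrase labels such as NP-SBJ are ignored, because those labels are followed by an opening bracket rather than a word.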
Punctuation is tagged according to function and syntactic context.
For example, a comma might be tagged as CONJ_WORD, CONJ_NX, CONJ_NP, etc.
Open parens and brackets are tagged as OPEN_PAREN, and their closing
counterparts as CLOSE_PAREN.
NUM has been split according to syntactic context:
    ADJ_NUM (nominal modifier)
    NUM (elsewhere = head of NP)
Q has been split according to syntactic context:
    PDT (predeterminer)
    ADJ_Q (nominal modifier)
    Q (elsewhere = head of NP or QP)
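The NUM and Q splits amount to a lookup from the original tag and its syntactic context to the new tag. The sketch below assumes the context has already been determined from the parse; the context names "predeterminer", "modifier", and "head" and the function name `split_tag` are hypothetical labels chosen for illustration.

```python
# Hypothetical context labels: "predeterminer", "modifier", "head".
SPLIT_TABLE = {
    ("NUM", "modifier"): "ADJ_NUM",  # nominal modifier
    ("NUM", "head"): "NUM",          # elsewhere = head of NP
    ("Q", "predeterminer"): "PDT",
    ("Q", "modifier"): "ADJ_Q",      # nominal modifier
    ("Q", "head"): "Q",              # elsewhere = head of NP or QP
}


def split_tag(old_tag: str, context: str) -> str:
    """Map an original NUM or Q tag to its context-specific tag."""
    # Tags or contexts outside the table are returned unchanged.
    return SPLIT_TABLE.get((old_tag, context), old_tag)


print(split_tag("Q", "predeterminer"))
```

Returning the original tag for unlisted combinations keeps the "elsewhere" case as the default.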
In wh- constructions (relative clauses and related
constructions), "that" is tagged as C_WH rather than as C.
Subordinating conjunctions are tagged as CONJS rather than as P.