The parsing scheme for the Penn historical corpora, which is also
used by the Parsed
Corpus of Early English Correspondence (PCEEC), uses a limited tree
representation in the form of labelled parentheses. All open
parentheses have an associated label, either a phrase label (NP,
ADJP, etc.) or a word label (N, ADJ, etc.), representing
nodes in a tree. We use the terms 'word label' and 'POS
(part-of-speech) label' interchangeably. A word label is associated
with every word, but phrasal labels are not included in every case in
which a fully labelled tree would require them. Intermediate
projections in the sense of X' theory (N', ADJ', etc.) are not generally
included in our representations. By comparison to trees in current
syntactic theory, the trees in our corpora are therefore quite flat, and
they are not required to be binary-branching.
The partial representation of phrase structure in our corpora is not
intended to make a theoretical statement, but was adopted for practical
reasons. Certain phrases are generally omitted in the annotation scheme
because their boundaries are too difficult to define. The prime example
is VP. The problematic character of VP is particularly obvious in early
Middle English, where the order of the verb and its complements is in
flux (at least on the surface). But even in Present-Day English, the
attachment site of verbal adjuncts is systematically ambiguous between
low attachment to VP and high attachment at the clause level. Other
categories, such as DP, were omitted because the cost of including them
outweighs their usefulness. Intermediate projections are omitted for
both reasons. In no case should the lack of any particular phrase label
be taken to imply that earlier forms of English failed to include the
corresponding syntactic category. The trees in the corpora are simply
underspecified.
The examples in this section of the manual are constructed in Modern
English so as to be maximally accessible. The remainder of the manual
contains examples from the corpora. The examples are mostly from late
Middle English and Early Modern English; examples from early Middle
English are included where they are necessary to make a linguistic point.
Not all phrases are labelled with a dash tag, and more than one such
tag is possible (IP-INF-PRP = purpose infinitive,
NP-SBJ-RSP = resumptive subject NP, etc.).
Exception: IP can immediately dominate
the following word-level constituents:
Exception: Post-head modifiers project
structure beyond the word level, even when they consist of a single
word. Like pre-head modifiers, they are represented as sisters to the
head.
Exception 1: Certain heads, such as
determiners (D), modals (MD), or particles
(RP), never project a phrase. Verbs generally do not project a
VP, but see Verb phrase for exceptions.
Exception 2: Single-word pre-head
modifiers do not project a phrasal node if the phrasal node would share
the same category as the head.
The most common types of categorial mismatches between heads and
phrases are the following:
As indicated by the question marks in the above lists, it is not
always clear whether a particular word modifies an elided head or is
itself the head of the phrase. It is mainly for this reason that we do
not attempt to make a systematic distinction between the two cases.
The most common case of elision involves NPs containing only a determiner and
an adjective (or sometimes only an adjective).
The complementizer position is always included; when not filled by
an overt complementizer, it contains 0 (zero).
General principles
As just mentioned, the structures in our corpora generally include
neither a VP nor intermediate projections like I'. As a result, IP
immediately dominates all verbs (to be understood in a broad sense,
including modals and auxiliaries) and sentence-level constituents. A
typical parse structure is the following:
(IP-MAT (NP-SBJ (NPR Mary))
(HVP has)
(BEN been)
(VAG meaning)
(IP-INF (TO to)
(VB go))
(PP (P for)
(NP (D a) (N week))))
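Because every node is an opening parenthesis, a label, its daughters, and a closing parenthesis, trees in this format can be read mechanically. The sketch below is our own illustration in Python, not part of the corpus distribution; `read_tree` and `words` are hypothetical helper names. It builds nested `(label, children)` tuples, gives the unlabelled outer wrapper used around corpus tokens (as in `( (IP-MAT ...) (ID ...))`) an empty label, and rejects input whose parentheses do not balance.

```python
import re

# "(" and ")" are tokens of their own; any other run of non-space,
# non-parenthesis characters is a label or a word.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Read one labelled bracketing into nested (label, children) tuples.

    Words are plain strings. An unlabelled outer wrapper gets the label "".
    Raises ValueError if the parentheses do not balance.
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip the "(" that opens this node
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unclosed parenthesis")
        pos += 1  # skip the matching ")"
        return (label, children)

    if not tokens or tokens[0] != "(":
        raise ValueError("a tree must start with '('")
    tree = parse_node()
    if pos != len(tokens):
        raise ValueError(f"unexpected material after the tree: {tokens[pos:]}")
    return tree


def words(tree):
    """Return the words of a tree, left to right."""
    _, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(words(child))
    return out


example = """
(IP-MAT (NP-SBJ (NPR Mary))
        (HVP has)
        (BEN been)
        (VAG meaning)
        (IP-INF (TO to)
                (VB go))
        (PP (P for)
            (NP (D a) (N week))))
"""
tree = read_tree(example)
print(tree[0])                # IP-MAT
print(" ".join(words(tree)))  # Mary has been meaning to go for a week
```

Feeding it a string with a stray closing parenthesis, such as `(NP (ADJ various) (ADJ black) (NS cats)))`, raises `ValueError`, which makes it a cheap sanity check for hand-copied trees.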
Dash tags (or extended tags)
(IP-MAT (NP-TMP (N Yesterday)) ← temporal NP
(NP-SBJ (NPR Mary)) ← subject
(VBD told)
(NP-OB2 (NPR Jane)) ← second object
(CP-THT (C that)
(IP-SUB (NP-SBJ (PRO she))
(VBD studied)
(NP-MSR (QP (ADVR too) (Q much))) ← measure NP
(PP (P during)
(NP (D the) (N+N weekend)))))) ← no dash tag
(IP-MAT (NP-SBJ (NPR Mary))
(ADVP (ADV happily))
(VBD put)
(NP-OB1 (D the) (N book))
(ADVP-LOC (ADV there))
(ADVP-TMP (ADV+WARD afterward)))
(IP-MAT (NP-SBJ (NPR Mary))
(VBD put)
(NP-OB1 (D the) (N book))
(PP (P on)
(NP (D the) (N table)))
(PP (P on)
(NP (NPR Saturday))))
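Since a label can combine a category, several dash tags, and a numeric index, it is often useful to take it apart programmatically. A minimal sketch, assuming that an index is always the final `-n` or `=n` suffix of a label (as in WNP-1, NP-SBJ-1, or IP-MAT=1 later in this section) and that the rest splits on hyphens; `split_label` is a hypothetical helper, not a corpus tool.

```python
import re

# A trailing index: "-n" as in WNP-1 or NP-SBJ-1, or "=n" as in IP-MAT=1.
INDEX_RE = re.compile(r"([-=])(\d+)$")


def split_label(label):
    """Split a node label into (base category, dash tags, index).

    The index is a (separator, number) pair such as ("-", 1), or None.
    Only digits directly after "-" or "=" count, so the 1 in NP-OB1
    stays part of the dash tag.
    """
    index = None
    match = INDEX_RE.search(label)
    if match:
        index = (match.group(1), int(match.group(2)))
        label = label[: match.start()]
    base, *tags = label.split("-")
    return base, tags, index


for lab in ["NP-SBJ-RSP", "IP-INF-PRP", "WNP-1", "NP-OB1", "IP-MAT=1"]:
    print(lab, split_label(lab))
```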
Structural principles
(NP (D the) (N girl))
(PP (P in)
(NP (D the) (N spring)))
(ADJP (ADV very) (ADJ big))
(NP (D a) (ADJ big) (N cat)) ← single-word case
(NP (D a)
(ADJP (ADV very) (ADJ big)) ← multi-word case
(N cat))
Postposed ELSE and ENOUGH are
treated slightly differently in
the PPCME2 and the later corpora.
(NP (ADJ various) (ADJ black) (NS cats))
(NP (ADJ black) (NS cats)
(ADJP (ADJ galore)))
(NP (D a) (VAN forsaken) (N castle))
(NP (D a) (N castle)
(RRC (VAN forsaken)) ← RRC = reduced relative clause
(IP-MAT (NP-SBJ (NPR Mary))
(VBD saw)
(NP-OB1 (D the) (N man))
(PP (P with) ← high by default
(NP (D the) (N telescope))))
(NP (D the)
(N story)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO they))
(VBP tell)))
(PP (P about) ← high by default
(NP (D the)
(N king))))
(NP (D the)
(N story)
(RRC (BAG being)
(VAN told))
(PP (P about) ← high by default
(NP (D the)
(N king))))
Internal structure of phrases
The internal structure of all nonclausal phrases is fundamentally similar.
(NP (D the) (NS girls)
(PP (P on)
(NP (D the) (N beach))))
(NP (Q many) (ADJ happy) (NS girls) ← two single-word pre-head modifiers
(PP (P on)
(NP (D the) (VAN overcrowded) (N beach))))
(NP (ADJP (ADJ happy) (CONJ and) (VAN excited)) ← multi-word pre-head modifier
(NS girls))
(NP (Q many) ← single-word pre-head modifier
(ADJP (ADV very) (ADJ happy)) ← multi-word pre-head modifier
(NS girls)
(PP (P on)
(NP (D the) (VAN overcrowded) (N beach))))
(NP (QP (ADV very) (Q many)) ← multi-word pre-head modifier
(ADJP (ADV very) (ADJ happy)) ← multi-word pre-head modifier
(NS girls)
(PP (P on) ← post-head modifier
(NP (D the) (N beach)
(PP (P with)
(NP (D the) (ADJ big) (NS dunes))))))
(ADJP (ADV very) (ADJ happy)) ← single-word pre-head modifier
(PP (ADV especially) ← single-word pre-head modifier
(P on)
(NP (NPRS Saturdays)))
(PP (ADV right) ← single-word pre-head modifier
(P up)
(NP (D the) (N street)))
(ADVP (ADV very) (ADV slowly)) ← single-word pre-head modifier
EX, MAN, PRO, ?D, ?ONE, ?OTHER(S), ?PRO$, ?Q, (QR, etc.)
VAN, VAG, (HAN, HAG, etc.), SUCH, QR, ?ONE, ?OTHER
(IP-MAT (IP-MAT-1 (NP-SBJ (NPR Mary))
(VBD gave)
(NP-OB2 (NPR Jane))
(NP-OB1 (D a) (ADJ red) (N ribbon)))
(CONJP (CONJ and)
(IP-MAT=1 (NP-OB2 (NPR Lucy))
(NP-OB1 (D a) (ADJ blue))))) ← elided head noun
Internal structure of clauses
Clauses are labelled either CP or IP. CPs contain
either a complementizer or a wh- position (or both). IPs contain
neither. All IPs and CPs carry dash tags indicating their subtype, as
follows.
IP-ABS      absolute clause
IP-IMP      imperative
IP-INF      infinitive
IP-INF-ABS  absolute infinitive
IP-INF-ADT  adjunct infinitive
IP-INF-DEG  degree infinitive
IP-INF-PRP  purpose infinitive
IP-MAT      declarative matrix IP
IP-PPL      participial clause
IP-PPL-ABS  absolute participial clause
IP-SMC      small clause
IP-SUB      subordinate IP
CP-ADV      adverbial clause
CP-CAR      clause-adjoined relative
CP-CLF      IT cleft
CP-CMP      comparative
CP-DEG      degree complement
CP-EOP      empty-operator CP
CP-EXL      exclamative
CP-FRL      free relative
CP-QUE      question
CP-REL      relative clause
CP-THT      THAT clause
CP-TMC      TOUGH movement complement
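The subtype list can be turned into a lookup table. In this sketch (our own; `describe_clause` is a hypothetical name), a trailing index is stripped first, and a label that is not listed, such as IP-MAT-SPE or CP-QUE-SPE in the examples that follow, falls back to its longest listed prefix. That fallback is an assumption made for illustration, not a rule stated in this manual.

```python
import re

# Clause subtypes, copied from the list above.
CLAUSE_TYPES = {
    "IP-ABS": "absolute clause",
    "IP-IMP": "imperative",
    "IP-INF": "infinitive",
    "IP-INF-ABS": "absolute infinitive",
    "IP-INF-ADT": "adjunct infinitive",
    "IP-INF-DEG": "degree infinitive",
    "IP-INF-PRP": "purpose infinitive",
    "IP-MAT": "declarative matrix IP",
    "IP-PPL": "participial clause",
    "IP-PPL-ABS": "absolute participial clause",
    "IP-SMC": "small clause",
    "IP-SUB": "subordinate IP",
    "CP-ADV": "adverbial clause",
    "CP-CAR": "clause-adjoined relative",
    "CP-CLF": "IT cleft",
    "CP-CMP": "comparative",
    "CP-DEG": "degree complement",
    "CP-EOP": "empty-operator CP",
    "CP-EXL": "exclamative",
    "CP-FRL": "free relative",
    "CP-QUE": "question",
    "CP-REL": "relative clause",
    "CP-THT": "THAT clause",
    "CP-TMC": "TOUGH movement complement",
}


def describe_clause(label):
    """Return (matched label, description) for a clause label, or None."""
    label = re.sub(r"[-=]\d+$", "", label)  # drop an index such as -1 or =1
    parts = label.split("-")
    # Try the longest prefix first, so IP-INF-PRP wins over IP-INF.
    for n in range(len(parts), 1, -1):
        key = "-".join(parts[:n])
        if key in CLAUSE_TYPES:
            return key, CLAUSE_TYPES[key]
    return None


for lab in ["IP-INF-PRP", "IP-INF-1", "IP-MAT-SPE", "CP-QUE-SPE"]:
    print(lab, "->", describe_clause(lab))
```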
Ordinary IPs
All IPs except subjectless
imperatives and subjectless
infinitives have a subject in our annotation scheme. If the subject
is not overt, an empty
subject is added. Clauses generally do not contain VP (but see
Verb phrase). As a rule, daughters of IP
are phrasal (but see
Internal structure of
phrases for exceptions).
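The subject requirement can be approximated by a mechanical check. The sketch below (hypothetical function names, minimal reader that assumes well-formed input) flags IPs with no NP-SBJ daughter. It exempts imperatives and infinitives, gapped conjuncts marked with `=`, and IPs whose CONJP daughter contains another IP, as in the gapping example earlier in this section. These exemptions are our heuristic reading of the rule above, not a complete statement of the corpus conventions.

```python
import re


def read_tree(text):
    """Minimal reader for well-formed labelled bracketing: (label, children)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def node(i):
        i += 1  # skip "("
        label = ""
        if tokens[i] not in ("(", ")"):
            label, i = tokens[i], i + 1
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    return node(0)[0]


def is_subject(node):
    """True for NP-SBJ, NP-SBJ-1, NP-SBJ-RSP, and so on."""
    parts = node[0].split("-")
    return parts[0] == "NP" and "SBJ" in parts


def coordinates_ips(kids):
    """True if some CONJP daughter contains an IP conjunct."""
    return any(
        k[0] == "CONJP"
        and any(not isinstance(g, str) and g[0].startswith("IP") for g in k[1])
        for k in kids
    )


def ips_without_subject(tree):
    """Return labels of IPs with no NP-SBJ daughter, skipping the exemptions."""
    flagged = []

    def walk(node):
        label, children = node
        kids = [c for c in children if not isinstance(c, str)]
        if label.startswith("IP"):
            tags = label.split("-")[1:]
            exempt = (
                "IMP" in tags        # subjectless imperatives are allowed
                or "INF" in tags     # so are subjectless infinitives
                or "=" in label      # gapped conjunct, e.g. IP-MAT=1
                or coordinates_ips(kids)
            )
            if not exempt and not any(is_subject(k) for k in kids):
                flagged.append(label)
        for k in kids:
            walk(k)

    walk(tree)
    return flagged


print(ips_without_subject(read_tree("(IP-MAT (VBD left) (. .))")))  # ['IP-MAT']
```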
(IP-MAT (CONJ But) ← sentential conjunction
(INTJ alas) ← single-word interjection
(, ,)
(NP-SBJ (PRO we))
(MD will) ← modal
(NEG not) ← negation
(Q all) ← floated quantifier
(VB end) ← verb
(RP up) ← particle
(PP (P with)
(NP (PRO$ our) (N favorite)))
(. .))
( (IP-MAT-SPE (' ')
(INTJ Yes) ← single-word interjection
(, ,)
(' ')
(IP-MAT-PRN (NP-SBJ (PRO he))
(VBD seyde)) ← verb
(, ,)
(' ')
(NP-SBJ (PRO I))
(MD shall) ← modal
(VB promyse) ← verb
(NP-OB2 (PRO you))
(IP-INF (TO to) ← auxiliary
(VB fullfylle) ← verb
(NP-OB1 (PRO$ youre) (N desyre)))
(. .)
(' '))
(ID CMMALORY,667.4880))
( (IP-MAT-SPE (CONJ for) ← conjunction
(NP-SBJ (PRO he)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-SBJ *T*-1)
(MD shall) ← modal
(VB pulle) ← verb
(NP-OB1 (PRO hit))
(RP oute)))) ← particle
(MD shall) ← modal
(DO do) ← verb
(NP-OB1 (PRO hit))
(PP (P with)
(NP (Q litill) (N myght)))
(. .)
(' '))
(ID CMMALORY,46.1512))
( (IP-MAT-SPE (CONJ and) ← conjunction
(ADVP (ADV ellis))
(NP-SBJ (PRO I))
(MD wolde) ← modal
(HV have) ← auxiliary
(BEN bene) ← verb
(ADJP (ADJ lothe)
(PP (P as)
(NP (Q ony) (N knyght)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-SBJ *T*-1)
(VBP lyvith)))))
(IP-INF (FOR for) (TO to) ← auxiliary material
(VB sle)
(NP-OB1 (D a) (N lady))))
(. .)
(' '))
(ID CMMALORY,51.1701))
Imperatives (IP-IMP)
Imperatives are labelled IP-IMP. Only overt subjects are
included in the annotation.
( (IP-IMP-SPE (CONJ but)
(VBI saye)
(CP-THT (C 0)
(IP-SUB (NP-SBJ (PRO ye))
(BEP are)
(VAN diseased)))
(. ,))
(ID CMMALORY,4.83))
( (IP-IMP (CONJ for)
(VBI witte)
(NP-SBJ (PRO ye)) ← overt subject
(ADVP (ADV wele))
(CP-THT (C +tat)
(IP-SUB (NP-SBJ (NPR god))
(MD may)
(VB se)
(NP-OB1 (CONJ ba+te) (N iuil) (CONJ and) (N gude))))
(. ;))
(ID CMBENRUL,12.418))
Non-wh CPs
THAT clauses (CP-THT), degree complements (CP-DEG),
and certain adverbial clauses (CP-ADV) have the following basic
structure:
(CP (C THAT/0)
(IP ...))
(NODE (PP (P so)
(CP-ADV (C that)
(IP-SUB (NP-MSR (NP (NUM thre) (NS dayes))
(CONJP (CONJ and)
(NP (NUM thre) (NS nyghtes))))
(NP-SBJ (PRO he))
(BED was)
(ADJP (ADJ specheles)))))
(ID CMMALORY,6.172))
(NODE (PP (P til)
(CP-ADV (C that)
(IP-SUB (NP-SBJ (PRO ye))
(VBP see)
(CP-THT (C 0) ← empty complementizer
(IP-SUB (NP-SBJ (PRO ye))
(VBP go)
(PP (P unto)
(NP (D the) (ADJR wers))))))))
(ID CMMALORY,13.393))
(NODE (PP (P whan)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (NP (D the) (N duke))
(CONJP (CONJ and)
(NP (PRO$ his) (N wyf))))
(BED were)
(VBN comyn)
(PP (P unto)
(NP (D the) (N kynge))))))
(ID CMMALORY,2.11))
(NODE (IP-SUB (NP-SBJ (PRO we))
(VBP departe)
(PP (P from)
(ADVP (ADV hens)))
(ADVP (ADV sodenly))
(, ,)
(CP-ADV (C that)
(IP-SUB (NP-SBJ (PRO we))
(MD maye)
(VB ryde)
(NP-MSR (Q all) (N nyghte))
(PP (P unto)
(NP (PRO$ oure) (ADJ owne) (N castell))))))
(ID CMMALORY,2.18))
(NODE (IP-SUB (NP-SBJ (PRO he))
(VBD understood)
(CP-THT (C that)
(IP-SUB (NP-SBJ (NPR syre) (NPR Ector))
(BED was)
(NEG not)
(NP-OB1 (PRO$ his) (N fader)))))
(ID CMMALORY,9.271))
(NODE (ADVP (ADVR so) (ADV harde)
(CP-DEG (C that)
(IP-SUB (NP-SBJ (N horse) (CONJ and) (N man))
(VBD felle)
(PP (P to)
(NP (D the) (N erthe))))))
(ID CMMALORY,17.538))
(NODE (ADVP (ADVR so) (ADV merveillously)
(CP-DEG (C that)
(IP-SUB (NP-OB1 (N doubte))
(NP-SBJ-1 (PRO it))
(BED was)
(IP-INF-1 (TO to)
(VB here)
(PP (P of)
(NP (D that) (N bataille))))
(PP (P for)
(NP (D the) (ADJ grete) (N blood) (N shedynge))))))
(ID CMMALORY,68.2325))
Wh- CPs
A number of clause types, listed below, contain both a wh- position and
a complementizer position. This is to allow for the case in which both
positions are filled. Empty wh- positions and empty complementizers are
both indicated by 0 (zero). The wh- operator is coindexed with a
trace of the same category. See Wh-
traces for details, particularly The position of traces.
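The requirement that operator and trace share a category can be checked by pairing indices. A sketch, assuming that a wh- operator is labelled W plus a phrasal category with a `-n` index (WNP-1, WADVP-1, WPP-5 in the examples below) and that its trace is a leaf `*T*-n`; `wh_links` is a hypothetical name, and the minimal reader assumes well-formed input.

```python
import re


def read_tree(text):
    """Minimal reader for well-formed labelled bracketing: (label, children)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def node(i):
        i += 1  # skip "("
        label = ""
        if tokens[i] not in ("(", ")"):
            label, i = tokens[i], i + 1
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    return node(0)[0]


def wh_links(tree):
    """Map each wh- index n to (operator category, trace category).

    The operator category is the label without its leading W (WNP -> NP);
    the trace category is the base label of the node holding *T*-n.
    None means that no *T* trace with that index was found.
    """
    operators, traces = {}, {}

    def walk(node):
        label, children = node
        op = re.fullmatch(r"W([A-Z]+)-(\d+)", label)
        if op:  # e.g. WNP-1 -> category NP, index 1
            operators[int(op.group(2))] = op.group(1)
        for child in children:
            if isinstance(child, str):
                trace = re.fullmatch(r"\*T\*-(\d+)", child)
                if trace:  # e.g. (NP-OB1 *T*-1) -> category NP, index 1
                    traces[int(trace.group(1))] = label.split("-")[0]
            else:
                walk(child)

    walk(tree)
    return {n: (cat, traces.get(n)) for n, cat in operators.items()}


links = wh_links(read_tree("(CP-QUE (WADVP-1 (WADV how)) (C 0) (IP-SUB (NP *T*-1) (VBD went)))"))
mismatched = [n for n, (op, tr) in links.items() if op != tr]
print(mismatched)  # [1]
```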
( (CP-QUE (WADVP-1 (WADV how))
(C that)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (NPR sir) (NPR Gawayne))
(MD shall)
(VB revenge)
(NP-OB1 (NP-POS (PRO$ his) (N$ fadirs)
(NP-PRN *ICH*-2))
(N deth)
(CODE <em>)
(NP-PRN-2 (NPR kynge) (NPR Lot))
(CODE </em>))
(PP (P on)
(NP (NPR kynge) (NPR Pellynore)))))
(ID CMMALORY,61.2043))
( (CP-REL (WNP-1 (WPRO whyche))
(C that)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO thou))
(MD wolte)
(HV have)
(PP (P to)
(NP (PRO$ thy) (N peramour)))))
(ID CMMALORY,184.2551))
( (CP-FRL (WNP-1 (WPRO Who))
(C +tat)
(IP-SUB (NP-SBJ *T*-1)
(MD may)
(VB take)
(NP-OB1 (D +tys) (N vertu))))
(ID CMAELR3,26.10))
( (CP-ADV (WQ Whether)
(C that)
(IP-SUB (NP-SBJ (PRO I))
(VBP (VBP lyve) (CONJ other) (VBP dye))))
(ID CMMALORY,203.3290))
( (CP-FRL (WNP-1 (WD what) (N seruise))
(C +tat)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO +tu))
(VBP canst)))
(ID CMAELR3,40.417))
(NODE (CP-QUE (WPP-5 (P with)
(WNP (WD which) (N degre)
(PP (P of)
(NP (D the) (N zodiak)))))
(C that)
(IP-SUB (PP *T*-5)
(NP-SBJ (D the) (N mone))
(VBP arisith)
(PP (P in)
(NP (Q any) (N latitude)))))
(ID CMASTRO,663.C2.34))
(NODE (NP (D a) (ADJ new) (N batayle)
(CP-REL (WNP-1 (WPRO whych))
(C 0)
(IP-SUB (NP-SBJ *T*-1)
(BED was)
(ADJP (ADJ sore) (CONJ and) (ADJ harde)))))
(ID CMMALORY,26.832))
(NODE (NP-ADV (CP-FRL (WNP-1 (WPRO what) (ADV euere))
(C 0)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO heo))
(BEP be))))
(ID CMAELR3,26.4))
(NODE (NP (D the) (N castel)
(PP (P of)
(NP (NPR Terrabyl)))
(, ,)
(CP-REL (WNP-1 (D the) (WPRO whiche))
(C 0)
(IP-SUB (NP-SBJ *T*-1)
(HVD had)
(NP-OB1 (Q many) (NS yssues)
(CONJP (CONJ and)
(NX (NS posternes)))
(RP oute)))))
(ID CMMALORY,3.36))
(NODE (NP-OB1 (Q alle) (D the) (N cause)
(CP-QUE (WADVP-1 (WADV how))
(C 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (PRO it))
(BED was)
(PP (P by)
(NP (NPR$ Merlyns) (N counceil))))))
(ID CMMALORY,5.134))
(NODE (NP (CP-FRL (WNP-1 (WD what) (ADJ poure) (N man))
(C 0)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO ye))
(VBP mete)
(PP (P at)
(NP (D the) (N posterne) (N yate)
(PP (P of)
(NP (D the)
(N castel))))))))
(ID CMMALORY,6.149))
(NODE (NP (D the) (ADJS byggest) (N castell)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO he))
(HVP hath))))
(ID CMMALORY,2.32))
(NODE (NP-OB1 (D the) (NPR holy) (NPR lond)
(CP-REL (WNP-1 0)
(C +tat)
(IP-SUB (NP-SBJ (NS men))
(VBP callen)
(IP-SMC (NP-SBJ *T*-1)
(NP-OB1 (D the) (N lond)
(PP (PP (P of)
(NP (N promyssioun)))
(CONJP (CONJ or)
(PP (P of)
(NP (N beheste))))))))))
(ID CMMANDEV,1.2))
(NODE (NP-MSR (D the) (ADJ+N meanwhyle)
(CP-REL (WADVP-1 0)
(C that)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (D thys) (N knyght))
(BED was)
(VAG makynge)
(IP-SMC (NP-SBJ (PRO hym))
(ADJP (ADJ redy)
(IP-INF (TO to)
(VB departe)))))))
(ID CMMALORY,48.1589))
(NODE (NP-SBJ (D Thys) (N swerde)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-SBJ (PRO I))
(BEP am)
(VAN gurte)
(PP (P withall)
(NP *T*-1)))))
(ID CMMALORY,45.1496))
(NODE (NP-OB1 (D the) (N despite)
(CP-REL (WNP-3 0)
(C 0)
(IP-SUB (NP-OB1 *T*-3)
(NP-SBJ (PRO ye))
(DOD dud)
(NP-TMP (D thys) (N day))
(PP (PP (P unto)
(NP (NPR kynge) (NPR Arthure)))
(CONJP (CONJ and)
(PP (P to)
(NP (PRO$ his) (N courte))))))))
(ID CMMALORY,51.1696))
(NODE (NP (Q al)
(CP-REL (WNP-1 0)
(C 0)
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PRO +tu))
(VBD suffredest))))
(ID CMAELR3,53.854))
(NODE (NP (D the) (N feith)
(CP-REL (WNP-2 0)
(C 0)
(IP-SUB (NP-OB1 *T*-2)
(NP-SBJ (PRO she))
(VBD ought)
(PP (P to)
(NP (PRO hym))))))
(ID CMMALORY,5.121))
(NODE (NP (D +te) (N blys)
(CP-REL (WNP-1 0)
(C 0)
(IP-SUB (NP-SBJ (PRO he))
(VBD boght)
(NP-OB1 (PRO vs))
(PP (P to)
(NP *T*-1)))))
(ID CMMIRK,5.112))
(PP (P if)
(CP-ADV (C 0) ← C, no inversion
(IP-SUB (NP-SBJ (PRO I))
(HVD had)
(VBN known))))
(CP-ADV (IP-SUB (HVD had) ← inversion, no C
(NP-SBJ (PRO I))
(VBN known)))
(IP-MAT (NP-SBJ (PRO I))
(DOP do)
(NEG not)
(VB know)
(CP-QUE (WADVP-1 (WADV when))
(C 0) ← C, no inversion
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PRO they))
(MD will)
(VB come))))
(CP-QUE (WADVP-1 (WADV when)) ← inversion, no C
(IP-SUB (ADVP-TMP *T*-1)
(MD will)
(NP-SBJ (PRO they))
(VB come)))
Similar cases with an overt complementizer both before and after the fronted element are treated as CP recursion.
(NODE (IP-SUB (NP-SBJ (PRO hie))
(VBP make+d)
(CP-THT (NP-LFD (D +danne) (N man) ← left-dislocated NP in Spec(CP)
(CP-REL (WNP-1 0)
(C +de)
(IP-SUB (NP-OB2 *T*-1)
(NP-SBJ (NPR godd))
(NP-OB1 (PRO his))
(VBD to-sant))))
(C +tat)
(IP-SUB (NP-SBJ-RSP (PRO he))
(VBP +turwune+d)
(PP (P on)
(NP (PRO$ his) (N godnesse))))))
(ID CMVICES1,149.1875))
( (IP-MAT-SPE (CONJ &)
(NP-SBJ *con*)
(NP-LFD (PRO$ ti) (N wil))
(VBP iwur+de)
(NP-OB1-RSP (PRO hit))
(NP-VOC (ADJ deorwur+de) (NPR lauerd))
(CP-ADV (CP-ADV (C +tt)
(IP-SUB (NP-SBJ (PRO ich))
(PP (P +turh)
(NP (PRO$ +ti) (N streng+de)))
(MD mahe)
(VB stonden)
(PP (P wi+d)
(NP (PRO him)))))
(, .)
(CONJP (CONJ &)
(CP-ADV (NP-LFD (PRO$ his) (ADJ muchele) (N ouergart)) ← left-dislocated NP in Spec(CP)
(C +tt)
(IP-SUB (NP-SBJ (PRO ich))
(NP-OB1-RSP (PRO hit))
(MD mote)
(VB afeallen)))))
(. .))
(ID CMMARGA,70.251))
(NODE (IP-SUB (NP-SBJ (D tes) (ADJ unseli))
(NEG ne)
(MD +turue)
(NEG nawt)
(VB seggen)
(, .)
(CP-QUE-SPE (NP-LFD (PRO$ +ti) (NPR lauerd)
(CP-REL (WNP-1 0)
(C +tt)
(IP-SUB (IP-SUB (NP-SBJ (PRO tu))
(VBP leuest)
(PP (P on)
(NP *T*-1)))
(, .)
(CONJP (CONJ &)
(IP-SUB (NP-SBJ *T*-1)
(MD schulde)
(NP-OB1 (PRO$ +ti) (N scheld))
(BE beon))))))
(, .)
(WADVP-2 (WADV hwer))
(IP-SUB (ADVP-LOC *T*-2)
(BEP is)
(NP-SBJ-RSP (PRO he))
(ADVP-TMP (ADV nu+de)))))
(ID CMJULIA,122.464))
( (IP-IMP (VBI loke)
(ADVP-TMP (ADV +tenne))
(PP (ADV+P her-bi))
(, .)
(CP-QUE (NP-LFD (CP-FRL (WNP-1 (WPRO+ADV hwa-se))
(C 0)
(IP-SUB (NP-SBJ *T*-1)
(PP (P of)
(NP (PRO$ hire) (N mei+dhad)))
(, ;)
(VBP lihte+d)
(PP (P in-to)
(NP (N wedlac))))))
(, ;)
(WPP-2 (P bi)
(WNP (WQP (WADV hu) (Q monie))
(NS degrez)))
(C 0)
(IP-SUB (PP *T*-2)
(NP-SBJ-RSP (PRO ha))
(VBP falle+d)
(ADVP-DIR (RP+WARD dunewardes))))
(. .))
(ID CMHALI,144.244))
( (IP-MAT (CONJ and)
(NP-SBJ *con*)
(VBD tolde)
(NP-OB2 (PRO hym))
(CP-QUE (PP-1 (P whyle)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (PRO he))
(VBD tarryed)
(ADVP-LOC (ADV there)))))
(WADVP-2 (WADV how))
(C 0)
(IP-SUB (ADVP *T*-2)
(PP *ICH*-1)
(NP-SBJ (NPR Nero))
(BED was)
(VAN (VAN destroyed) (CONJ and) (VAN slayne))
(PP (P with)
(NP (Q all) (PRO$ his) (N oste)))))
(. .))
(ID CMMALORY,57.1905))
( (IP-MAT (CONJ &)
(NP-SBJ *con*)
(VBD (VBD +girnde) (CONJ &) (VBD walde))
(ADVP (ADV +georne))
(CP-THT (PP-2 (P +gef)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (NPR$ godes) (N wil))
(BED were))))
(, ;)
(C +tt)
(IP-SUB (PP *ICH*-2)
(NP-SBJ (PRO ha))
(MD moste)
(BE beon)
(NP-OB1 (ONE an)
(PP (P of)
(NP (D +te) (Q moni) (N+NS moder-bern)
(CP-REL (WNP-1 0)
(C +tt)
(IP-SUB (NP-SBJ *T*-1)
(NP-OB1 (QP (ADVR swa) (Q muchel)))
(VBD drohten)
(PP (P for)
(NP (NPR drihtin)))))))))))
(ID CMMARGA,56.22))
(CP-THT (C that)
(CP-THT (TOPIC-1 ...)
(C that)
(IP-SUB (TOPIC *ICH*-1)
...)))
The higher complementizer must be overt. Otherwise, the token is treated as fronting to Spec(CP), without recursive structure. As in the simple fronting case, unless the constituent sandwiched between the complementizers is left-dislocated (-LFD) and associated with a resumptive (-RSP) element, it is coindexed with an *ICH* trace.
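The conditions just stated can be tested together. In this sketch (hypothetical function names, minimal reader assuming well-formed input), a CP counts as recursive when it immediately dominates a complementizer and a CP of the same type. The complementizer may be C or, as in the *if ... wheither* example below, WQ; accepting WQ is our extension. For each such CP the check requires that the higher complementizer is not 0 and that the fronted constituent is either -LFD with an -RSP element in the lower IP, or coindexed with an *ICH* trace there.

```python
import re


def read_tree(text):
    """Minimal reader for well-formed labelled bracketing: (label, children)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def node(i):
        i += 1  # skip "("
        label = ""
        if tokens[i] not in ("(", ")"):
            label, i = tokens[i], i + 1
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    return node(0)[0]


def subtree_items(node):
    """Yield every label and word dominated by node, including its own label."""
    label, children = node
    yield label
    for child in children:
        if isinstance(child, str):
            yield child
        else:
            yield from subtree_items(child)


def check_cp_recursion(tree):
    """Return (label, higher complementizer, fronted label, ok) per recursive CP."""
    results = []

    def walk(node):
        label, children = node
        kids = [c for c in children if not isinstance(c, str)]
        if label.startswith("CP"):
            kind = label.split("-")[:2]  # e.g. ["CP", "THT"]
            comps = [k for k in kids if k[0] in ("C", "WQ")]
            inner = [k for k in kids if k[0].split("-")[:2] == kind]
            if comps and inner and inner[0][1] and not isinstance(inner[0][1][0], str):
                higher_c = comps[0][1][0]  # the word under (C ...) or (WQ ...)
                lower = inner[0]
                fronted = lower[1][0]      # constituent sandwiched between the Cs
                lower_ips = [k for k in lower[1]
                             if not isinstance(k, str) and k[0].startswith("IP")]
                items = [x for ip in lower_ips for x in subtree_items(ip)]
                if "LFD" in fronted[0].split("-"):
                    linked = any("RSP" in x.split("-") for x in items)
                else:
                    m = re.search(r"-(\d+)$", fronted[0])
                    linked = m is not None and f"*ICH*-{m.group(1)}" in items
                results.append((label, higher_c, fronted[0], higher_c != "0" and linked))
        for k in kids:
            walk(k)

    walk(tree)
    return results


bad = "(CP-THT (C 0) (CP-THT (PP-1 (P gif)) (C that) (IP-SUB (PP *ICH*-1) (NP-SBJ (PRO he)) (VBD came))))"
print(check_cp_recursion(read_tree(bad)))  # [('CP-THT', '0', 'PP-1', False)]
```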
( (IP-MAT (CONJ for)
(NP-SBJ (NPR sain) (NPR paul))
(VBP sais)
(CP-THT (C +tat)
(CP-THT (NP-LFD (PRO +tai)
(CP-REL (WNP-1 0)
(C +tat)
(IP-SUB (NP-SBJ *T*-1)
(DOP dos)
(NP-OB1 (ADJ wicke) (NS dedis)))))
(, ,)
(C +tat)
(IP-SUB (NP-SBJ-RSP (PRO tay))
(VBP giue)
(NP-OB1 (PRO+N +tam-selffe))
(PP (P til)
(NP (D +te) (NPR deuil))))))
(. ,))
(ID CMBENRUL,21.735))
(NODE (IP-SUB (NP-SBJ-2 (EX tare))
(BEP be)
(NP-2 (Q lytil) (N entirual))
(, ,)
(CP-ADV (C +tat)
(CP-ADV (NP-LFD (D ta)
(CP-REL (WNP-3 0)
(C +tat)
(IP-SUB (NP-SBJ *T*-3)
(MD sal)
(VB ga)
(PP (P til)
(NP (NS laburs))))))
(, ,)
(C +tat)
(IP-SUB (NP-SBJ-RSP (PRO tay))
(MD may)
(HV haue)
(NP-OB1 (D +te) (N morning))
(PP (P in)
(NP (D +te) (N begining)
(PP (P of)
(NP (D +te) (N lyth)))))
(PP (P to)
(NP (PRO$ +tair) (N labur)))))))
(ID CMBENRUL,15.546))
( (IP-MAT (ADVP-TMP (ADV +Ta))
(VBD be+tohte)
(NP-SBJ (PRO he))
(NP-OB2 (PRO him))
(CP-THT (C +tet)
(CP-THT (PP-1 (P gif)
(LB |)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (PRO he))
(MD mihte)
(BE ben)
(ADJP (N+ADJ rotfest))
(PP (P on)
(NP (NPR Engleland))))))
(C +tet)
(IP-SUB (PP *ICH*-1)
(NP-SBJ (PRO he))
(MD mihte)
(HV habben)
(NP-OB1 (Q eal) (PRO$ his) (N wille)))))
(. .))
(ID CMPETERB,49.224))
(NODE (IP-SUB (NP-SBJ (PRO +du))
(ADVP (ADV wel))
(BEP be)
(VAN iwarned)
(, ,)
(CP-THT (PP-2 (P +gif)
(CP-ADV (C 0)
(IP-SUB (NP-SBJ (NPR godd))
(NP-OB2 (PRO +de))
(VBP +gif+d)
(NP-OB1 (D +tese) (ADJ swete) (NS teares)))))
(, ,)
(C +tat)
(IP-SUB (PP *ICH*-2)
(NP-SBJ (Q non) (N win)
(PP (P in)
(NP (D +dare) (N world))))
(NEG+BEP nis)
(ADJP (ADVR swa) (ADJ swete)))))
(ID CMVICES1,149.1856))
(NODE (IP-INF (NP-SBJ (PRO us))
(VB considere)
(ALSO also)
(CP-QUE (WQ if)
(CP-QUE (NP-LFD (D the) (N conseillung)
(PP (P of)
(NP (PRO hem)
(CP-REL (WNP-1 0)
(C that)
(IP-SUB (NP-SBJ *T*-1)
(VBD conseilleden)
(NP-OB2 (PRO yow))
(IP-INF (TO to)
(VB taken)
(NP-OB1 (ADJ sodeyn) (N vengeaunce))))))))
(, ,)
(WQ wheither)
(C 0)
(IP-SUB (NP-SBJ-RSP (PRO it))
(VBP accorde)
(PP (P to)
(NP (N resoun)))))))
(ID CMCTMELI,228.C2.447))
No token break with conjoined intransitive verbs
He came ,_, saw ,_, and conquered ._.
( (IP-MAT (NP-SBJ (PRO He))
(VBD (VBD came) (, ,) (VBD saw) (, ,) (CONJ and) (VBD conquered))
(. .)))
Ordinary VP conjunction triggers token break
He sang ,_.
and danced the polka ._.
( (IP-MAT (NP-SBJ (PRO He))
(VBD sang)
(. ,)))
( (IP-MAT (CONJ and)
(NP-SBJ *con*)
(VBD danced)
(NP-OB1 (D the) (N polka))
(. .)))
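Read against the trees, the `X_Y` notation in these examples pairs the punctuation in the text (X) with the tag it receives in the parse (Y): `,_.` corresponds to `(. ,)` and ends a token, while `,_,` corresponds to `(, ,)` and does not. Assuming that reading, splitting annotated text into tokens is a single pass; `split_tokens` is a hypothetical helper. The sketch does not cover the direct speech examples later in this section, where a closing quote after `._.` stays with the preceding token.

```python
def split_tokens(text):
    """Split text in the word_tag notation into corpus tokens.

    An item such as ",_." is read as the text punctuation "," carrying
    the tag "."; a token ends after every item whose tag is ".".
    Untagged words never end a token.
    """
    tokens, current = [], []
    for item in text.split():
        current.append(item)
        _, sep, tag = item.rpartition("_")
        if sep and tag == ".":
            tokens.append(current)
            current = []
    if current:  # trailing material with no final "." tag
        tokens.append(current)
    return tokens


for t in split_tokens("He sang ,_. and danced the polka ._."):
    print(" ".join(t))
```

The splitter only applies a decision already recorded in the notation. As the tricky case that follows shows, whether a comma ends a token depends on meaning, so the tag after `_` has to be chosen by the annotator.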
Tricky case: Proper treatment depends on meaning (not on punctuation)
He came ,_, and left fifty years later ._. ← both events in the same year
( (IP-MAT (NP-SBJ (PRO He))
(VBD (VBD came) (, ,) (CONJ and) (VBD left))
(ADVP-TMP (NP-MSR (NUM fifty) (NS years))
(ADVR later))
(. .)))
He came ,_.
and left fifty years later ._. ← both events separated by 50 years
( (IP-MAT (NP-SBJ (PRO He))
(VBD came)
(. ,)))
( (IP-MAT (CONJ and)
(NP-SBJ *con*)
(VBD left)
(ADVP-TMP (NP-MSR (NUM fifty) (NS years))
(ADVR later))
(. .)))
No token break with first direct speech clause
He said ,_, " I conquered ._. "_"
( (IP-MAT (NP-SBJ (PRO He))
(VBD said)
(, ,)
(" ") ← default high attachment (contrary to sense)
(IP-MAT-SPE (NP-SBJ (PRO I))
(VBD conquered))
(" ") ← default high attachment (contrary to sense)
(. .)))
Further direct speech clauses become separate tokens
He said ,_, " I came ,_.
I saw ,_.
I conquered ._. "_"
( (IP-MAT (NP-SBJ (PRO He))
(VBD said)
(, ,)
(" ") ← default high attachment (contrary to sense)
(IP-MAT-SPE (NP-SBJ (PRO I))
(VBD came))
(. ,)))
( (IP-MAT-SPE (NP-SBJ (PRO I))
(VBD saw)
(. ,)))
( (IP-MAT-SPE (NP-SBJ (PRO I))
(VBD conquered)
(. .)
(" ")))
Clause-adjoined relatives (generally no token break, regardless of capitalization)
She baked brownies ,_, whereupon joy reigned unconfined ._.
She baked brownies ,_, Whereupon joy reigned unconfined ._.
Ordinary combination of direct speech and clause-adjoined relative (no token break)
She said ,_, " I will bake brownies ,_, "_" Whereupon joy reigned unconfined ._.
Separate direct speech token forces clause-adjoined relative to become separate token
She said ,_, " I will fix the errors ._, and I will do so now ._. "_" Whereupon joy reigned unconfined ._.
Chapter and section headings
Section 102 ,/. Soldering tips ← separate token
CODE material
Where possible, material tagged with CODE (notably headings, page numbers, and editorial material) counts as a separate token.
Font change indications belong with their associated material; they are never separate tokens.
<text>/CODE <heading>/CODE Chapter 22 </heading>/CODE
Here is a sentence.
<P_234> ← separate token
And here is another one.
But lo, <P_235> ← part of surrounding token
the preceding page number is not a separate token !
Diaries and letters
In diaries and letters, the place and date of writing each count as separate tokens.
( (NP-LOC (NPR London)))
( (NP-TMP (NPR June) (ADJ 21st) (NUM 1706)))
( (CP-QUE (NP-VOC (ADJS Dearest) (NPR Delilah))
(, ,)
(WADVP-1 (WADV How))
(IP-SUB (ADVP *T*-1)
(BEP are)
(NP-SBJ (PRO you)))
(. ?)))
Interjections and vocatives
Where possible, interjections
and vocatives are treated as part
of a preceding or following sentence token.
Plays
The speaker indication (the name of the character speaking) and the next line
spoken by that character count as a single token.
Falstaff ./, ← comma, not period
It is I .