Introduction to syntactic annotation


The parsing scheme for the Penn historical corpora, which is also used by the Parsed Corpus of Early English Correspondence (PCEEC), uses a limited tree representation in the form of labelled parentheses. All open parentheses have an associated label, either a phrase label (NP, ADJP, etc.) or a word label (N, ADJ, etc.), representing nodes in a tree. We use the terms 'word label' and 'POS (part-of-speech) label' interchangeably. A word label is associated with every word, but phrasal labels are not included in every case in which a fully labelled tree would require them. Intermediate projections in the sense of X' theory (N', ADJ', etc.) are not generally included in our representations. By comparison to trees in current syntactic theory, the trees in our corpora are therefore quite flat, and they are not required to be binary-branching.

The partial representation of phrase structure in our corpora is not intended to make a theoretical statement, but was adopted for practical reasons. Certain phrases are generally omitted in the annotation scheme because their boundaries are too difficult to define. The prime example is VP. The problematic character of VP is particularly obvious in early Middle English, where the order of the verb and its complements is in flux (at least on the surface). But even in Present-Day English, the attachment site of verbal adjuncts is systematically ambiguous between low attachment to VP and high attachment at the clause level. Other categories, such as DP, were omitted because the cost of including them outweighs their usefulness. Intermediate projections are omitted for both reasons. In no case should the lack of any particular phrase label be taken to imply that earlier forms of English failed to include the corresponding syntactic category. The trees in the corpora are simply underspecified.

The examples in this section of the manual are constructed in Modern English so as to be maximally accessible. The remainder of the manual contains examples from the corpora. The examples are mostly from late Middle English and Early Modern English; examples from early Middle English are included where they are necessary to make a linguistic point.

General principles

As just mentioned, the structures in our corpora generally include neither a VP nor intermediate projections like I'. As a result, IP immediately dominates all verbs (to be understood in a broad sense, including modals and auxiliaries) and sentence-level constituents. A typical parse structure is the following:
(IP-MAT (NP-SBJ (NPR Mary))
        (HVP has)
        (BEN been)
        (VAG meaning)
        (IP-INF (TO to)
                (VB go))
        (PP (P for)
            (NP (D a) (N week))))
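
Because the representation is nothing more than labelled parentheses, it can be read with a few lines of code. The following is a minimal sketch in Python, not part of the corpus tooling: the function names `parse` and `leaves` are hypothetical, and the nested-list representation (`[label, child, ...]`) is an assumption made for illustration.

```python
import re

def parse(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    if not tokens or tokens[0] != "(":
        raise ValueError("a tree must start with '('")
    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced parentheses: missing ')'") from None
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses: extra material after the tree")
    return tree

def leaves(tree):
    """Yield (word label, word) pairs in surface order."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, children[0]
    else:
        for child in children:
            if isinstance(child, list):
                yield from leaves(child)

example = """
(IP-MAT (NP-SBJ (NPR Mary))
        (HVP has)
        (BEN been)
        (VAG meaning)
        (IP-INF (TO to)
                (VB go))
        (PP (P for)
            (NP (D a) (N week))))
"""
tree = parse(example)
print([child[0] for child in tree[1:]])   # labels of the daughters of IP-MAT
print(" ".join(word for _, word in leaves(tree)))
```

The first `print` lists the immediate daughters of IP-MAT, which makes the flatness of the structure visible: the verbs and the sentence-level constituents are all sisters.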

Dash tags (or extended tags)

Structural principles

Internal structure of phrases

The internal structure of all nonclausal phrases is fundamentally similar.

Internal structure of clauses

Clauses are labelled either CP or IP. CPs contain either a complementizer or a wh- position (or both). IPs contain neither. All IPs and CPs carry dash tags indicating their subtype, as follows.

The distinction between direct questions (CP-QUE-MAT) and indirect questions (CP-QUE-SUB) is recent. It is consistently implemented in this section of the documentation, but not yet in all others (where indirect and direct questions are both annotated as CP-QUE, with the distinction being reflected in the presence versus absence of a C node).

IP-ABS absolute clause
IP-IMP imperative
IP-INF infinitive
IP-INF-ABS absolute infinitive
IP-INF-ADT adjunct infinitive
IP-INF-DEG degree infinitive
IP-INF-PRP purpose infinitive
IP-MAT declarative matrix IP
IP-PPL participial clause
IP-PPL-ABS absolute participial clause
IP-SMC small clause
IP-SUB subordinate IP

CP-ADV adverbial clause
CP-CAR clause-adjoined relative
CP-CLF IT cleft
CP-CMP comparative
CP-DEG degree complement
CP-EOP empty-operator CP
CP-EXL exclamative
CP-FRL free relative
CP-QUE-MAT direct question
CP-QUE-SUB indirect question
CP-REL relative clause
CP-THT THAT clause
CP-TMC TOUGH movement complement
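
As an illustration of how these labels are built up, the sketch below splits a label into its base category, its dash tags, and an optional numeric index, and looks up the clause subtype in the two lists above. The helper names `split_label` and `clause_subtype` are hypothetical, and the assumption that the subtype is always the first dash tag is drawn only from the labels listed here.

```python
# Clause subtypes taken from the two lists above (first dash tag only).
IP_SUBTYPES = {"ABS", "IMP", "INF", "MAT", "PPL", "SMC", "SUB"}
CP_SUBTYPES = {"ADV", "CAR", "CLF", "CMP", "DEG", "EOP", "EXL", "FRL",
               "QUE", "REL", "THT", "TMC"}

def split_label(label):
    """Split a label such as 'NP-SBJ-1' into (base, dash tags, index)."""
    base, *tags = label.split("-")
    # A trailing all-digit part is treated as a coindexation index.
    index = tags.pop() if tags and tags[-1].isdigit() else None
    return base, tags, index

def clause_subtype(label):
    """Return the first dash tag of an IP or CP label if it is a listed subtype, else None."""
    base, tags, _ = split_label(label)
    known = {"IP": IP_SUBTYPES, "CP": CP_SUBTYPES}.get(base)
    if known and tags and tags[0] in known:
        return tags[0]
    return None

print(split_label("IP-INF-PRP"))     # ('IP', ['INF', 'PRP'], None)
print(split_label("NP-SBJ-1"))       # ('NP', ['SBJ'], '1')
print(clause_subtype("CP-QUE-MAT"))  # QUE
```

A label whose first dash tag is not in the lists yields `None`, which makes the function usable as a rough spelling check on clause labels.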

Ordinary IPs

All IPs except subjectless imperatives and subjectless infinitives have a subject in our annotation scheme. If the subject is not overt, an empty subject is added. Clauses generally do not contain VP (but see Verb phrase). As a rule, daughters of IP are phrasal (but see Internal structure of phrases for exceptions).
(IP-MAT (CONJ But)                      ← sentential conjunction
        (INTJ alas)                     ← single-word interjection
        (, ,)
        (NP-SBJ (PRO we))
        (MD will)                       ← modal
        (NEG not)                       ← negation
        (Q all)                         ← floated quantifier
        (VB end)                        ← verb
        (RP up)                         ← particle
        (PP (P with)
            (NP-OB1 (PRO$ our) (N favorite)))
        (. .))

( (IP-MAT-SPE (' ')
              (INTJ Yes)		← single-word interjection
              (, ,)
              (' ')
              (IP-MAT-PRN (NP-SBJ (PRO he))
                          (VBD seyde))	← verb
              (, ,)
              (' ')
              (NP-SBJ (PRO I))
              (MD shall)			← modal
              (VB promyse)			← verb
              (NP-OB2 (PRO you))
              (IP-INF (TO to)			← auxiliary
                      (VB fullfylle)		← verb
                      (NP-OB1 (PRO$ youre) (N desyre)))
              (. .)
              (' '))
  (ID CMMALORY,667.4880))

( (IP-MAT-SPE (CONJ for)				← conjunction
              (NP-SBJ (PRO he)
                      (CP-REL (WNP-1 0)
                              (C that)
                              (IP-SUB (NP-SBJ *T*-1)
                                      (MD shall)	← modal
                                      (VB pulle)	← verb
                                      (NP-OB1 (PRO hit))
                                      (RP oute))))	← particle
              (MD shall)				← modal
              (DO do)					← verb
              (NP-OB1 (PRO hit))
              (PP (P with)
                  (NP (Q litill) (N myght)))
              (. .)
              (' '))
  (ID CMMALORY,46.1512))

( (IP-MAT-SPE (CONJ and)				← conjunction
              (ADVP (ADV ellis))
              (NP-SBJ (PRO I))
              (MD wolde)				← modal
              (HV have)					← auxiliary
              (BEN bene)				← verb
              (ADJP (ADJ lothe)
                    (PP (P as)
                        (NP (Q ony) (N knyght)
                            (CP-REL (WNP-1 0)
                                    (C that)
                                    (IP-SUB (NP-SBJ *T*-1)
                                            (VBP lyvith)))))
                    (IP-INF (FOR for) (TO to)		← auxiliary material
                            (VB sle)
                            (NP-OB1 (D a) (N lady))))
              (. .)
              (' '))
  (ID CMMALORY,51.1701))
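
The subject requirement stated at the beginning of this section lends itself to a mechanical check. The sketch below is a rough illustration under stated assumptions, not a validator used for the corpora: the names `parse`, `walk`, `label`, and `ips_without_subject` are hypothetical, imperatives and infinitives are simply skipped, and an IP whose daughters include an IP with the identical label is assumed to be a coordination and is skipped as well. The trees in the usage lines are constructed examples.

```python
import re

def parse(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    return node()

def walk(node):
    """Yield every bracketed node, including an unlabelled wrapper."""
    yield node
    for child in node:
        if isinstance(child, list):
            yield from walk(child)

def label(node):
    """Return the node's label, or '' for an unlabelled wrapper."""
    return node[0] if node and isinstance(node[0], str) else ""

def ips_without_subject(tree):
    """Return the labels of IPs that lack an NP-SBJ daughter."""
    flagged = []
    for node in walk(tree):
        parts = label(node).split("-")
        if parts[0] != "IP" or {"IMP", "INF"} & set(parts[1:]):
            continue                  # not an IP, or a clause type that may lack a subject
        daughters = [label(c) for c in node[1:] if isinstance(c, list)]
        if label(node) in daughters:
            continue                  # assumption: IP over same-label IPs is coordination
        if not any(d.split("-")[:2] == ["NP", "SBJ"] for d in daughters):
            flagged.append(label(node))
    return flagged

good = parse("(IP-MAT (NP-SBJ (PRO I)) (MD shall) (VB promise) (IP-INF (TO to) (VB help)))")
empty = parse("(IP-MAT (CONJ and) (NP-SBJ *con*) (VBD left))")
bad = parse("(IP-MAT (MD shall) (VB promise))")
print(ips_without_subject(good), ips_without_subject(empty), ips_without_subject(bad))
```

An empty subject such as `(NP-SBJ *con*)` counts as a subject, so only the third tree is flagged.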

Imperatives (IP-IMP)

Imperatives are labelled IP-IMP. Only overt subjects are included in the annotation.
( (IP-IMP-SPE (CONJ but)
              (VBI saye)
              (CP-THT (C 0)
                      (IP-SUB (NP-SBJ (PRO ye))
                              (BEP are)
                              (VAN diseased)))
              (. ,))
  (ID CMMALORY,4.83))

( (IP-IMP (CONJ for)
          (VBI witte)
          (NP-SBJ (PRO ye))		← overt subject
          (ADVP (ADV wele))
          (CP-THT (C +tat)
                  (IP-SUB (NP-SBJ (NPR god))
                          (MD may)
                          (VB se)
                          (NP-OB1 (CONJ ba+te) (N iuil) (CONJ and) (N gude))))
          (. ;))
  (ID CMBENRUL,12.418))

Non-wh CPs

THAT clauses (CP-THT), degree complements (CP-DEG), and certain adverbial clauses (CP-ADV) have the following basic structure:
(CP (C THAT/0)
    (IP ...))

The complementizer position is always included; when not filled by an overt complementizer, it contains 0 (zero).

(NODE (PP (P so)
          (CP-ADV (C that)
                  (IP-SUB (NP-MSR (NP (NUM thre) (NS dayes))
                                  (CONJP (CONJ and)
                                         (NP (NUM thre) (NS nyghtes))))
                          (NP-SBJ (PRO he))
                          (BED was)
                          (ADJP (ADJ specheles)))))
      (ID CMMALORY,6.172))

(NODE (PP (P til)
          (CP-ADV (C that)
                  (IP-SUB (NP-SBJ (PRO ye))
                          (VBP see)
                          (CP-THT (C 0)				← empty complementizer
                                  (IP-SUB (NP-SBJ (PRO ye))
                                          (VBP go)
                                          (PP (P unto)
                                              (NP (D the) (ADJR wers))))))))
      (ID CMMALORY,13.393))

(NODE (PP (P whan)
          (CP-ADV (C 0)
                  (IP-SUB (NP-SBJ (NP (D the) (N duke))
                                  (CONJP (CONJ and)
                                         (NP (PRO$ his) (N wyf))))
                          (BED were)
                          (VBN comyn)
                          (PP (P unto)
                              (NP (D the) (N kynge))))))
      (ID CMMALORY,2.11))

(NODE (IP-SUB (NP-SBJ (PRO we))
              (VBP departe)
              (PP (P from)
                  (ADVP (ADV hens)))
              (ADVP (ADV sodenly))
              (, ,)
              (CP-ADV (C that)
                      (IP-SUB (NP-SBJ (PRO we))
                              (MD maye)
                              (VB ryde)
                              (NP-MSR (Q all) (N nyghte))
                              (PP (P unto)
                                  (NP (PRO$ oure) (ADJ owne) (N castell))))))
      (ID CMMALORY,2.18))

(NODE (IP-SUB (NP-SBJ (PRO he))
              (VBD understood)
              (CP-THT (C that)
                      (IP-SUB (NP-SBJ (NPR syre) (NPR Ector))
                              (BED was)
                              (NEG not)
                              (NP-OB1 (PRO$ his) (N fader)))))
      (ID CMMALORY,9.271))

(NODE (ADVP (ADVR so) (ADV harde)
            (CP-DEG (C that)
                    (IP-SUB (NP-SBJ (N horse) (CONJ and) (N man))
                            (VBD felle)
                            (PP (P to)
                                (NP (D the) (N erthe))))))
      (ID CMMALORY,17.538))

(NODE (ADVP (ADVR so) (ADV merveillously)
            (CP-DEG (C that)
                    (IP-SUB (NP-OB1 (N doubte))
                            (NP-SBJ-1 (PRO it))
                            (BED was)
                            (IP-INF-1 (TO to)
                                      (VB here)
                                      (PP (P of)
                                          (NP (D that) (N bataille))))
                            (PP (P for)
                                (NP (D the) (ADJ grete) (N blood) (N shedynge))))))
      (ID CMMALORY,68.2325))

Wh- CPs

A number of clause types, listed below, contain both a wh- position and a complementizer position. This is to allow for the case in which both positions are filled. Empty wh- positions and empty complementizers are both indicated by 0 (zero). The wh- operator is coindexed to a trace of the same category. See Wh- traces for details, particularly The position of traces.
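
The coindexation between a wh- operator and its trace can be checked automatically. The following sketch rests on assumptions that go beyond the text: the function names are hypothetical, "same category" is interpreted as the operator's base label being `W` plus the base label of the trace node (WNP with NP, WADVP with ADVP, WPP with PP), and indices are assumed to be unique within a token. The trees in the usage lines are constructed examples.

```python
import re

def parse(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    return node()

def walk(node):
    """Yield every bracketed node."""
    yield node
    for child in node:
        if isinstance(child, list):
            yield from walk(child)

TRACE = re.compile(r"\*T\*-(\d+)$")

def wh_trace_problems(tree):
    """Report *T* traces with no wh- operator or with a mismatched category."""
    operators = {}                    # index -> base label of the wh- phrase
    traces = []                       # (index, base label of the trace node)
    for node in walk(tree):
        if not isinstance(node[0], str):
            continue                  # unlabelled wrapper
        parts = node[0].split("-")
        if parts[0].startswith("W") and len(parts) > 1 and parts[-1].isdigit():
            operators[parts[-1]] = parts[0]
        if len(node) == 2 and isinstance(node[1], str):
            match = TRACE.match(node[1])
            if match:
                traces.append((match.group(1), parts[0]))
    problems = []
    for index, category in traces:
        operator = operators.get(index)
        if operator is None:
            problems.append(f"*T*-{index} has no wh- operator")
        elif operator != "W" + category:
            problems.append(f"*T*-{index} is {category} but its operator is {operator}")
    return problems

ok = parse("""(CP-QUE-SUB (WADVP-1 (WADV when))
            (C 0)
            (IP-SUB (ADVP-TMP *T*-1)
                    (NP-SBJ (PRO they))
                    (MD will)
                    (VB come)))""")
print(wh_trace_problems(ok))          # []
```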

Verb fronting to C

Subject-verb inversion in V1 conditionals, questions, and exclamatives is not explicitly represented as verb movement to C in our annotation scheme. The inverted verb remains a daughter of IP. However, clauses with inversion differ structurally from ones without in not containing a C position.
(PP (P if)
    (CP-ADV (C 0)			← C, no inversion
            (IP-SUB (NP-SBJ (PRO I))
                    (HVD had)
                    (VBN known))))

(CP-ADV (IP-SUB (HVD had)		← inversion, no C
                (NP-SBJ (PRO I))
                (VBN known)))

(IP-MAT (NP-SBJ (PRO I))
        (DOP do)
        (NEG not)
        (VB know)
        (CP-QUE-SUB (WADVP-1 (WADV when))
                    (C 0)			← C, no inversion
                    (IP-SUB (ADVP-TMP *T*-1)
                            (NP-SBJ (PRO they))
                            (MD will)
                            (VB come))))

(CP-QUE-MAT (WADVP-1 (WADV when))		← inversion, no C
            (IP-SUB (ADVP-TMP *T*-1)
                    (MD will)
                    (NP-SBJ (PRO they))
                    (VB come)))
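
Since inversion is signalled only by the absence of a C position, a search for inverted clauses amounts to looking for CPs without a C daughter. The sketch below illustrates this; the names `parse` and `c_position` and the three return values are hypothetical choices made for the example.

```python
import re

def parse(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    return node()

def c_position(cp):
    """Classify the C position of a CP node as 'overt', 'empty', or 'absent'."""
    for child in cp[1:]:
        if isinstance(child, list) and child[0] == "C":
            return "empty" if child[1] == "0" else "overt"
    return "absent"                   # no C daughter: the clause shows inversion

no_inversion = parse("""(PP (P if)
    (CP-ADV (C 0)
            (IP-SUB (NP-SBJ (PRO I))
                    (HVD had)
                    (VBN known))))""")
inversion = parse("""(CP-ADV (IP-SUB (HVD had)
                (NP-SBJ (PRO I))
                (VBN known)))""")

print(c_position(no_inversion[2]))    # empty
print(c_position(inversion))          # absent
```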

Fronting to pre-complementizer position

Fronting to Spec(CP)

Fronted elements can occupy Spec(CP), the position immediately preceding the complementizer. Since the specifier position is not explicitly indicated in our annotation system for any phrasal category, these elements simply appear within the CP in pre-head position. Such fronted constituents are coindexed with an *ICH* trace or with a resumptive (-RSP) phrase. For analogous cases of fronting in subordinate clauses that are introduced by a subordinating conjunction, see Fronting to Spec(PP).

Similar cases with an overt complementizer both before and after the fronted element are treated as CP recursion.

( (IP-MAT (NP-SBJ (PRO He))
          (VBD said)
          (CP-THT (ADVP-1 (ADV tomorrow))
                  (C that)
                  (IP-SUB (ADVP-TMP *ICH*-1)
                          (NP-SBJ (PRO he))
                          (MD will)
                          (VB fix)
                          (NP-OB1 (D the) (N problem))))
          (. .)))

( (IP-MAT (NP-SBJ (PRO He))
          (VBD said)
          (CP-THT (NP-LFD (D the) (N problem))
                  (C that)
                  (IP-SUB (NP-SBJ (PRO he))
                          (MD will)
                          (VB fix)
                          (NP-OB1-RSP (PRO it))
                          (ADVP-TMP (ADV tomorrow))))
          (. .)))

(NODE (IP-SUB (NP-SBJ (PRO hie))
              (VBP make+d)
              (CP-THT (NP-LFD (D +danne) (N man)  ← left-dislocated NP in Spec(CP)
                              (CP-REL (WNP-1 0)
                                      (C +de)
                                      (IP-SUB (NP-OB2 *T*-1)
                                              (NP-SBJ (NPR godd))
                                              (NP-OB1 (PRO his))
                                              (VBD to-sant))))
                      (C +tat)
                      (IP-SUB (NP-SBJ-RSP (PRO he))
                              (VBP +turwune+d)
                              (PP (P on)
                                  (NP (PRO$ his) (N godnesse))))))
      (ID CMVICES1,149.1875))

( (IP-MAT-SPE (CONJ &)
              (NP-SBJ *con*)
              (NP-LFD (PRO$ ti) (N wil))
              (VBP iwur+de)
              (NP-OB1-RSP (PRO hit))
              (NP-VOC (ADJ deorwur+de) (NPR lauerd))
              (CP-ADV (CP-ADV (C +tt)
                              (IP-SUB (NP-SBJ (PRO ich))
                                      (PP (P +turh)
                                          (NP (PRO$ +ti) (N streng+de)))
                                      (MD mahe)
                                      (VB stonden)
                                      (PP (P wi+d)
                                          (NP (PRO him)))))
                      (, .)
                      (CONJP (CONJ &)
                             (CP-ADV (NP-LFD (PRO$ his) (ADJ muchele) (N ouergart))  ← left-dislocated NP in Spec(CP)
                                     (C +tt)
                                     (IP-SUB (NP-SBJ (PRO ich))
                                             (NP-OB1-RSP (PRO hit))
                                             (MD mote)
                                             (VB afeallen)))))
              (. .))
  (ID CMMARGA,70.251))

Adjunction to CP

Material appearing before a wh- element must be adjoined to CP rather than occupying Spec(CP) (since the wh- element occupies that position), but our annotation does not explicitly express the distinction between the two types of positions. For the fronting of verbs to the pre-wh position, see Verb fronting in free relative clauses.
( (CP-QUE-MAT (NP-LFD (D The) (N party))
              (, ,)          
              (WADVP-1 (WADV where))
              (IP-SUB (ADVP-LOC *T*-1)
                      (BEP is)
                      (NP-SBJ-RSP (PRO it)))
              (. ?)))
                      
( (CP-QUE-MAT (NP-VOC (NS Ladies) (CONJ and) (ADJ+NS gentlemen))
              (, ,)          
              (ADVP-TMP (ADV tomorrow))
              (, ,)          
              (WNP-1 (WPRO whom))
              (IP-SUB (NP-OB1 *T*-1)
                      (MD shall)
                      (NP-SBJ (PRO we))
                      (VB elect))
              (. ?)))
                      
(NODE (IP-SUB (NP-SBJ (D tes) (ADJ unseli))
              (NEG ne)
              (MD +turue)
              (NEG nawt)
              (VB seggen)
              (, .)
              (CP-QUE-MAT-SPE (NP-LFD (PRO$ +ti) (NPR lauerd)
                                      (CP-REL (WNP-1 0)
                                              (C +tt)
                                              (IP-SUB (IP-SUB (NP-SBJ (PRO tu))
                                                              (VBP leuest)
                                                              (PP (P on)
                                                                  (NP *T*-1)))
                                                       (, .)
                                                       (CONJP (CONJ &)
                                                              (IP-SUB (NP-SBJ *T*-1)
                                                                      (MD schulde)
                                                                      (NP-OB1 (PRO$ +ti) (N scheld))
                                                                      (BE beon))))))
                              (, .)
                              (WADVP-2 (WADV hwer))
                              (IP-SUB (ADVP-LOC *T*-2)
                                      (BEP is)
                                      (NP-SBJ-RSP (PRO he))
                                      (ADVP-TMP (ADV nu+de)))))
      (ID CMJULIA,122.464))

( (IP-IMP (VBI loke)
          (ADVP-TMP (ADV +tenne))
          (PP (ADV+P her-bi))
          (, .)
          (CP-QUE-SUB (NP-LFD (CP-FRL (WNP-1 (WPRO+ADV hwa-se))
                                      (C 0)
                                      (IP-SUB (NP-SBJ *T*-1)
                                              (PP (P of)
                                                  (NP (PRO$ hire) (N mei+dhad)))
                                              (, ;)
                                              (VBP lihte+d)
                                              (PP (P in-to)
                                                  (NP (N wedlac))))))
                      (, ;)
                      (WPP-2 (P bi)
                             (WNP (WQP (WADV hu) (Q monie))
                                  (NS degrez)))
                      (C 0)
                      (IP-SUB (PP *T*-2)
                              (NP-SBJ-RSP (PRO ha))
                              (VBP falle+d)
                              (ADVP-DIR (RP+WARD dunewardes))))
          (. .))
  (ID CMHALI,144.244))

( (IP-MAT (CONJ and)
          (NP-SBJ *con*)
          (VBD tolde)
          (NP-OB2 (PRO hym))
          (CP-QUE-SUB (PP-1 (P whyle)
                            (CP-ADV (C 0)
                                    (IP-SUB (NP-SBJ (PRO he))
                                            (VBD tarryed)
                                            (ADVP-LOC (ADV there)))))
                      (WADVP-2 (WADV how))
                      (C 0)
                      (IP-SUB (ADVP *T*-2)
                              (PP *ICH*-1)
                              (NP-SBJ (NPR Nero))
                              (BED was)
                              (VAN (VAN destroyed) (CONJ and) (VAN slayne))
                              (PP (P with)
                                  (NP (Q all) (PRO$ his) (N oste)))))
          (. .))
  (ID CMMALORY,57.1905))

( (IP-MAT (CONJ &)
          (NP-SBJ *con*)
          (VBD (VBD +girnde) (CONJ &) (VBD walde))
          (ADVP (ADV +georne))
          (CP-THT (PP-2 (P +gef)
                        (CP-ADV (C 0)
                                (IP-SUB (NP-SBJ (NPR$ godes) (N wil))
                                        (BED were))))
                  (, ;)
                  (C +tt)
                  (IP-SUB (PP *ICH*-2)
                          (NP-SBJ (PRO ha))
                          (MD moste)
                          (BE beon)
                          (NP-OB1 (ONE an)
                                  (PP (P of)
                                      (NP (D +te) (Q moni) (N+NS moder-bern)
                                          (CP-REL (WNP-1 0)
                                                  (C +tt)
                                                  (IP-SUB (NP-SBJ *T*-1)
                                                          (NP-OB1 (QP (ADVR swa) (Q muchel)))
                                                          (VBD drohten)
                                                          (PP (P for)
                                                              (NP (NPR drihtin)))))))))))
  (ID CMMARGA,56.22))

CP recursion

Instances of CP recursion are given the following schematic structure. CP recursion generally occurs with THAT complements, but is attested in indirect questions and other clause types as well.
(CP-THT (C that)
        (CP-THT (TOPIC-1 ...)
                (C that)
                (IP-SUB (TOPIC *ICH*-1)
                        ...)))

The higher complementizer must be overt. Otherwise, the token is treated as fronting to Spec(CP), without recursive structure. As in the simple fronting case, unless the constituent sandwiched between the complementizers is left-dislocated (-LFD) and associated with a resumptive (-RSP) element, it is coindexed with an *ICH* trace.
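
The difference between CP recursion and simple fronting to Spec(CP) can be stated as a structural test. The sketch below is one possible formulation under stated assumptions: the function names are hypothetical, the nested CP is required to carry exactly the same label as the outer one, and `WQ` is accepted alongside `C` as a head only because an `if ... wheither` token of that shape appears among the examples in this section.

```python
import re

def parse(text):
    """Parse one labelled bracketing into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return items

    return node()

HEADS = {"C", "WQ"}                   # assumption: WQ included on the basis of one example

def is_cp_recursion(cp):
    """True if cp has an overt head and a same-label CP daughter with its own head."""
    if cp[0].split("-")[0] != "CP":
        return False
    kids = [c for c in cp[1:] if isinstance(c, list)]
    overt_head = any(k[0] in HEADS and k[1] != "0" for k in kids)
    nested_cp = any(
        k[0] == cp[0]
        and any(isinstance(g, list) and g[0] in HEADS for g in k[1:])
        for k in kids
    )
    return overt_head and nested_cp

ip = "(IP-SUB (ADVP-TMP *ICH*-1) (NP-SBJ (PRO he)) (MD will) (VB fix) (NP-OB1 (D the) (N problem)))"
recursion = parse(f"(CP-THT (C that) (CP-THT (ADVP-1 (ADV tomorrow)) (C that) {ip}))")
fronting = parse(f"(CP-THT (ADVP-1 (ADV tomorrow)) (C that) {ip})")

print(is_cp_recursion(recursion))     # True
print(is_cp_recursion(fronting))      # False
```

A coordination of CPs is not picked up by this test, because the outer CP of a coordination has no head daughter of its own.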

(NODE (IP-MAT (NP-SBJ (PRO He))
              (VBD said)
              (CP-THT (C that)
                      (CP-THT (ADVP-1 (ADV tomorrow))
                              (C that)
                              (IP-SUB (ADVP-TMP *ICH*-1)
                                      (NP-SBJ (PRO he))
                                      (MD will)
                                      (VB fix)
                                      (NP-OB1 (D the) (N problem)))))
              (. .)))

( (IP-MAT (NP-SBJ (PRO He))
          (VBD said)
          (CP-THT (C that)
                  (CP-THT (NP-LFD (D the) (N problem))
                          (C that)
                          (IP-SUB (NP-SBJ (PRO he))
                                  (MD will)
                                  (VB fix)
                                  (NP-OB1-RSP (PRO it))
                                  (ADVP-TMP (ADV tomorrow)))))
          (. .)))

( (IP-MAT (CONJ for)
          (NP-SBJ (NPR sain) (NPR paul))
          (VBP sais)
          (CP-THT (C +tat)
                  (CP-THT (NP-LFD (PRO +tai)
                                  (CP-REL (WNP-1 0)
                                          (C +tat)
                                          (IP-SUB (NP-SBJ *T*-1)
                                                  (DOP dos)
                                                  (NP-OB1 (ADJ wicke) (NS dedis)))))
                          (, ,)
                          (C +tat)
                          (IP-SUB (NP-SBJ-RSP (PRO tay))
                                  (VBP giue)
                                  (NP-OB1 (PRO+N +tam-selffe))
                                  (PP (P til)
                                      (NP (D +te) (NPR deuil))))))
          (. ,))
  (ID CMBENRUL,21.735))

(NODE (IP-SUB (NP-SBJ-2 (EX tare))
              (BEP be)
              (NP-2 (Q lytil) (N entirual))
              (, ,)
              (CP-ADV (C +tat)
                      (CP-ADV (NP-LFD (D ta)
                                      (CP-REL (WNP-3 0)
                                              (C +tat)
                                              (IP-SUB (NP-SBJ *T*-3)
                                                      (MD sal)
                                                      (VB ga)
                                                      (PP (P til)
                                                          (NP (NS laburs))))))
                              (, ,)
                              (C +tat)
                              (IP-SUB (NP-SBJ-RSP (PRO tay))
                                      (MD may)
                                      (HV haue)
                                      (NP-OB1 (D +te) (N morning))
                                      (PP (P in)
                                          (NP (D +te) (N begining)
                                              (PP (P of)
                                                  (NP (D +te) (N lyth)))))
                                      (PP (P to)
                                          (NP (PRO$ +tair) (N labur)))))))
      (ID CMBENRUL,15.546))

( (IP-MAT (ADVP-TMP (ADV +Ta))
          (VBD be+tohte)
          (NP-SBJ (PRO he))
          (NP-OB2 (PRO him))
          (CP-THT (C +tet)
                  (CP-THT (PP-1 (P gif)
                                (LB |)
                                (CP-ADV (C 0)
                                        (IP-SUB (NP-SBJ (PRO he))
                                                (MD mihte)
                                                (BE ben)
                                                (ADJP (N+ADJ rotfest))
                                                (PP (P on)
                                                    (NP (NPR Engleland))))))
                          (C +tet)
                          (IP-SUB (PP *ICH*-1)
                                  (NP-SBJ (PRO he))
                                  (MD mihte)
                                  (HV habben)
                                  (NP-OB1 (Q eal) (PRO$ his) (N wille)))))
          (. .))
  (ID CMPETERB,49.224))

(NODE (IP-SUB (NP-SBJ (PRO +du))
              (ADVP (ADV wel))
              (BEP be)
              (VAN iwarned)
              (, ,)
              (CP-THT (PP-2 (P +gif)
                            (CP-ADV (C 0)
                                    (IP-SUB (NP-SBJ (NPR godd))
                                            (NP-OB2 (PRO +de))
                                            (VBP +gif+d)
                                            (NP-OB1 (D +tese) (ADJ swete) (NS teares)))))
                      (, ,)
                      (C +tat)
                      (IP-SUB (PP *ICH*-2)
                              (NP-SBJ (Q non) (N win)
                                      (PP (P in)
                                          (NP (D +dare) (N world))))
                              (NEG+BEP nis)
                              (ADJP (ADVR swa) (ADJ swete)))))
      (ID CMVICES1,149.1856))

(NODE (IP-INF (NP-SBJ (PRO us))
              (VB considere)
              (ALSO also)
              (CP-QUE-SUB (WQ if)
                          (CP-QUE-SUB (NP-LFD (D the) (N conseillung)
                                              (PP (P of)
                                                  (NP (PRO hem)
                                                      (CP-REL (WNP-1 0)
                                                              (C that)
                                                              (IP-SUB (NP-SBJ *T*-1)
                                                                      (VBD conseilleden)
                                                                      (NP-OB2 (PRO yow))
                                                                      (IP-INF (TO to)
                                                                              (VB taken)
                                                                              (NP-OB1 (ADJ sodeyn) (N vengeaunce))))))))
                                      (, ,)
                                      (WQ wheither)
                                      (C 0)
                                      (IP-SUB (NP-SBJ-RSP (PRO it))
                                              (VBP accorde)
                                              (PP (P to)
                                                  (NP (N resoun)))))))
      (ID CMCTMELI,228.C2.447))

Division into sentence tokens

Conjunction

The conventions for the division of the text into sentence tokens in connection with conjunction are discussed in more detail in Clausal conjunction. What follows is a list of illustrative examples. Blank lines indicate a break between sentence tokens.

No token break with conjoined simple verbs

The simplest case: conjoined intransitive verbs
He came ,_, saw ,_, and conquered ._.

( (IP-MAT (NP-SBJ (PRO He))
          (VBD (VBD came) (, ,) (VBD saw) (, ,) (CONJ and) (VBD conquered))
          (. .)))

Same treatment for conjoined verbs that share all arguments
He spotted ,_, approached ,_, and greeted the Queen ._.

( (IP-MAT (NP-SBJ (PRO He))
          (VBD (VBD spotted) (, ,) (VBD approached) (, ,) (CONJ and) (VBD greeted))
          (NP-OB1 (D the) (N Queen))
          (. .)))

Ordinary VP conjunction triggers token break

He sang ,_.

and danced the polka ._.

( (IP-MAT (NP-SBJ (PRO He))
          (VBD sang)
          (. ,)))

( (IP-MAT (CONJ and)
          (NP-SBJ *con*)
          (VBD danced)
          (NP-OB1 (D the) (N polka))
          (. .)))
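
The display convention used in these examples (punctuation written as `word_tag`, with a blank line between sentence tokens) is simple enough to read programmatically. The sketch below is an illustration only: the function name `read_tokens` is hypothetical, only the underscore separator is handled, and the input is assumed to contain no marginal annotations.

```python
import re

def read_tokens(text):
    """Split tagged text into sentence tokens, each a list of (word, tag) pairs."""
    tokens = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        items = []
        for item in chunk.split():
            word, sep, tag = item.rpartition("_")
            # Items without an underscore carry no tag in this display format.
            items.append((word, tag) if sep and word else (item, None))
        tokens.append(items)
    return tokens

text = """He sang ,_.

and danced the polka ._.
"""
for token in read_tokens(text):
    print(token)
```

The comma in the first token is returned as `(",", ".")`: the word is a comma, but its tag is that of sentence-final punctuation, which is what marks the token break.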

Tricky case: Proper treatment depends on meaning (not on punctuation)

He came ,_, and left fifty years later ._.	← both events in the same year

( (IP-MAT (NP-SBJ (PRO He))
          (VBD (VBD came) (, ,) (CONJ and) (VBD left))
          (ADVP-TMP (NP-MSR (NUM fifty) (NS years))
                    (ADVR later))
          (. .)))

He came ,_.

and left fifty years later ._.			← both events separated by 50 years

( (IP-MAT (NP-SBJ (PRO He))
          (VBD came)
          (. ,)))

( (IP-MAT (CONJ and)
          (NP-SBJ *con*)
          (VBD left)
          (ADVP-TMP (NP-MSR (NUM fifty) (NS years))
                    (ADVR later))
          (. .)))

No token break with first direct speech clause

He said ,_, " I conquered ._. "_"

( (IP-MAT (NP-SBJ (PRO He))
          (VBD said)
          (, ,)
          (" ")			← default high attachment (contrary to sense)
          (IP-MAT-SPE (NP-SBJ (PRO I))
                      (VBD conquered))
          (" ")			← default high attachment (contrary to sense)
          (. .)))

Further direct speech clauses become separate tokens

He said ,_, " I came ,_.

I saw ,_.

I conquered ._. "_"

( (IP-MAT (NP-SBJ (PRO He))
          (VBD said)
          (, ,)
          (" ")			← default high attachment (contrary to sense)
          (IP-MAT-SPE (NP-SBJ (PRO I))
                      (VBD came))
          (. ,)))

( (IP-MAT-SPE (NP-SBJ (PRO I))
              (VBD saw)
              (. ,)))

( (IP-MAT-SPE (NP-SBJ (PRO I))
              (VBD conquered)
              (. .)
              (" ")))

Clause-adjoined relatives (generally no token break, regardless of capitalization)

She baked brownies ,_, whereupon joy reigned unconfined ._.

She baked brownies ,_, Whereupon joy reigned unconfined ._.

Ordinary combination of direct speech and clause-adjoined relative (no token break)

She said ,_, " I will bake brownies ,_, "_" Whereupon joy reigned unconfined ._.

Separate direct speech token forces clause-adjoined relative to become separate token

She said ,_, " I will fix the errors ,_.

and I will do so now ._. "_"

Whereupon joy reigned unconfined ._.

Other cases

Chapter and section headings

Section 102 ,/.

Soldering tips					← separate token

CODE material

Where possible, material tagged with CODE (notably headings, page numbers, and editorial material) counts as a separate token.

Font change indications belong with their associated material; they are never separate tokens.

<text>/CODE

<heading>/CODE

Chapter 22

</heading>/CODE

Here is a sentence.

<P_234>					← separate token

And here is another one.

But lo,
<P_235>					← part of surrounding token
the preceding page number is not a
separate token !

Diaries and letters

In diaries and letters, the place and date of writing each count as separate tokens.

( (NP-LOC (NPR London)))

( (NP-TMP (NPR June) (ADJ 21st) (NUM 1706)))

( (CP-QUE-MAT (NP-VOC (ADJS Dearest) (NPR Delilah))
              (, ,)
              (WADVP-1 (WADV How))
              (IP-SUB (ADVP *T*-1)
                      (BEP are)
                      (NP-SBJ (PRO you)))
              (. ?)))

Interjections and vocatives

Where possible, interjections and vocatives are treated as part of a preceding or following sentence token.

Plays

The indication of which character is speaking and the next line spoken by that character count as a single token.

Falstaff ./,			← comma, not period
It is I .