In the general case, sentence tokens are simple (= non-conjoined) instances of one of the following categories:
Sentence tokens are distinct from orthographic sentences (sequences of words delimited by sentence-final punctuation such as period, question mark, or exclamation point). Orthographic sentences can, and often do, consist of several sentence tokens. In the canonical case, sentence tokens are matrix clauses, that is, clauses not in the scope of a (possibly silent) subordinating conjunction. When a matrix clause has a subject that either differs in reference from a previous matrix subject or is overt (whether coreferential with a previous matrix subject or not), a new sentence token is indicated. Conversely, VP conjuncts do not constitute sentence tokens (unless they are instances of FRAG).
( (IP-MAT (NP-SBJ (PRO I))
          (VP (VBD left))
          (PUNC ,)))
( (IP-MAT (CONJ and)
          (NP-SBJ (PRO she))     ← switch reference
          (VP (VBD arrived))
          (PUNC ,)))

( (IP-MAT (NP-SBJ (PRO I))
          (VP (VBD arrived))
          (PUNC ,)))
( (IP-MAT (CONJ and)
          (NP-SBJ (PRO I))     ← no switch reference, but subject is overt
          (VP (VBD cleaned)
              (RP up))
          (PUNC ,)))
( (IP-MAT (CONJ and)
          (ADVP-TMP (ADV then))
          (NP-SBJ (PRO I))     ← no switch reference, but subject is overt
          (VP (VBD left))
          (PUNC ,)))

( (IP-MAT (NP-SBJ (PRO I))
          (VP (VP (VBD arrived))     ← VP conjuncts
              (PUNC ,)
              (CONJP (VP (VBD cleaned)
                         (RP up)))
              (PUNC ,)
              (CONJP (CONJ and)
                     (VP (ADVP-TMP (ADV then))
                         (VP (VBD left)))))
          (PUNC ,)))
( (CP-QUE-MAT (IP-SUB (DOD Did)
                      (NP-SBJ (PRO you))
                      (VP (VB come)))
              (PUNC ?)))     ← see Question mark
( (CP-QUE-MAT (CONJ Or)
              (IP-SUB (DOD did)
                      (NP-SBJ (PRO you))     ← no switch reference, but overt
                      (VP (VB go)))
              (PUNC ?)))

( (CP-QUE-MAT (IP-SUB (DOD Did)
                      (NP-SBJ (PRO you))
                      (VP (VP (VB come))     ← VP conjuncts
                          (CONJP (CONJ or)
                                 (VP (VB go)))))
              (PUNC ?)))

( (CP-QUE-MAT (WNP-1 (WPRO What))
              (IP-SUB (DOD did)
                      (NP-SBJ (PRO you))
                      (VP (DO do)
                          (NP-OB1 *T*-1)))
              (PUNC ?)))
( (FRAG (VP (VBD planted)
            (NP-OB1 (NS potatoes)))
        (PUNC .)))
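The subject criterion illustrated above can be sketched in Python. The sketch reads labeled-bracket trees and reports the overt matrix subject of each sentence token. The helper names (`parse_tokens`, `leaves`, `matrix_subject`) are hypothetical and not part of any corpus tool. The sketch assumes that every token is wrapped in an outer pair of parentheses containing exactly one labeled node, and that the subject of a matrix question sits inside IP-SUB, as in the examples above.

```python
import re

def parse_tokens(text):
    """Parse labeled-bracket text into one nested list per sentence token.

    Each node becomes [label, child, ...]; a word is a plain string.
    Assumes every token is wrapped as "( (LABEL ...))".
    """
    parts = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        items = []
        while parts[pos] != ")":
            if parts[pos] == "(":
                items.append(node())
            else:
                items.append(parts[pos])
                pos += 1
        pos += 1                      # skip ")"
        return items

    tokens = []
    while pos < len(parts):
        tokens.append(node()[0])      # drop the unlabeled wrapper
    return tokens

def leaves(n):
    """Return the words dominated by a node, left to right."""
    words = []
    for child in n[1:]:
        words.extend(leaves(child) if isinstance(child, list) else [child])
    return words

def matrix_subject(clause):
    """Return the overt words of the clause's own NP-SBJ, or None."""
    for child in clause[1:]:
        if not isinstance(child, list):
            continue
        if child[0] == "NP-SBJ":
            overt = [w for w in leaves(child) if not w.startswith("*")]
            return overt or None
        if child[0] == "IP-SUB":      # assumption: questions keep the subject here
            return matrix_subject(child)
    return None

# Two tokens: the second clause repeats the subject overtly.
separate = """
( (IP-MAT (NP-SBJ (PRO I)) (VP (VBD arrived)) (PUNC ,)))
( (IP-MAT (CONJ and) (NP-SBJ (PRO I)) (VP (VBD cleaned) (RP up)) (PUNC ,)))
"""
# One token: the second verb phrase is a VP conjunct without its own subject.
conjoined = """
( (IP-MAT (NP-SBJ (PRO I))
          (VP (VP (VBD arrived))
              (CONJP (CONJ and) (VP (VBD left))))
          (PUNC .)))
"""

for name, text in [("separate", separate), ("conjoined", conjoined)]:
    tokens = parse_tokens(text)
    print(name, len(tokens), [matrix_subject(t) for t in tokens])
```

Both inputs have I as matrix subject, but the first is parsed as two tokens and the second as one, which mirrors the contrast between overt-subject clauses and VP conjuncts.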
However, in the absence of a close connection between these items and an adjacent clause, these items stand alone. Notably, YES, NO, and the like may form complete answers to questions, and OKAY, UH-HUH, and the like may function purely as turn-taking signals. In such cases, the interjection and the subsequent material are annotated as separate tokens.
============================================================
A: Are you coming?
B: Yes, I am.                  ← one token
============================================================
A: Are you coming?
B: Yes.                        ← separate tokens
B: Why do you ask?
============================================================
A: Are you going to the party?
B: Yes.                        ← separate tokens
B: Are you?
============================================================
A: So I ended up leaving early.
B: Uh-huh, okay.               ← separate tokens
B: And then what happened?
============================================================
( (IP-MAT (NP-SBJ (PRO He))
          (VP (VBD said)
              (, ,)
              (QTP (IP-MAT (NP-SBJ (PRO I))
                           (VP (VBD came)))))
          (PUNC ,)))
( (QTP (IP-MAT (NP-SBJ (PRO I))
               (VP (VBD saw)))
       (PUNC ,)))
( (QTP (IP-MAT (NP-SBJ (PRO I))
               (VP (VBD conquered)))
       (PUNC .)))

( (IP-MAT (NP-SBJ (D The)
                  (N river))
          (VP (VBD froze))
          (PUNC ,)
          (PAREN (IP-MAT (NP-SBJ (PRO it))
                         (VP (BED was)
                             (ADJP-PRD (ADVP (ADVR so))
                                       (ADJ cold)))))
          (PUNC .)))

( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBD met)
              (NP-OB1 (D a)
                      (N guy)
                      (PUNC ,)
                      (IP-MAT-REL (NP-SBJ (PRO he))
                                  (VP (BED was)
                                      (ADJP-PRD (ADJ able)
                                                (IP-INF (TO to)
                                                        (VP (VB show)
                                                            (NP-OB2 (PRO them)))))))))
          (PUNC .)))
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBD met)
              (NP-OB1 (D a)
                      (N guy)
                      (PUNC ,)
                      (IP-MAT-REL (NP-SBJ (PRO I))
                                  (VP (MD ca@)
                                      (NEG @n't)
                                      (VP (VB remember)
                                          (NP-OB1 (NP-POS (PRO$ his))
                                                  (N name))
                                          (PP (P at)
                                              (NP (D the)
                                                  (N moment))))))))
          (PUNC .)))
We adopt the IP-MAT-REL label over other annotation alternatives (see below) because we wish to make the retrieval of these cases as convenient as possible. In particular, we wish to facilitate cross-linguistic study of the phenomenon, which has been noted in German, where it is very striking given that language's V2 character.
(1) a. Es       war  einmal  ein  König,  der           sieben  Kinder    hatte.
       expl-it  was  once    a    king    who/this-one  seven   children  had
       Ordinary relative: 'Once, there was a king who had seven children.'
    b. Es war einmal ein König, der hatte sieben Kinder.
       Integrated V2 relative: 'Once, there was a king; he had seven children.'
The two annotation alternatives that we do not adopt are as follows. First, we could simply enclose the second clause in PAREN brackets. But this would not allow a targeted retrieval of the cases of interest, as parenthetical clauses can attach as sisters of N without being adnominal modifiers.
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBD met)
              (NP-OB1 (D a)
                      (N guy)
                      (PUNC ,)
                      (PAREN (IP-MAT (NP-SBJ (PRO it))
                                     (VP (BED was)
                                         (NP-PRD (ADJP (ADJ sheer))
                                                 (N coincidence)))))
                      (PUNC ,)
                      (PP (P with)
                          (NP (D the)
                              (ADJP (ADJ right))
                              (N kind)
                              (PP (P of)
                                  (NP (N equipment)))))))
          (PUNC .)))
Second, we could annotate IP-MAT-RELs as zero-marked relative clauses
containing a resumptive pronoun. Though this would allow targeted
retrieval, the IP-MAT-REL label is more convenient.
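To make the retrieval argument concrete, here is a minimal Python sketch of label-based lookup. It is not CorpusSearch and does not reproduce its query syntax; `parse_tokens`, `find`, and `leaves` are hypothetical helpers, and the two sample trees are shortened versions of the examples above. The point is only that a dedicated label can be matched directly, whereas a search for PAREN would also return parentheticals that are not adnominal modifiers.

```python
import re

def parse_tokens(text):
    """Parse "( (LABEL ...))" tokens into nested lists [label, child, ...]."""
    parts = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        items = []
        while parts[pos] != ")":
            if parts[pos] == "(":
                items.append(node())
            else:
                items.append(parts[pos])
                pos += 1
        pos += 1                      # skip ")"
        return items

    tokens = []
    while pos < len(parts):
        tokens.append(node()[0])      # drop the unlabeled wrapper
    return tokens

def find(node, label):
    """Yield every node carrying exactly this label, in document order."""
    if node[0] == label:
        yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from find(child, label)

def leaves(n):
    """Return the words dominated by a node, left to right."""
    words = []
    for child in n[1:]:
        words.extend(leaves(child) if isinstance(child, list) else [child])
    return words

adnominal = """
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBD met)
              (NP-OB1 (D a) (N guy) (PUNC ,)
                      (IP-MAT-REL (NP-SBJ (PRO he))
                                  (VP (BED was) (ADJP-PRD (ADJ able))))))
          (PUNC .)))
"""
parenthetical = """
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBD met)
              (NP-OB1 (D a) (N guy) (PUNC ,)
                      (PAREN (IP-MAT (NP-SBJ (PRO it))
                                     (VP (BED was) (NP-PRD (N coincidence)))))))
          (PUNC .)))
"""

for text in (adnominal, parenthetical):
    token = parse_tokens(text)[0]
    print([" ".join(leaves(n)) for n in find(token, "IP-MAT-REL")])
```

Only the first tree yields a match for IP-MAT-REL; the parenthetical clause in the second tree is reachable only through PAREN.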
Pragmatically analogous cases where the first conjunct belongs to
some category other than IP-IMP are treated the same way.
In order to facilitate retrievability and comparison, "reverse" reprise constructions are also treated
as single tokens.
Parenthetical question
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBP like)
              (NP-OB1 (PRO that)))
          (PUNC ,)
          (PAREN (CP-QUE-MAT (IP-SUB (DOP do)
                                     (NP-SBJ (PRO they)))))
          (PUNC ?)))
( (IP-MAT (NP-SBJ (PRO They))
          (VP (MD would)
              (VP (VB like)
                  (NP-OB1 (PRO that))))
          (PUNC ,)
          (PAREN (CP-QUE-MAT (IP-SUB (DOP do@)
                                     (NEG @n't)
                                     (NP-SBJ (PRO you))
                                     (VP (VB think)))))
          (PUNC ?)))
Pseudo-imperative and similar cases
In the so-called pseudo-imperative, an imperative followed by a matrix
declarative corresponds pragmatically to the protasis (IF clause) and
apodosis (THEN clause) of a conditional construction. In order to
represent the tight link between the two clauses and to allow these
cases to be retrieved using CorpusSearch, pseudo-imperatives are
exceptionally annotated as single sentence tokens, even though they
consist of two independent conjoined clauses. In doubtful cases, the
default is to annotate as a single token, since otherwise the examples
would not be retrievable with CorpusSearch.
( (IP-IMP (IP-IMP (VP (VBI Cross)
                      (NP-OB1 (D the)
                              (N line))))
          (PUNC ,)
          (CONJP (CONJ and)
                 (IP-MAT (NP-SBJ (PRO I))
                         (VP (VBP shoot))))
          (PUNC .)))
( (IP-IMP (IP-IMP (VP (VBI Stay)
                      (PP (P behind)
                          (NP (D the)
                              (N line)))))
          (PUNC ,)
          (CONJP (CONJ or)
                 (IP-MAT (NP-SBJ (PRO I))
                         (VP (VBP shoot))))
          (PUNC .)))
( (FRAG (NP (NP (NUMP (NUM One))
                (QP (QR more))
                (N step))
            (PUNC ,)
            (CONJP (CONJ and)
                   (IP-MAT (NP-SBJ (PRO I))
                           (VP (VBP shoot)))))
        (PUNC .)))
( (FRAG (FRAG (NP (NUMP (NUM One))
                  (N step))
              (ADVP (ADVR closer)))
        (PUNC ,)
        (CONJP (CONJ and)
               (IP-MAT (NP-SBJ (PRO I))
                       (VP (VBP shoot))))
        (PUNC .)))
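Because pseudo-imperatives and their analogues are kept as single tokens, they can be picked out by shape. The Python sketch below is one possible heuristic, not the CorpusSearch query used for the corpus: it flags a token whose root is not IP-MAT and which contains a CONJP that directly dominates an IP-MAT. The function names (`parse_tokens`, `find`, `is_conditional_conjunction`) are hypothetical, and the heuristic assumes that ordinary conjoined matrix clauses have already been split into separate tokens.

```python
import re

def parse_tokens(text):
    """Parse "( (LABEL ...))" tokens into nested lists [label, child, ...]."""
    parts = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        items = []
        while parts[pos] != ")":
            if parts[pos] == "(":
                items.append(node())
            else:
                items.append(parts[pos])
                pos += 1
        pos += 1                      # skip ")"
        return items

    tokens = []
    while pos < len(parts):
        tokens.append(node()[0])      # drop the unlabeled wrapper
    return tokens

def find(node, label):
    """Yield every node carrying exactly this label, in document order."""
    if node[0] == label:
        yield node
    for child in node[1:]:
        if isinstance(child, list):
            yield from find(child, label)

def is_conditional_conjunction(token):
    """Heuristic: non-IP-MAT root plus a CONJP that directly contains IP-MAT."""
    if token[0] == "IP-MAT":
        return False
    for conjp in find(token, "CONJP"):
        if any(isinstance(c, list) and c[0] == "IP-MAT" for c in conjp[1:]):
            return True
    return False

cross = """
( (IP-IMP (IP-IMP (VP (VBI Cross) (NP-OB1 (D the) (N line))))
          (PUNC ,)
          (CONJP (CONJ and) (IP-MAT (NP-SBJ (PRO I)) (VP (VBP shoot))))
          (PUNC .)))
"""
one_more_step = """
( (FRAG (NP (NP (NUMP (NUM One)) (QP (QR more)) (N step))
            (PUNC ,)
            (CONJP (CONJ and) (IP-MAT (NP-SBJ (PRO I)) (VP (VBP shoot)))))
        (PUNC .)))
"""
plain = "( (IP-MAT (NP-SBJ (PRO I)) (VP (VBD left)) (PUNC ,)))"

for text in (cross, one_more_step, plain):
    token = parse_tokens(text)[0]
    print(token[0], is_conditional_conjunction(token))
```

The IP-IMP and FRAG tokens are flagged and the plain declarative is not. A production query would likely need further conditions, for example on the choice of conjunction.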
Reprise construction
( (IP-MAT (NP-SBJ (D The)
                  (NS kids))
          (VP (VBP like)
              (NP-OB1 (PRO that)))
          (PUNC ,)
          (PAREN (IP-MAT (NP-SBJ (PRO they))
                         (VP (DOP do))))
          (PUNC .)))
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBP like)
              (NP-OB1 (PRO that)))
          (PUNC ,)
          (PAREN (IP-MAT (NP-SBJ (D the)
                                 (NS kids))
                         (VP (DOP do))))
          (PUNC .)))
( (IP-MAT (NP-SBJ (D The)
                  (NS kids))
          (VP (DOP do))
          (PUNC ,)
          (PAREN (IP-MAT (NP-SBJ (PRO they))
                         (VP (VBP like)
                             (NP-OB1 (PRO that)))))
          (PUNC .)))
( (IP-MAT (NP-SBJ (PRO They))
          (VP (DOP do))
          (PUNC ,)
          (PAREN (IP-MAT (NP-SBJ (D the)
                                 (NS kids))
                         (VP (VBP like)
                             (NP-OB1 (PRO that)))))
          (PUNC .)))
Tag question
( (IP-MAT (NP-SBJ (PRO They))
          (VP (VBP like)
              (NP-OB1 (PRO that)))
          (PUNC ,)
          (PAREN (CP-QUE-TAG (IP-SUB (DOP do@)
                                     (NEG @n't)
                                     (NP-SBJ (PRO they)))))
          (PUNC ?)))
( (IP-MAT (NP-SBJ (PRO They))
          (VP (MD would)
              (VP (VB like)
                  (NP-OB1 (PRO that))))
          (PUNC ,)
          (PAREN (CP-QUE-TAG (IP-SUB (MD would@)
                                     (NEG @n't)
                                     (NP-SBJ (PRO they)))))
          (PUNC ?)))