Coding conventions for the coded parsed corpora

This page describes the scheme according to which I have coded the Penn Parsed Corpus of Early Modern English (PPCEME) and the Parsed Corpus of Early English Correspondence (PCEEC). The following list shows the 15 variables included in the coding scheme; the values for the variables are listed and explained in more detail below. Columns 1-7 encode syntactic (internal) variables; columns 8-15 encode various external properties of the texts, including sociolinguistic ones.
  1. Presence or absence of do support in negative declaratives
  2. Presence or absence of do support in questions
  3. Positive or negative question?
  4. Type of question
  5. Order of finite head and NOT in negative declaratives
  6. Order of finite head, NOT, and subject in negative questions
  7. Order of NEVER w.r.t. finite head
  8. Date of composition - century
  9. Date of composition - decade
  10. Date of composition - last digit
  11. Author's sex
  12. Author's age by decade
  13. Text genre
  14. Lexical complexity of text
  15. Overlap with PCEEC?

The "doesn't apply" value is coded as "-". For instance, a negative declarative has "-" in columns 2-4, and a text whose author's birthdate is unknown has "-" in column 12.
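As a rough illustration of how a coded token might be unpacked, the sketch below splits a coding string into the 15 columns listed above and maps the "doesn't apply" value to None. The colon delimiter, the field names, and the function name parse_coding are assumptions made for this example; this page does not say how the columns are delimited in the coded files.

```python
# Hypothetical field names, one per column of the coding scheme (1-15).
FIELDS = [
    "neg_decl_do", "question_do", "question_polarity", "question_type",
    "not_order_decl", "not_order_question", "never_order",
    "century", "decade", "last_digit",
    "sex", "age_decade", "genre", "lexical_complexity", "pceec_overlap",
]


def parse_coding(coding, sep=":"):
    """Split a coding string into a dict keyed by column name.

    Assumes the 15 values are separated by `sep` (an assumption, not
    something this page specifies). "-" ("doesn't apply") becomes None.
    """
    values = coding.split(sep)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return {name: (None if v == "-" else v) for name, v in zip(FIELDS, values)}


# A made-up negative declarative: columns 2-4 (and 6-7) do not apply.
example = parse_coding("v:-:-:-:D:-:-:5:6:3:m:4:l:h:-")
print(example["neg_decl_do"], example["question_do"], example["century"])
```

Mapping "-" to None keeps "doesn't apply" distinct from every real symbol, which matters because several columns reuse the same letters with different meanings.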

Columns 1 and 2 refer to Ellegård's know class, a verb class that includes care, doubt, know, list 'like', mistake, trow 'believe', and wit 'know' (Ellegård 1953). Sentences containing these verbs tend to exhibit lower rates of do support than sentences containing ordinary verbs.


Column 1: Presence or absence of do support in negative declaratives

Uppercase letters indicate the old grammar; lowercase letters indicate the new grammar. Boldface indicates the main verb in the examples.

Symbol | Explanation | Examples
B | clause contains finite main verb be and has simple negation | The king is not popular.
D | clause contains finite main verb do and has simple negation | The king did not the dishes.
H | clause contains finite main verb have and has simple negation | The king had not his bodyguard with him.
K | clause contains a finite member of Ellegård's know class and has simple negation | The king knew not the answer.
V | clause contains some other finite main verb and has simple negation | The king listened not to his subjects.
b | clause contains finite main verb be and has negation with do support (should not occur) | The king does not be popular.
d | clause contains finite main verb do and has negation with do support | The king did not do the dishes.
h | clause contains finite main verb have and has negation with do support | The king did not have his bodyguard with him.
k | clause contains a finite member of Ellegård's know class and has negation with do support | The king did not know the answer.
v | clause contains some other finite main verb and has negation with do support | The king did not listen to his subjects.
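Because letter case separates the old grammar from the new one in this column, a rate of do support can be read off the symbols directly. The following is a minimal sketch under that reading; the function name do_support_rate and its classes parameter are hypothetical, and the sample values are invented.

```python
def do_support_rate(symbols, classes="BDHKV"):
    """Return the share of do-support (lowercase) codes among column-1 values.

    `classes` selects which verb classes to count, given as uppercase
    letters. Values of "-" (doesn't apply) are skipped. Returns None if
    no relevant value remains.
    """
    wanted = set(classes)
    relevant = [s for s in symbols if s != "-" and s.upper() in wanted]
    if not relevant:
        return None
    with_do = sum(1 for s in relevant if s.islower())
    return with_do / len(relevant)


# Invented column-1 values: six negative declaratives and one question ("-").
sample = ["V", "v", "v", "K", "B", "v", "-"]
print(do_support_rate(sample))               # all verb classes
print(do_support_rate(sample, classes="V"))  # ordinary verbs only
```

Restricting the count to one class at a time is useful here because the know class and the verbs be, do, and have are coded separately from ordinary verbs.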

Column 2: Presence or absence of do support in questions

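Analogous to column 1: uppercase letters indicate the old grammar (simple inversion); lowercase letters indicate the new grammar (do support).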

Symbol | Explanation | Examples
B | question contains finite main verb be and has simple inversion | Is the king popular?
D | question contains finite main verb do and has simple inversion | Did the king the dishes?
H | question contains finite main verb have and has simple inversion | Had the king his bodyguard with him?
K | question contains a finite member of Ellegård's know class and has simple inversion | Wott the king the answer?
V | question contains some other finite main verb and has simple inversion | Listened the king to his subjects?
b | question contains finite main verb be and has do support (should not occur) | Does the king be popular?
d | question contains finite main verb do and has do support | Did the king do the dishes?
h | question contains finite main verb have and has do support | Did the king have his bodyguard with him?
k | question contains a finite member of Ellegård's know class and has do support | Did the king know the answer?
v | question contains some other finite main verb and has do support | Did the king listen to his subjects?

Column 3: Positive or negative question

This column is independent of whether the question instantiates the old or the new grammar.

Symbol | Explanation | Examples
n | negative | Listened the king not to his subjects? Did the king not listen to his subjects? Did not the king listen to his subjects? Didn't the king listen to his subjects?
p | positive | Listened the king to his subjects? Did the king listen to his subjects?

Column 4: Type of question

This column is independent of whether the question instantiates the old or the new grammar.

Symbol | Explanation | Examples
y | yes-no (polar) question | Listened the king (not) to his subjects? Did the king (not) listen to his subjects?
c | complement (direct or indirect object) | What said the king to his subjects? What did the king say to his subjects?
a | adjunct (other) | Why started the king that ill-fated war? Why did the king start that ill-fated war?

Column 5: Order of finite head and NOT in negative declaratives

Uppercase letters indicate immediate precedence (>>); lowercase letters indicate simple precedence (>) (immediate or not).

Symbol | Explanation | Examples
B | auxiliary or main verb be >> negation | The king is not listening; the king is not a genius. The king isn't coming; the king isn't a genius.
D | auxiliary do >> negation | The king does not listen. The king doesn't listen.
H | auxiliary or main verb have >> negation | The king has not listened; the king has not a clue. The king hasn't been listening; the king hasn't a clue.
M | modal >> negation | The king will not listen. The king shouldn't have come.
V | ordinary verb >> negation | The king came not last year.
b | auxiliary or main verb be > negation | The king is now not listening; the king is now not at the palace.
d | auxiliary do > negation | The king doth lately not listen.
h | auxiliary or main verb have > negation | The king has lately not been listening; the king has now not a clue.
m | modal > negation | The king will now not listen.
v | ordinary verb > negation | The king came last year not.

Column 6: Order of finite head, NOT, and subject in negative questions

Uppercase letters indicate immediate precedence (>>); lowercase letters indicate simple precedence (>) (immediate or not).

Symbol | Explanation | Examples
B | auxiliary or main verb be >> negation > pronominal subject | Is not (...) he listening? Is not (...) he a genius? Isn't (...) he listening? Isn't (...) he a genius?
C | same as B but with full noun phrase subject | Is not (...) the king listening? Is not (...) the king a genius? Isn't (...) the king listening? Isn't (...) the king a genius?
D | auxiliary do >> negation > pronominal subject | Doth not (...) he listen? Doesn't (...) he listen?
E | same as D but with full noun phrase subject | Doth not (...) the king listen?
H | auxiliary or main verb have >> negation > pronominal subject | Has not (...) he listened? Has not (...) he a clue? Hasn't (...) he been listening? Hasn't (...) he a clue?
I | same as H but with full noun phrase subject | Has not (...) the king listened? Has not (...) the king a clue? Hasn't (...) the king been listening? Hasn't (...) the king a clue?
M | modal >> negation > pronominal subject | Will not (...) he listen? Shouldn't (...) he have come?
N | same as M but with full noun phrase subject | Will not (...) the king listen? Shouldn't (...) the king have come?
V | ordinary verb >> negation > pronominal subject | Listens not (...) he to us?
W | same as V but with full noun phrase subject | Listens not (...) the king to us?
b | auxiliary or main verb be > pronominal subject > negation | Is (...) he (...) not listening? Is (...) he (...) not a genius?
c | same as b but with full noun phrase subject | Is (...) the king (...) not listening? Is (...) the king (...) not a genius?
d | auxiliary do > pronominal subject > negation | Doth (...) he (...) not listen?
e | same as d but with full noun phrase subject | Doth (...) the king (...) not listen?
h | auxiliary or main verb have > pronominal subject > negation | Has (...) he (...) not listened? Has (...) he (...) not a clue?
i | same as h but with full noun phrase subject | Has (...) the king (...) not been listening? Has (...) the king (...) not a clue?
m | modal > pronominal subject > negation | Will (...) he (...) not listen?
n | same as m but with full noun phrase subject | Should (...) the king (...) not listen?
v | ordinary verb > pronominal subject > negation | Listens (...) he (...) not to us?
w | same as v but with full noun phrase subject | Listens (...) the king (...) not to us?
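Each symbol in this column bundles three pieces of information: the class of the finite head, the type of subject, and the position of negation relative to the subject. A small decoder, sketched here with hypothetical names (HEAD_BY_LETTER, FULL_NP_LETTERS, decode_column6), makes that packing explicit.

```python
# Head class for each letter. Letters come in pairs: the first member of a
# pair marks a pronominal subject, the second a full noun phrase subject.
HEAD_BY_LETTER = {
    "B": "be", "C": "be",
    "D": "do", "E": "do",
    "H": "have", "I": "have",
    "M": "modal", "N": "modal",
    "V": "ordinary verb", "W": "ordinary verb",
}
FULL_NP_LETTERS = set("CEINW")


def decode_column6(symbol):
    """Decode a column-6 symbol into (head, subject type, order)."""
    upper = symbol.upper()
    if upper not in HEAD_BY_LETTER:
        raise ValueError(f"not a column-6 symbol: {symbol!r}")
    subject = "full noun phrase" if upper in FULL_NP_LETTERS else "pronoun"
    # Uppercase: negation immediately follows the head and precedes the subject.
    # Lowercase: the subject comes between the head and negation.
    if symbol.isupper():
        order = "head >> NOT > subject"
    else:
        order = "head > subject > NOT"
    return HEAD_BY_LETTER[upper], subject, order


print(decode_column6("E"))
print(decode_column6("m"))
```

Splitting the symbol this way makes it possible to tabulate, say, subject type against negation position without enumerating all twenty symbols by hand.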

Column 7: Order of NEVER w.r.t. finite head

Uppercase letters indicate finite head preceding NEVER; lowercase letters indicate NEVER preceding finite head. This category is included for purposes of comparing the development of NOT sentences with NEVER sentences.

Symbol | Explanation | Examples
B | (auxiliary or main verb) be > NEVER | He is never leaving.
D | auxiliary do > NEVER | He does never leave.
E | main verb do > NEVER | He does never the dishes.
H | (auxiliary or main verb) have > NEVER | He has never left.
M | modal > NEVER | He will never leave.
V | other verb > NEVER | He left never.
b | NEVER > (auxiliary or main verb) be | He never is leaving.
d | NEVER > auxiliary do | He never does leave.
e | NEVER > main verb do | He never does the dishes.
h | NEVER > (auxiliary or main verb) have | He never has left.
m | NEVER > modal | He never will leave.
v | NEVER > other verb | He never left.
x | NEVER > nonfinite verb form (should not occur) | They will leave never.

Column 8: Date of composition - century

Symbol | Explanation
5 | text is from the 1500's
6 | text is from the 1600's
7 | text is from the 1700's

Column 9: Date of composition - decade

Symbol | Explanation
0 | text is from the "oughts" of a particular century (1500-1509, 1600-1609, 1700-1709)
1 | text is from the teens of a particular century (1510-1519, 1610-1619, 1710-1719)
2-9 | and so on

Column 10: Date of composition - last digit

Symbol | Explanation
0 | year for text ends in 0 (1500, 1510, 1520, ... 1600, 1610, 1620, ... 1700, 1710)
1-9 | and so on
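Columns 8, 9, and 10 together spell out the last three digits of the year of composition, so the full year can be recovered by simple arithmetic. The helper below is a sketch; the name composition_year is hypothetical, and it assumes that none of the three columns is "-".

```python
def composition_year(century, decade, last_digit):
    """Combine the codes from columns 8, 9 and 10 into a four-digit year.

    Each argument is the single-character code from the coded token,
    for example ("5", "6", "3") for the year 1563.
    """
    for code in (century, decade, last_digit):
        if len(code) != 1 or not code.isdigit():
            raise ValueError(f"expected a single digit, got {code!r}")
    return 1000 + 100 * int(century) + 10 * int(decade) + int(last_digit)


print(composition_year("5", "6", "3"))  # 1563
```

Keeping the three digits in separate columns allows grouping by century or by decade without any arithmetic, while the helper covers the cases where the exact year is needed.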

Column 11: Author's sex

Treat with caution in comedy and fiction.

Symbol | Explanation
f | female
m | male

Column 12: Author's age by decade

Symbol | Explanation
0 | Author's age at date of composition between 0 and 9
1 | Author's age at date of composition between 10 and 19
2-8 | and so on

Column 13: Genre

Uppercase letters indicate genres that are probably formal; lowercase letters indicate ones that are probably informal. See notes for unclear cases.

Symbol | Explanation | Notes
A | autobiography |
B | biography |
C | science |
D | diary, private | Not clear whether formal or informal.
E | educational treatise |
F | fiction |
H | handbook |
I | history |
L | letter, nonprivate |
M | medicine |
P | philosophy | The two texts in this category (boethco, boethpr) are translations from the Latin. Treat with caution, as they might be unduly influenced by the Latin original.
R | sermon |
T | travelogue |
W | Tyndale bible | Reflects the usage of its time.
X | King James bible | Closely follows the Tyndale bible; archaic usage which does not follow the usage of its time. Best omitted.
Y | statute | Increasingly formulaic and archaic usage. Best omitted.
Z | other | Elizabeth's translation of Boethius, which tends to be gibberish. Best omitted.
c | comedy |
l | letter, private | Not clear whether formal or informal.
t | trials | Transcriptions of trial proceedings, therefore possibly closer to vernacular usage than other written sources.
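The case convention and the notes in the table can be folded into a single lookup. The sketch below is one possible reading, not part of the coding scheme itself: the function name genre_register and the labels "omit" and "unclear" are hypothetical, and treating the "Best omitted" genres as a separate category is a choice made for this example.

```python
KNOWN_GENRES = set("ABCDEFHILMPRTWXYZclt")
UNCLEAR = {"D", "l"}            # noted as "Not clear whether formal or informal"
BEST_OMITTED = {"X", "Y", "Z"}  # noted as "Best omitted"


def genre_register(symbol):
    """Classify a column-13 symbol as formal, informal, unclear, or omit."""
    if symbol not in KNOWN_GENRES:
        raise ValueError(f"unknown genre symbol: {symbol!r}")
    if symbol in BEST_OMITTED:
        return "omit"
    if symbol in UNCLEAR:
        return "unclear"
    # Otherwise, letter case carries the (probable) formality.
    return "formal" if symbol.isupper() else "informal"


for symbol in ["R", "t", "l", "Y"]:
    print(symbol, genre_register(symbol))
```

Checking the exceptions before the case rule matters: D is uppercase and l is lowercase, so the case rule alone would assign them a formality that the notes explicitly leave open.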

Column 14: Lexical complexity of text

The measure of lexical complexity is the one discussed in Solution 4 (the average of rank w.r.t. average word length and rank w.r.t. type/token ratio, normalized by number of texts). This property is not available for the PCEEC.

Symbol | Explanation
h | lexical complexity >= median
l | lexical complexity < median
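To make the measure concrete, here is one way the description above could be turned into code: rank the texts by average word length and by type/token ratio, average the two ranks, divide by the number of texts, and split at the median. This is a sketch of that verbal description only, not the procedure from Solution 4. The rank direction (1 = lowest value), the tie handling, the assumption that every text is non-empty, and the names ranks and complexity_codes are all choices made for this illustration.

```python
from statistics import median


def ranks(values):
    """Rank values from 1 (smallest) upward; ties keep their input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def complexity_codes(texts):
    """Map each text name to "h" or "l" by a median split on the score.

    `texts` maps a text name to its (non-empty) list of word tokens.
    """
    names = list(texts)
    avg_len = [sum(map(len, texts[n])) / len(texts[n]) for n in names]
    ttr = [len(set(texts[n])) / len(texts[n]) for n in names]
    n_texts = len(names)
    # Average of the two ranks, normalized by the number of texts.
    scores = [(a + b) / 2 / n_texts for a, b in zip(ranks(avg_len), ranks(ttr))]
    cutoff = median(scores)
    return {n: ("h" if s >= cutoff else "l") for n, s in zip(names, scores)}


# Three invented toy "texts".
toy = {
    "text_a": "the king is not here".split(),
    "text_b": "to be or to be".split(),
    "text_c": "sovereign majesty commands obedience".split(),
}
print(complexity_codes(toy))
```

Type/token ratio is sensitive to text length, so a real implementation would probably need to control for that; the sketch ignores the issue.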

Column 15: Overlap with PCEEC?

Some letters are included in both the PPCEME and the PCEEC. When using both corpora, these letters should obviously not be counted twice. The relevant texts are coded as "y" in the coded version of the PPCEME, and as "-" in the PCEEC.

Symbol | Explanation
y | text included in both corpora
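When the two coded corpora are pooled, the overlap flag can be used to avoid double counting. The sketch below drops the PPCEME copy of each overlapping letter, since only the PPCEME marks the overlap; the function name combine_corpora, the colon-delimited sample strings, and the sample values are all invented for illustration.

```python
OVERLAP_INDEX = 14  # column 15, counting from zero


def combine_corpora(ppceme_tokens, pceec_tokens):
    """Merge coded tokens from both corpora, skipping PPCEME overlap texts.

    Each token is a sequence of 15 column values. PPCEME tokens whose
    column 15 is "y" also occur in the PCEEC, so they are left out.
    """
    kept = [t for t in ppceme_tokens if t[OVERLAP_INDEX] != "y"]
    return kept + list(pceec_tokens)


# Invented tokens; the first PPCEME token comes from a letter that is
# also in the PCEEC.
ppceme = [
    "v:-:-:-:D:-:-:5:6:3:m:4:l:h:y".split(":"),
    "V:-:-:-:V:-:-:5:2:1:m:-:R:h:-".split(":"),
]
pceec = ["v:-:-:-:D:-:-:5:6:3:m:4:l:-:-".split(":")]

combined = combine_corpora(ppceme, pceec)
print(len(combined))
```

Keeping the PCEEC copy means the pooled token lacks a lexical complexity value (column 14), since that property is not available for the PCEEC.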

References

Ellegård, Alvar. 1953. The auxiliary do. The establishment and regulation of its use in English. Stockholm: Almqvist & Wiksell.
Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language variation and change 1:199-244.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English negatives. Language variation and change 17:257-280.