Coding conventions for the coded parsed corpora

This page describes the scheme according to which I have coded the Penn Parsed Corpus of Early Modern English (PPCEME) and the Parsed Corpus of Early English Correspondence (PCEEC). The following list shows the 15 variables included in the coding scheme; the values for the variables are listed and explained in more detail below. Columns 1-7 encode syntactic (internal) variables; columns 8-15 encode various external properties of the texts, including sociolinguistic ones.
  1. Presence or absence of do support in negative declaratives
  2. Presence or absence of do support in questions
  3. Positive or negative question?
  4. Type of question
  5. Order of finite head and NOT in negative declaratives
  6. Order of finite head, NOT, and subject in negative questions
  7. Order of NEVER w.r.t. finite head
  8. Date of composition - century
  9. Date of composition - decade
  10. Date of composition - last digit
  11. Author's sex
  12. Author's age, grouped by decade
  13. Text genre
  14. Complexity of text
  15. Overlap with PCEEC?

The "doesn't apply" value is coded as "-". For instance, a negative declarative has "-" in columns 2-4, and a text whose author's birthdate is unknown has "-" in column 12.
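As a rough illustration of how the 15 columns might be unpacked in a script, the sketch below splits a coding string into named fields and maps "-" to None. It assumes that the column values are separated by colons, as in CorpusSearch coding strings. The field names and the sample string are made up for illustration and are not part of the corpora.

```python
# Hypothetical field names, one per column of the coding scheme (columns 1-15).
FIELDS = [
    "neg_decl_do", "question_do", "question_polarity", "question_type",
    "not_order_decl", "not_order_question", "never_order",
    "century", "decade", "last_digit",
    "author_sex", "author_age", "genre", "complexity", "pceec_overlap",
]

def parse_coding(coding, sep=":"):
    """Split a coding string into a dict keyed by the hypothetical field names.

    Assumes `sep`-delimited values; "-" ("doesn't apply") becomes None.
    """
    values = coding.split(sep)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return {name: (None if v == "-" else v) for name, v in zip(FIELDS, values)}

# Invented example: a negative declarative with do support and an ordinary verb.
example = parse_coding("v:-:-:-:D:-:-:5:7:3:m:4:l:l:-")
```

Turning "-" into None keeps "doesn't apply" distinct from any real symbol, which may make later filtering less error-prone.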

Columns 1, 2, and 7 make reference to Ellegård's know class. This is a verb class that includes care, doubt, know, list 'like', mistake, trow 'believe', and wit 'know' (Ellegård 1953). Sentences containing these verbs tend to exhibit lower rates of do support than sentences containing ordinary verbs.


Column 1: Presence or absence of do support in negative declaratives

Uppercase letters indicate the old grammar; lowercase letters indicate the new grammar. Boldface indicates the main verb in the examples.

Symbol  Explanation
          Examples
B       clause contains finite main verb be and has simple negation
          The king is not popular.
D       clause contains finite main verb do and has simple negation
          The king did not the dishes.
H       clause contains finite main verb have and has simple negation
          The king had not his bodyguard with him.
K       clause contains a finite member of Ellegård's know class and has simple negation
          The king knew not the answer.
V       clause contains some other finite main verb and has simple negation
          The king listened not to his subjects.
b       clause contains finite main verb be and has negation with do support (should not occur)
          The king does not be popular.
d       clause contains finite main verb do and has negation with do support
          The king did not do the dishes.
h       clause contains finite main verb have and has negation with do support
          The king did not have his bodyguard with him.
k       clause contains a finite member of Ellegård's know class and has negation with do support
          The king did not know the answer.
v       clause contains some other finite main verb and has negation with do support
          The king did not listen to his subjects.
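Because case alone separates the old grammar from the new one in column 1, a rate of do support can be computed from the symbols directly. The following is a minimal sketch under that reading; the function names and the sample values are invented for illustration.

```python
from collections import Counter

def do_support_rate(codes):
    """Return the share of lowercase (new-grammar) codes among applicable tokens.

    `codes` holds column 1 values; "-" (doesn't apply) is skipped.
    Returns None when no token is applicable.
    """
    applicable = [c for c in codes if c != "-"]
    if not applicable:
        return None
    new = sum(1 for c in applicable if c.islower())
    return new / len(applicable)

def rate_by_verb_class(codes):
    """Group by verb class (the letter, ignoring case) and compute each rate."""
    totals, new = Counter(), Counter()
    for c in codes:
        if c == "-":
            continue
        cls = c.upper()
        totals[cls] += 1
        if c.islower():
            new[cls] += 1
    return {cls: new[cls] / totals[cls] for cls in totals}

# Invented column 1 values, for illustration only.
sample = ["V", "v", "v", "K", "K", "k", "-", "B"]
```

Grouping by letter keeps the know class (K/k) separate from ordinary verbs (V/v), so the two rates can be compared side by side.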

Column 2: Presence or absence of do support in questions

Analogous to column 1.

Symbol  Explanation
          Examples
B       question contains finite main verb be and has simple inversion
          Is the king popular?
D       question contains finite main verb do and has simple inversion
          Did the king the dishes?
H       question contains finite main verb have and has simple inversion
          Had the king his bodyguard with him?
K       question contains a finite member of Ellegård's know class and has simple inversion
          Wott the king the answer?
V       question contains some other finite main verb and has simple inversion
          Listened the king to his subjects?
b       question contains finite main verb be and has do support (should not occur)
          Does the king be popular?
d       question contains finite main verb do and has do support
          Did the king do the dishes?
h       question contains finite main verb have and has do support
          Did the king have his bodyguard with him?
k       question contains a finite member of Ellegård's know class and has do support
          Did the king know the answer?
v       question contains some other finite main verb and has do support
          Did the king listen to his subjects?

Column 3: Positive or negative question

This column is independent of whether the question instantiates the old or the new grammar.

Symbol  Explanation
          Examples
n       negative
          Listened the king not to his subjects?
          Did the king not listen to his subjects?
          Did not the king listen to his subjects?
          Didn't the king listen to his subjects?
p       positive
          Listened the king to his subjects?
          Did the king listen to his subjects?

Column 4: Type of question

This column indicates whether a question is a wh- question or a yes-no question. In addition, it distinguishes among wh- constituents that are objects, have an adverbial function, or have some other function (say, predicate adjectives). Finally, the column encodes the transitivity of the verb. This column is independent of whether the question instantiates the old or the new grammar. It is included to enable you to replicate the results of Ellegård 1953, reported in Kroch 1989:22-23.

Symbol  Explanation
          Examples
O       object (direct or indirect)
          What ate the king? What asked the king the queen?
          What did the king eat? What did the king ask the queen?
A       adverbial, transitive
          Why/When/Where started the king the war?
          Why/When/Where did the king start the war?
a       adverbial, intransitive
          Why/When/Where laughed the king?
          Why/When/Where did the king laugh?
Y       yes-no question, transitive
          Started the king the war?
          Did the king start the war?
y       yes-no question, intransitive
          Laughed the king?
          Did the king laugh?

Column 5: Order of finite head and NOT in negative declaratives

Uppercase letters indicate immediate precedence (>>); lowercase letters indicate non-immediate precedence (>).

Symbol  Explanation
          Examples
B       auxiliary or main verb be >> negation
          The king is not listening; the king is not a genius.
          The king isn't coming; the king isn't a genius.
D       auxiliary do >> negation
          The king does not listen.
          The king doesn't listen.
H       auxiliary or main verb have >> negation
          The king has not listened; the king has not a clue.
          The king hasn't been listening; the king hasn't a clue.
M       modal >> negation
          The king will not listen.
          The king shouldn't have come.
V       ordinary verb >> negation
          The king came not last year.
b       auxiliary or main verb be > negation
          The king is now not listening; the king is now not at the palace.
d       auxiliary do > negation
          The king doth lately not listen.
h       auxiliary or main verb have > negation
          The king has lately not been listening; the king has now not a clue.
m       modal > negation
          The king will now not listen.
v       ordinary verb > negation
          The king came last year not.

Column 6: Order of finite head, NOT, and subject in negative questions

Uppercase letters indicate immediate precedence (>>); lowercase letters indicate non-immediate precedence (>).

Symbol  Explanation
          Examples
B       auxiliary or main verb be >> negation > pronominal subject
          Is not (...) he listening? Is not (...) he a genius?
          Isn't (...) he listening? Isn't (...) he a genius?
C       same as B but with full noun phrase subject
          Is not (...) the king listening? Is not (...) the king a genius?
          Isn't (...) the king listening? Isn't (...) the king a genius?
D       auxiliary do >> negation > pronominal subject
          Doth not (...) he listen?
          Doesn't (...) he listen?
E       same as D but with full noun phrase subject
          Doth not (...) the king listen?
H       auxiliary or main verb have >> negation > pronominal subject
          Has not (...) he listened? Has not (...) he a clue?
          Hasn't (...) he been listening? Hasn't (...) he a clue?
I       same as H but with full noun phrase subject
          Has not (...) the king listened? Has not (...) the king a clue?
          Hasn't (...) the king been listening? Hasn't (...) the king a clue?
M       modal >> negation > pronominal subject
          Will not (...) he listen?
          Shouldn't (...) he have come?
N       same as M but with full noun phrase subject
          Will not (...) the king listen?
          Shouldn't (...) the king have come?
V       ordinary verb >> negation > pronominal subject
          Listens not (...) he to us?
W       same as V but with full noun phrase subject
          Listens not (...) the king to us?
b       auxiliary or main verb be > pronominal subject > negation
          Is (...) he (...) not listening? Is (...) he (...) not a genius?
c       same as b but with full noun phrase subject
          Is (...) the king (...) not listening? Is (...) the king (...) not a genius?
d       auxiliary do > pronominal subject > negation
          Doth (...) he (...) not listen?
e       same as d but with full noun phrase subject
          Doth (...) the king (...) not listen?
h       auxiliary or main verb have > pronominal subject > negation
          Has (...) he (...) not listened? Has (...) he (...) not a clue?
i       same as h but with full noun phrase subject
          Has (...) the king (...) not been listening? Has (...) the king (...) not a clue?
m       modal > pronominal subject > negation
          Will (...) he (...) not listen?
n       same as m but with full noun phrase subject
          Should (...) the king (...) not listen?
v       ordinary verb > pronominal subject > negation
          Listens (...) he (...) not to us?
w       same as v but with full noun phrase subject
          Listens (...) the king (...) not to us?
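Each column 6 symbol packs three pieces of information: the type of finite head, whether the subject is a pronoun or a full noun phrase, and whether negation precedes or follows the subject. The sketch below unpacks them. The function name and the output labels are illustrative choices, not part of the coding scheme.

```python
# Finite-head type for each letter of column 6. Within each pair, the first
# letter marks a pronominal subject and the second a full noun phrase subject.
HEADS = {"B": "be", "C": "be", "D": "do", "E": "do", "H": "have", "I": "have",
         "M": "modal", "N": "modal", "V": "ordinary verb", "W": "ordinary verb"}
FULL_NP = set("CEINW")

def decode_column6(symbol):
    """Return (head, subject type, order) for a column 6 symbol, or None for "-"."""
    if symbol == "-":
        return None
    upper = symbol.upper()
    if upper not in HEADS:
        raise ValueError(f"unknown column 6 symbol: {symbol!r}")
    subject = "full NP" if upper in FULL_NP else "pronoun"
    # Uppercase: negation precedes the subject; lowercase: subject precedes negation.
    order = "head >> NOT > subject" if symbol.isupper() else "head > subject > NOT"
    return HEADS[upper], subject, order
```

For instance, `decode_column6("E")` describes a question such as "Doth not (...) the king listen?", and `decode_column6("m")` one such as "Will (...) he (...) not listen?".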

Column 7: Order of NEVER w.r.t. finite head

Uppercase letters indicate finite head preceding NEVER; lowercase letters indicate NEVER preceding finite head. This category is included for purposes of comparing the development of NOT sentences with NEVER sentences.

Symbol  Explanation
          Examples
B       (auxiliary or main verb) be > NEVER
          He is never leaving.
D       auxiliary do > NEVER
          He does never leave.
E       main verb do > NEVER
          He does never the dishes.
H       (auxiliary or main verb) have > NEVER
          He has never left.
K       know class verb > NEVER
          He cared never.
M       modal > NEVER
          He will never leave.
V       ordinary verb > NEVER
          He left never.
b       NEVER > (auxiliary or main verb) be
          He never is leaving.
d       NEVER > auxiliary do
          He never does leave.
e       NEVER > main verb do
          He never does the dishes.
h       NEVER > (auxiliary or main verb) have
          He never has left.
k       NEVER > know class verb
          He never cared.
m       NEVER > modal
          He never will leave.
v       NEVER > ordinary verb
          He never left.
x       NEVER > nonfinite verb form (should not occur)
          They will leave never.

Column 8: Date of composition - century

Symbol Explanation
5 text is from the 1500s
6 text is from the 1600s
7 text is from the 1700s

Column 9: Date of composition - decade

Symbol Explanation
0 text is from the "oughts" of a particular century (1500-1509, 1600-1609, 1700-1709)
1 text is from the teens of a particular century (1510-1519, 1610-1619, 1710-1719)
2-9 and so on

Column 10: Date of composition - last digit

Symbol Explanation
0 year for text ends in 0 (1500, 1510, 1520, ... 1600, 1610, 1620, ... 1700, 1710)
1-9 and so on
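Columns 8-10 together spell out the last three digits of the year of composition, so the year can be recovered as 1000 + 100 x (column 8) + 10 x (column 9) + (column 10). For example, the values 5, 3, 4 give 1000 + 500 + 30 + 4 = 1534. A minimal sketch, with a hypothetical function name:

```python
def composition_year(century, decade, last_digit):
    """Combine columns 8-10 into a four-digit year, or None if any is "-"."""
    if "-" in (century, decade, last_digit):
        return None
    # Column 8 holds the hundreds digit: "5" -> 1500s, "6" -> 1600s, "7" -> 1700s.
    return 1000 + 100 * int(century) + 10 * int(decade) + int(last_digit)
```

Keeping the three digits in separate columns makes it easy to group by century or decade, while a helper like this one yields the exact year when finer resolution is wanted.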

Column 11: Author's sex

Treat with caution in comedy and fiction.

Symbol Explanation
f female
m male

Column 12: Author's age, grouped by decade

Symbol Explanation
0 Author's age at date of composition between 0 and 9
1 Author's age at date of composition between 10 and 19
2-8 and so on
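The age symbol is simply the tens digit of the author's age at the date of composition. The sketch below derives it from a birth year and a composition year. Both inputs and the function name are hypothetical, and working with whole years is an assumption; how ages were actually calculated for the corpora is not stated here.

```python
def age_code(birth_year, composition_year):
    """Return the column 12 symbol for an author, or "-" if the birth year is unknown."""
    if birth_year is None:
        return "-"
    age = composition_year - birth_year
    # The table lists symbols 0-8, i.e. ages 0 through 89.
    if not 0 <= age <= 89:
        raise ValueError(f"age {age} is outside the range covered by symbols 0-8")
    return str(age // 10)
```

An author born in 1533 and writing in 1570 is 37, which falls in group 3.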

Column 13: Genre

Uppercase letters indicate genres that are probably formal; lowercase letters ones that are probably informal. See notes for unclear cases. This category was introduced as a proxy for closeness to/distance from the vernacular. We retain it even though it is not a very good proxy.

Symbol  Explanation
          Notes
A       autobiography
B       biography
C       science
D       diary, private
          Not clear whether formal or informal.
E       educational treatise
F       fiction
H       handbook
I       history
L       letter, nonprivate
M       medicine
P       philosophy
          The two texts in this category (boethco, boethpr) are translations from the Latin. Treat with caution, as they might be unduly influenced by the Latin original.
R       sermon
T       travelogue
W       Tyndale bible
          Representative of the usage of its time.
X       statute
          Treat with caution, as the language might become increasingly archaic. Best omitted.
Y       King James bible
          Closely follows the Tyndale bible, and hence not representative of the usage of its time. Best omitted.
Z       Elizabeth's translation of Boethius
          Tends to be gibberish. Best omitted.
c       comedy
l       letter, private
          Not clear whether formal or informal.
t       trials
          Transcriptions of trial proceedings, therefore possibly closer to vernacular usage than other written sources.

Column 14: Complexity of text

This category is also intended as a proxy for closeness to/distance from the vernacular, the idea being that the higher a text's complexity, the further removed it is from the vernacular. Complexity is measured as follows. I ranked all texts by mean word length (characters per word), mean sentence length (words per sentence token), and type/token ratio (adjusted for text length), and then calculated Spearman's rank correlation coefficient for each pair of measures. I eliminated type/token ratio from further consideration because its correlation with each of the other two measures was essentially zero. The correlation between word length and sentence length, on the other hand, was good (r = 0.75). I calculated a combined rank for each text that was simply the mean of the text's ranks for word length and sentence length. Based on the resulting combined ranks, I distinguished between texts of "high" and "low" complexity. In case you're interested, the calculations are in textComplexity.xls.

Symbol Explanation
h high complexity
l low complexity
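The ranking procedure described for this column can be sketched in a few lines. The code below ranks texts on two measures, computes Spearman's coefficient as the Pearson correlation of the ranks, and averages the two ranks into a combined rank. The cutoff between "high" and "low" is not stated above, so the median split used here is an assumption, and the measurements are invented for illustration; this is not a reproduction of textComplexity.xls.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def complexity_labels(word_len, sent_len):
    """Label each text "h" or "l" from the mean of its two ranks.

    The median split is an assumption; the source does not state the cutoff.
    """
    combined = [(a + b) / 2 for a, b in zip(ranks(word_len), ranks(sent_len))]
    cutoff = sorted(combined)[len(combined) // 2]
    return ["h" if c >= cutoff else "l" for c in combined]

# Invented measurements for four texts, for illustration only.
word_len = [4.1, 4.6, 3.9, 4.8]
sent_len = [18.0, 31.5, 22.0, 40.2]
```

With real data, `spearman(word_len, sent_len)` would be the figure to compare against the reported correlation, and `complexity_labels` would yield the "h"/"l" symbols under the assumed cutoff.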

Column 15: Overlap with PCEEC?

Some letters are included in both the PPCEME and the PCEEC. When using both corpora, these letters should obviously not be counted twice. The relevant texts are coded as "y" in the coded version of the PPCEME, and as "-" in the PCEEC.

Symbol Explanation
y text included in both corpora
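When the two corpora are pooled, one way to avoid counting the shared letters twice is to drop every PPCEME token coded "y" and keep all PCEEC tokens. Since the overlap is marked only on the PPCEME side, that is the copy that can be identified and skipped. The sketch below assumes each token is represented as a sequence of its 15 column values; the function name and that representation are illustrative.

```python
def combine_without_overlap(ppceme_rows, pceec_rows, overlap_index=14):
    """Concatenate token rows from both corpora, skipping PPCEME rows coded "y".

    Each row is a sequence of the 15 column values; column 15 sits at index 14.
    The PCEEC rows are kept in full, since they carry "-" in column 15.
    """
    kept = [row for row in ppceme_rows if row[overlap_index] != "y"]
    return kept + list(pceec_rows)
```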

References

Ellegård, Alvar. 1953. The auxiliary do. The establishment and regulation of its use in English. Stockholm: Almqvist & Wiksell.
Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language variation and change 1:199-244.
Warner, Anthony. 2005. Why DO dove: Evidence for register variation in Early Modern English negatives. Language variation and change 17(3):257-280.