This assignment is based on a spreadsheet (download here) containing do support data from the Penn Parsed Corpus of Early Modern English.
The spreadsheet contains two sheets. You will be working with the first sheet; the second sheet contains the data in an even rawer form. Specifically, a given text can be represented in the corpus by one to three samples. Sheet 2 gives information by sample, whereas Sheet 1 combines the information for each text. You don't need to refer to Sheet 2, but don't delete it; otherwise, the cells in Sheet 1 will all turn to 0. (If you want, you can try it out by deleting the contents of cell B2 in Sheet 2 (whose value is 1). See how the corresponding cell in Sheet 1 turns to 0?)
Given the size of the spreadsheet, it is likely to contain a few errors. Please devise some "sanity checks," and report any errors that you find, each with a brief explanation, to Caitlin Light.
|If you sort the data, be sure to select the entire sheet (in Excel, click on <> in the top left-hand corner). Otherwise, if you sort a single column in isolation from the others, its values will no longer line up with their rows, and the spreadsheet will turn into gibberish.|
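
The sorting warning above can be illustrated with a minimal Python sketch. The rows and their values are invented for illustration and are not taken from the spreadsheet; only the column names come from the assignment. Sorting whole rows keeps each count attached to its text, whereas sorting one column on its own reassigns counts to the wrong texts.

```python
# Hypothetical rows; the values are made up for illustration only.
rows = [
    {"Filename": "text-a", "NegOld": 12, "NegNew": 3},
    {"Filename": "text-b", "NegOld": 2, "NegNew": 9},
    {"Filename": "text-c", "NegOld": 7, "NegNew": 5},
]

# Safe: sort entire rows, so every value moves together with its Filename.
sorted_rows = sorted(rows, key=lambda row: row["NegOld"])

# Unsafe: sort only the NegOld column and write it back in the old row order.
neg_old_sorted = sorted(row["NegOld"] for row in rows)
scrambled_rows = [dict(row) for row in rows]  # copy so `rows` is untouched
for row, value in zip(scrambled_rows, neg_old_sorted):
    row["NegOld"] = value

print([row["Filename"] for row in sorted_rows])
print(scrambled_rows[0])  # text-a now carries a count that belonged to text-b
```

In the second variant, the counts for each text no longer belong together, which is the "gibberish" outcome described above.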
The explanations for the columns in the spreadsheet are as follows. "Old" and "new" refer to the V-I raising and the I-V lowering grammar, respectively.
|Filename||identifies a particular text|
|NegOld||raw number of instances of old negative sentences (He sleeps not)|
|NegNew||raw number of instances of new negative sentences (He does not sleep)|
|QOld||raw number of instances of old questions (Sleeps he?)|
|QNew||raw number of instances of new questions (Does he sleep?)|
|CharsRaw||raw number of characters in text (excluding punctuation)|
|WordTokensRaw||raw number of word tokens in text (two instances of the same word are counted twice)|
|WordTypesRaw||raw number of word types in text (two instances of the same word are counted once)|
|Sentences||number of sentences in text|
|CharsOC||like CharsRaw, but based only on open-class items (nouns, verbs, and adjectives)|
|WordTokensOC||like WordTokensRaw, but based only on open-class items|
|WordTypesOC||like WordTypesRaw, but based only on open-class items|
|DateOfBirth||date of author's birth (- if unknown)|
|DateOfComp||date of composition of text (- if unknown)|
|Sex||author's sex (m by default)|
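
As one possible starting point for the sanity checks, here is a minimal Python sketch based on the column definitions above. It assumes the rows have already been read into dictionaries with numeric counts, and the two example rows are invented for illustration. The checks rely only on relationships that follow from the definitions: word types cannot outnumber word tokens, open-class counts cannot exceed the corresponding raw counts, and a text should be composed after its author's birth. The function names are hypothetical, and these checks are not meant to be exhaustive.

```python
# Hypothetical rows; the values are made up for illustration only.
rows = [
    {"Filename": "text-a", "CharsRaw": 4200, "WordTokensRaw": 1000,
     "WordTypesRaw": 300, "CharsOC": 2100, "WordTokensOC": 400,
     "WordTypesOC": 200, "DateOfBirth": "1550", "DateOfComp": "1590"},
    {"Filename": "text-b", "CharsRaw": 2000, "WordTokensRaw": 500,
     "WordTypesRaw": 650, "CharsOC": 900, "WordTokensOC": 200,
     "WordTypesOC": 100, "DateOfBirth": "-", "DateOfComp": "1620"},
]


def to_year(value):
    """Return the year as an int, or None when the date is unknown ("-")."""
    return None if value == "-" else int(value)


def sanity_check(row):
    """Return a list of descriptions of the problems found in one row."""
    problems = []
    # Each type is instantiated by at least one token.
    if row["WordTypesRaw"] > row["WordTokensRaw"]:
        problems.append("more word types than word tokens")
    if row["WordTypesOC"] > row["WordTokensOC"]:
        problems.append("more open-class types than open-class tokens")
    # Open-class items are a subset of all items in the text.
    for raw, oc in [("CharsRaw", "CharsOC"),
                    ("WordTokensRaw", "WordTokensOC"),
                    ("WordTypesRaw", "WordTypesOC")]:
        if row[oc] > row[raw]:
            problems.append(f"{oc} exceeds {raw}")
    # Dates are compared only when both are known.
    born, composed = to_year(row["DateOfBirth"]), to_year(row["DateOfComp"])
    if born is not None and composed is not None and composed <= born:
        problems.append("date of composition is not later than date of birth")
    return problems


for row in rows:
    for problem in sanity_check(row):
        print(row["Filename"], problem)
```

Returning a list of problems per row, rather than stopping at the first one, makes it easier to assemble the error report with a brief explanation for each entry.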