Linguistics 300, F08, Assignment 3

Due: F 9/12

This assignment is based on a spreadsheet (download here) containing do-support data from the Penn Parsed Corpus of Early Modern English.

The spreadsheet contains two sheets. You will be working with the first sheet; the second sheet contains the data in an even rawer form. Specifically, a given text can be represented in the corpus by 1-3 samples. Sheet 2 gives information by sample, whereas Sheet 1 combines the information for each text. You don't need to refer to Sheet 2, but don't delete it; otherwise, the cells in Sheet 1 will all turn to 0. (If you want, you can try it out by deleting the contents of cell B2 in Sheet 2 (whose value is 1). See how the corresponding cell in Sheet 1 turns to 0?)
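To picture the relationship between the two sheets, here is a minimal Python sketch. It assumes that "combining" a count column such as NegOld means summing it over a text's samples; the actual formulas in the spreadsheet may differ, and the text names, sample names, and counts below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-sample rows in the spirit of Sheet 2:
# (text name, sample name, NegOld count). None of these values are real.
samples = [
    ("text_a", "sample_1", 1),
    ("text_a", "sample_2", 3),
    ("text_b", "sample_1", 2),
]

def combine_by_text(sample_rows):
    """Sum a per-sample count into one total per text (assumed behavior)."""
    totals = defaultdict(int)
    for text, _sample, neg_old in sample_rows:
        totals[text] += neg_old
    return dict(totals)

combined = combine_by_text(samples)
print(combined)
```

Under this assumption, removing a sample's value lowers the total for its text, which is the kind of dependency the cell B2 experiment shows.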

Given the size of the spreadsheet, it is likely to contain a few errors. Please devise some "sanity checks," and report any errors that you find, each with a brief explanation, to Caitlin Light.
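As a starting point, a few such checks could be sketched in Python as follows. This assumes that each row of Sheet 1 has been read into a dictionary keyed by the column names, that dates are years, and that unknown dates are stored as "-". The function name and the two sample rows are hypothetical, and the checks are only the ones that follow directly from the column definitions: counts cannot be negative, types cannot outnumber tokens, open-class counts cannot exceed raw counts, and an author cannot be born after the text was composed.

```python
COUNT_COLS = [
    "NegOld", "NegNew", "QOld", "QNew", "CharsRaw", "WordTokensRaw",
    "WordTypesRaw", "Sentences", "CharsOC", "WordTokensOC", "WordTypesOC",
]

def sanity_errors(row):
    """Return a list of problems found in one row (empty if none)."""
    errors = []
    # Raw counts can never be negative.
    for col in COUNT_COLS:
        if row[col] < 0:
            errors.append(f"{col} is negative")
    # Each type is instantiated by at least one token.
    for types, tokens in [("WordTypesRaw", "WordTokensRaw"),
                          ("WordTypesOC", "WordTokensOC")]:
        if row[types] > row[tokens]:
            errors.append(f"{types} exceeds {tokens}")
    # Open-class items are a subset of all items.
    for oc, raw in [("CharsOC", "CharsRaw"),
                    ("WordTokensOC", "WordTokensRaw"),
                    ("WordTypesOC", "WordTypesRaw")]:
        if row[oc] > row[raw]:
            errors.append(f"{oc} exceeds {raw}")
    # Compare dates only when both are known ("-" means unknown).
    birth, comp = row["DateOfBirth"], row["DateOfComp"]
    if birth != "-" and comp != "-" and int(birth) >= int(comp):
        errors.append("DateOfBirth is not before DateOfComp")
    return errors

# Two made-up rows: one consistent, one with deliberate errors.
good_row = {
    "Filename": "text_a", "NegOld": 5, "NegNew": 2, "QOld": 1, "QNew": 0,
    "CharsRaw": 5000, "WordTokensRaw": 1000, "WordTypesRaw": 400,
    "Sentences": 60, "CharsOC": 3000, "WordTokensOC": 450,
    "WordTypesOC": 300, "DateOfBirth": 1520, "DateOfComp": 1560,
}
bad_row = dict(good_row, WordTypesRaw=1200, DateOfBirth=1600)

print(sanity_errors(good_row))
print(sanity_errors(bad_row))
```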

If you sort the data, be sure to select the entire sheet (in Excel, click on <> in the top left-hand corner). Otherwise, if you sort a single column in isolation from the others, the values in that column will no longer line up with the rest of their rows, and the spreadsheet will turn into gibberish.
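The difference between the two kinds of sorting can be illustrated with a small Python sketch. The text names and dates are made up; the point is only that sorting whole rows keeps each text paired with its own values, whereas sorting one column by itself does not.

```python
# Made-up rows: (Filename, DateOfComp).
rows = [
    ("text_c", 1600),
    ("text_a", 1550),
    ("text_b", 1650),
]

# Sorting whole rows: each filename keeps its own date.
sorted_rows = sorted(rows, key=lambda r: r[0])

# Sorting the Filename column in isolation: the dates stay where they were,
# so filenames end up next to dates that belong to other texts.
names_only = sorted(r[0] for r in rows)
scrambled = [(name, r[1]) for name, r in zip(names_only, rows)]

print(sorted_rows)
print(scrambled)
```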

The explanations for the columns in the spreadsheet are as follows. "Old" and "new" refer to the V-I raising and the I-V lowering grammar, respectively.

Col  Name           Explanation
A    Filename       identifies a particular text
B    NegOld         raw number of instances of old negative sentences (He sleeps not)
C    NegNew         raw number of instances of new negative sentences (He does not sleep)
D    QOld           raw number of instances of old questions (Sleeps he?)
E    QNew           raw number of instances of new questions (Does he sleep?)
F    CharsRaw       raw number of characters in text (excluding punctuation)
G    WordTokensRaw  raw number of word tokens in text (two instances of the same word are counted twice)
H    WordTypesRaw   raw number of word types in text (two instances of the same word are counted once)
I    Sentences      number of sentences in text
J    CharsOC        like CharsRaw, but based only on open-class items (nouns, verbs, and adjectives)
K    WordTokensOC   like WordTokensRaw, but based only on open-class items
L    WordTypesOC    like WordTypesRaw, but based only on open-class items
M    DateOfBirth    date of author's birth (- if unknown)
N    DateOfComp     date of composition of text (- if unknown)
O    Genre          text genre
P    Sex            author's sex (m by default)
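If you prefer to inspect the data outside Excel, the following Python sketch shows one way to read rows while handling the "-" marker for unknown dates. It assumes that Sheet 1 has been exported to CSV with the column names above as the header row; the export step, the helper names, and the two rows shown (including the genre labels) are hypothetical, and only a subset of the columns is included to keep the example short.

```python
import csv
import io

# Made-up CSV export with a subset of the columns; not real corpus data.
csv_text = """Filename,NegOld,NegNew,DateOfBirth,DateOfComp,Genre,Sex
text_a,5,2,1520,1560,letters,m
text_b,1,7,-,1640,sermon,f
"""

def parse_date(value):
    """Convert a date cell to an int, or None if it is marked unknown ("-")."""
    value = value.strip()
    return None if value == "-" else int(value)

def load_rows(handle):
    """Read CSV records into dictionaries with typed values."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "Filename": rec["Filename"],
            "NegOld": int(rec["NegOld"]),
            "NegNew": int(rec["NegNew"]),
            "DateOfBirth": parse_date(rec["DateOfBirth"]),
            "DateOfComp": parse_date(rec["DateOfComp"]),
            "Genre": rec["Genre"],
            "Sex": rec["Sex"],
        })
    return rows

rows = load_rows(io.StringIO(csv_text))
for row in rows:
    print(row)
```

Storing unknown dates as None rather than as the string "-" keeps them from being mistaken for numbers in later comparisons.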