Warner 2005:262-263 gives two tables (Tables 2 and 4), in which he
cross-tabulates Early Modern English texts from two time periods
(1500-1575 and 1600-1712) by lexical complexity and percentage of
auxiliary *do*. Your job in this assignment is to replicate
Warner's results on the basis of the texts in the PPCEME. The
spreadsheet containing the data for the assignment is here. The spreadsheet is a corrected version
of the one in Assignment 3; the column headings
are the same as in that assignment.

In particular, you will be filling in the cells of two tables of the following form:

Occurrence of *do* in texts of low versus high lexical complexity, xxxx-xxxx

| | Texts of low lexical complexity | Texts of high lexical complexity | Total |
|---|---|---|---|
| High DO % | | | |
| Low DO % | | | |
| Total | | | |
| DO % average | | | |

You will need to use your ingenuity to construct a measure of lexical complexity. Please begin by constructing a measure that is based on raw average word length and raw type/token ratio, as Warner does. In other words, use Columns F-H in the spreadsheet. Later on, if you want, you can experiment with the corresponding measures that are based on open-class items (Columns J-L). You can also experiment with incorporating sentence length (Column I) into your measure of lexical complexity.
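As one possible starting point, the sketch below combines the two raw measures by standardizing each (z-scores) and averaging them, then splits the texts into low and high complexity at the median. This is only an illustration under stated assumptions: it is not necessarily the way Warner combines the measures, the function names are made up for this example, and the input values are invented rather than taken from the spreadsheet.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def complexity_scores(word_lengths, type_token_ratios):
    """Average the z-scores of the two measures for each text.

    Assumption: both measures are weighted equally.
    """
    z_len = z_scores(word_lengths)
    z_ttr = z_scores(type_token_ratios)
    return [(a + b) / 2 for a, b in zip(z_len, z_ttr)]


def split_low_high(scores):
    """Label a text 'high' if its score is above the median, else 'low'.

    Assumption: the median is the cutoff; other cutoffs are possible.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Invented values for four hypothetical texts (not PPCEME data).
word_lengths = [3.9, 4.1, 4.4, 4.6]
type_token_ratios = [0.30, 0.34, 0.33, 0.41]

scores = complexity_scores(word_lengths, type_token_ratios)
labels = split_low_high(scores)
```

Standardizing first keeps the measure with the larger numerical range (here, word length) from dominating the combined score. If you later add sentence length or the open-class measures, they can be standardized and averaged in the same way.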

Later on in the class, we will present a test that will allow you to evaluate the statistical significance of the various results that you obtain.

Together with the tables that you submit, please include a brief discussion of how you constructed the measure of lexical complexity and how you arrived at the figures in your tables, so that Caitlin and I can follow your reasoning.

In calculating the cells of the tables, please take the following points into account:

- The dates refer to date of composition, not date of birth.
- In calculating percentage of auxiliary *do*, please use negative sentences (columns B and C), but not questions.
- Certain texts from the PPCEME should be excluded from your analysis. The texts in question are:
  - `authnew` and `authold`: The Authorized Version of the Bible of 1611, also known as the King James Bible, is essentially a lightly edited version of Tyndale's translation from the 1530's. The percentage of auxiliary *do* is therefore highly unrepresentative of the time of publication.
  - `boethel`: Queen Elizabeth's translation of Boethius is often a word-for-word rendering of the original Latin, resulting in ungrammatical gibberish. In some cases, it is also clear that Elizabeth did not understand the meaning of the original.

- The PPCEME contains one or two texts past 1712. Include these in your second table, as there's no good reason to exclude them.
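The points above can be sketched in code as follows. This is a rough illustration, not a prescribed method. It assumes that columns B and C hold, for each text, the counts of negative sentences with and without auxiliary *do*. It also assumes that "High DO %" means above the average of the per-text percentages for the period. The dictionary keys (`name`, `year`, `with_do`, `without_do`, `complexity`) and the sample texts are hypothetical. Adjust all of these to match the spreadsheet and your own reasoning.

```python
from collections import Counter

# Texts to exclude, as listed above.
EXCLUDED = {"authnew", "authold", "boethel"}


def period_of(year):
    """Map a date of composition to one of the two table periods, or None."""
    if 1500 <= year <= 1575:
        return "1500-1575"
    if year >= 1600:  # texts past 1712 go into the second table
        return "1600-1712"
    return None  # 1576-1599 falls between the two periods


def do_percentage(with_do, without_do):
    """Percentage of negative sentences with auxiliary do; None if no data."""
    total = with_do + without_do
    return 100 * with_do / total if total else None


def cross_tabulate(texts, period):
    """Count texts per (DO level, complexity) cell for one period."""
    kept = []
    for t in texts:
        if t["name"] in EXCLUDED or period_of(t["year"]) != period:
            continue
        pct = do_percentage(t["with_do"], t["without_do"])
        if pct is not None:
            kept.append((t["complexity"], pct))
    if not kept:
        return Counter(), None
    # Assumption: the threshold is the mean of the per-text percentages.
    average = sum(p for _, p in kept) / len(kept)
    cells = Counter(("high" if p > average else "low", c) for c, p in kept)
    return cells, average


# Invented sample rows (not PPCEME data), apart from the excluded text name.
texts = [
    {"name": "texta", "year": 1530, "with_do": 2, "without_do": 8, "complexity": "low"},
    {"name": "textb", "year": 1560, "with_do": 6, "without_do": 4, "complexity": "high"},
    {"name": "authnew", "year": 1611, "with_do": 1, "without_do": 9, "complexity": "high"},
    {"name": "textc", "year": 1590, "with_do": 5, "without_do": 5, "complexity": "low"},
    {"name": "textd", "year": 1720, "with_do": 9, "without_do": 1, "complexity": "low"},
]

cells_1, avg_1 = cross_tabulate(texts, "1500-1575")
cells_2, avg_2 = cross_tabulate(texts, "1600-1712")
```

Each returned `Counter` is keyed by `(DO level, complexity)` pairs, so the four inner cells of a table can be read off directly and the row and column totals obtained by summing them.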