Text classification


The corpus is divided into four time periods according to the date of the manuscript:

Texts whose composition date and manuscript date belong to different periods are classified using two numbers. The first number indicates the period of composition, and the second, the period of the manuscript. "X" indicates an unknown date.
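The two-number scheme above can be sketched in code. This is only an illustration under stated assumptions: the function name parse_period_code is hypothetical, the code is assumed to be passed in as a bare string of one or two characters (each a digit from 1 to 4, or "X"), and a single character is assumed to mean that composition and manuscript fall in the same period. How the code is embedded in a filename is not modelled here.

```python
def parse_period_code(code):
    """Split a period code into (composition_period, manuscript_period).

    Assumption: `code` is one or two characters, each "1"-"4" or "X".
    An unknown date ("X") is returned as None.
    """
    code = code.strip().upper()
    valid = set("1234X")
    if len(code) not in (1, 2) or any(ch not in valid for ch in code):
        raise ValueError(f"not a recognised period code: {code!r}")

    def to_period(ch):
        # "X" marks an unknown date, so there is no period number to return.
        return None if ch == "X" else int(ch)

    # With a single character, first and last are the same position,
    # so both periods come out identical.
    return to_period(code[0]), to_period(code[-1])


if __name__ == "__main__":
    for example in ("3", "24", "X1"):
        print(example, parse_period_code(example))
```

Returning None for "X" keeps unknown dates distinct from any real period number when texts are later filtered or sorted.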

Each filename associated with a text in the corpus contains an extension indicating the text's period.

In the files that contain information about each text, the first date is the MED (Middle English Dictionary) date. More detailed date information from other sources may follow in parentheses.

When not exact, MED dates are given by quarter century. A "c" (= circa) indicates a date within 25 years before or after the given date, an "a" (= ante) indicates a date within the 25 years preceding the given date, and a question mark indicates doubtful or uncertain information.
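As a rough illustration, the MED conventions can be turned into a year range. The sketch below makes several assumptions that are not stated in the documentation: the function name med_date_range is hypothetical, the date is a single token with no spaces, a question mark (if any) comes first, and "c" is read as "within 25 years on either side". Parenthesised dates from other sources are not handled.

```python
import re

# Assumed token shape: optional "?", optional "c" or "a", then a 3-4 digit year.
_MED_DATE = re.compile(r"^(\?)?([ac])?(\d{3,4})$")


def med_date_range(token):
    """Return (earliest_year, latest_year, doubtful) for one MED date token."""
    m = _MED_DATE.match(token.strip())
    if m is None:
        raise ValueError(f"not a recognised MED date: {token!r}")
    doubtful = m.group(1) is not None
    prefix = m.group(2)
    year = int(m.group(3))

    if prefix == "c":      # circa: 25 years either side of the given date
        earliest, latest = year - 25, year + 25
    elif prefix == "a":    # ante: within the 25 years before the given date
        earliest, latest = year - 25, year
    else:                  # no prefix: treated as an exact date
        earliest, latest = year, year
    return earliest, latest, doubtful


if __name__ == "__main__":
    for example in ("1340", "c1225", "?a1400"):
        print(example, med_date_range(example))
```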

Where the two differ, dates from Laing are probably more accurate than MED dates. When not exact, Laing dates are indicated by half or quarter century: a = first half, b = second half; a1 = first quarter, a2 = second quarter, etc. Thus, C12a1 indicates the first quarter of the 12th century (that is, 1100-1125). An asterisk preceding a Laing date indicates an OE source.
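The Laing notation is regular enough to decode mechanically. The following is a minimal sketch, not a reference implementation: the function name parse_laing_date is hypothetical, and it assumes that the "etc." above continues as b1 = third quarter and b2 = fourth quarter, and that a code with no half or quarter suffix covers the whole century. Year boundaries follow the C12a1 = 1100-1125 example.

```python
import re

# Assumed shape: optional "*", "C", century number, optional half (a/b),
# optional quarter within that half (1/2).
_LAING_DATE = re.compile(r"^(\*)?C(\d{1,2})(?:([ab])([12])?)?$")


def parse_laing_date(code):
    """Return (start_year, end_year, oe_source) for a Laing date code."""
    m = _LAING_DATE.match(code.strip())
    if m is None:
        raise ValueError(f"not a recognised Laing date: {code!r}")
    star, century, half, quarter = m.groups()

    # The 12th century is taken to run 1100-1200, matching the C12a1 example.
    start = (int(century) - 1) * 100
    end = start + 100

    if half is not None:
        start += 0 if half == "a" else 50
        end = start + 50
    if quarter is not None:
        # Assumption: 1 and 2 pick the first or second quarter of the half.
        start += 0 if quarter == "1" else 25
        end = start + 25

    # A leading asterisk marks an OE source.
    return start, end, star is not None


if __name__ == "__main__":
    for example in ("C12a1", "C13b", "*C13b2"):
        print(example, parse_laing_date(example))
```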


Each text is assigned to one of the following five dialect areas (following the classification of the Helsinki Corpus):

More precise dialect information from other sources may follow in parentheses.


The genres listed follow the Helsinki Corpus classification.