Corpus description

General information

The PPCME2 text samples are based largely on the Middle English section of the Diachronic Part of the Helsinki Corpus of English Texts (available from ICAME), with certain additions and deletions. The PPCME2 samples, however, are considerably larger than the Helsinki samples. For the earliest Helsinki time period, all texts are sampled exhaustively. For the later Helsinki time periods, two texts per period were expanded to 50,000 words, and the remaining texts are represented by the Helsinki Corpus sample.

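The main Helsinki time periods are M1-M4, each covering approximately one hundred years. In addition, texts originally written in a given period but for which the earliest manuscript is from a later period are given two-digit period designations: the first digit indicates the period of composition and the second the period of the manuscript. Texts whose composition date is unknown are designated MX1 or MX4, according to the date of the manuscript. Table 1 lists all Helsinki periods as they appear in the corpus file names.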

Table 1: Helsinki periods
Period designation   Composition date   Manuscript date
MX1                  unknown            1150-1250
M1                   1150-1250          1150-1250
M2                   1250-1350          1250-1350
M23                  1250-1350          1350-1420
M24                  1250-1350          1420-1500
M3                   1350-1420          1350-1420
M34                  1350-1420          1420-1500
MX4                  unknown            1420-1500
M4                   1420-1500          1420-1500
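
Because the period designation is part of each corpus file name, Table 1 can be used as a lookup. The following is a minimal sketch in Python, assuming file names of the form shown in Table 3 (for example cmaelr3.m23), where the designation follows the text name after a dot. The names HELSINKI_PERIODS, period_of and describe are illustrative and are not part of the corpus distribution.

```python
# Table 1 as a lookup: designation -> (composition date, manuscript date).
HELSINKI_PERIODS = {
    "MX1": ("unknown", "1150-1250"),
    "M1": ("1150-1250", "1150-1250"),
    "M2": ("1250-1350", "1250-1350"),
    "M23": ("1250-1350", "1350-1420"),
    "M24": ("1250-1350", "1420-1500"),
    "M3": ("1350-1420", "1350-1420"),
    "M34": ("1350-1420", "1420-1500"),
    "MX4": ("unknown", "1420-1500"),
    "M4": ("1420-1500", "1420-1500"),
}


def period_of(filename):
    """Return the period designation found in a corpus file name."""
    # Assumption: the designation is one of the dot-separated parts that
    # follow the text name, as in "cmaelr3.m23".
    for part in filename.split(".")[1:]:
        if part.upper() in HELSINKI_PERIODS:
            return part.upper()
    raise ValueError(f"no Helsinki period designation in {filename!r}")


def describe(filename):
    """Return (designation, composition date, manuscript date)."""
    period = period_of(filename)
    composition, manuscript = HELSINKI_PERIODS[period]
    return period, composition, manuscript


print(describe("cmaelr3.m23"))  # ('M23', '1250-1350', '1350-1420')
```

Scanning every dot-separated part, rather than only the last one, keeps the sketch usable if a file name carries a further suffix after the period designation.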

The current edition of the PPCME2 includes a total of roughly 1.2 million words of running text. It comprises 55 text samples, each of which is given in three forms: a text file, a part-of-speech tagged file and a parsed file. In addition, there is a file with philological and bibliographical information about each text.

Wordcount information

Table 2 gives the number of words by Helsinki time period. The wordcounts exclude punctuation and extralinguistic material such as page numbers or token ID numbers.

Table 2: Wordcount summary by Helsinki period
Period   Wordcount
MX1         62,596
M1         195,494
M2          93,999
M23         17,013
M24         35,591
M3         385,994
M34         99,994
MX4          5,168
M4         260,116
Total    1,155,965
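
As a quick arithmetic check, the per-period figures in Table 2 can be summed and compared with the stated total. The sketch below simply copies the values from the table; the variable names are illustrative.

```python
# Wordcounts per Helsinki period, copied from Table 2.
PERIOD_WORDCOUNTS = {
    "MX1": 62_596,
    "M1": 195_494,
    "M2": 93_999,
    "M23": 17_013,
    "M24": 35_591,
    "M3": 385_994,
    "M34": 99_994,
    "MX4": 5_168,
    "M4": 260_116,
}

# The sum of the period rows should equal the "Total" row of Table 2.
total = sum(PERIOD_WORDCOUNTS.values())
assert total == 1_155_965

# The period contributing the most words to the corpus.
largest = max(PERIOD_WORDCOUNTS, key=PERIOD_WORDCOUNTS.get)

print(total)    # 1155965
print(largest)  # M3
```

The total of 1,155,965 words is the figure rounded to roughly 1.2 million in the general description.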

Wordcounts for the individual files are given in Table 3. The information in the table is also contained in the file WORDCOUNT-PPCME2 in the current directory. The file is suitable for importing into a spreadsheet program; the fields of each record are separated by the space character.

Table 3: Wordcount summary by individual text
Text Date Genre Wordcount
cmaelr3.m23 c1400 RULE 17,013
cmaelr4.m4 a1450 RELIG_TREATISE 11,181
cmancriw-1.m1 c1230 RELIG_TREATISE 48,566
cmancriw-2.m1 c1230 RELIG_TREATISE 15,224
cmastro.m3 a1450_c1391 HANDBOOK_ASTRO 6,897
cmayenbi.m2 1340 RELIG_TREATISE 45,944
cmbenrul.m3 a1425 RULE 18,221
cmboeth.m3 ?a1425_c1380 PHILOSOPHY 10,415
cmbrut3.m3 c1400 HISTORY 49,099
cmcapchr.m4 a1464 HISTORY 52,716
cmcapser.m4 c1452 SERMON 1,469
cmcloud.m3 a1425_?a1400 RELIG_TREATISE 15,599
cmctmeli.m3 c1390 PHILOSOPHY/FICTION 17,005
cmctpars.m3 c1390 RELIG_TREATISE 30,416
cmearlps.m2 c1350 BIBLE 44,521
cmedmund.m4 c1450_1438 BIOGRAPHY_LIFE_OF_SAINT 3,847
cmedthor.m34 c1440_?1350 RELIG_TREATISE 13,949
cmedvern.m3 c1390 RELIG_TREATISE 12,843
cmequato.m3 c1392 HANDBOOK_ASTRO 6,261
cmfitzja.m4 ?1495 SERMON 5,652
cmgaytry.m34 c1440 SERMON 5,238
cmgregor.m4 c1475 HISTORY 37,326
cmhali.m1 c1225_?c1200 RELIG_TREATISE 8,495
cmhilton.m34 a1450_a1396 RELIG_TREATISE 4,963
cmhorses.m3 a1450 HANDBOOK_MEDICINE 5,902
cminnoce.m4 1497 SERMON 4,329
cmjulia.m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 6,810
cmjulnor.m34 c1450_c1400 RELIG_TREATISE 5,004
cmkathe.m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 8,699
cmkempe.m4 c1450 RELIG_TREATISE 60,212
cmkentho.m1 a1150_c1125 HOMILY 4,048
cmkentse.m2 c1275 HOMILY 3,534
cmlamb1.m1 a1225 HOMILY 6,459
cmlambx1.mx1 a1225 HOMILY 20,752
cmmalory.m4 a1470 ROMANCE 57,775
cmmandev.m3 ?a1425_c1400 TRAVELOGUE 49,690
cmmarga.m1 c1225_?c1200 BIOGRAPHY_LIFE_OF_SAINT 8,069
cmmirk.m34 a1500_a1415 SERMON 57,548
cmntest.m3 c1388 BIBLE 11,081
cmorm.m1 ?c1200 HOMILY_POETRY 50,579
cmotest.m3 a1425_a1382 BIBLE 10,015
cmpeterb.m1 c1150 HISTORY 6,757
cmpolych.m3 a1387 HISTORY 46,444
cmpurvey.m3 c1388 RELIG_TREATISE 39,704
cmreynar.m4 1481 FICTION 8,850
cmreynes.m4 1470-1500 HANDBOOK_OTHER 9,100
cmrollep.m24 a1450_?1348 RELIG_TREATISE 17,960
cmrolltr.m24 c1440_a1349 RELIG_TREATISE 17,631
cmroyal.m34 c1450_c1425 SERMON 6,231
cmsawles.m1 c1225_?c1200 HOMILY 4,111
cmsiege.m4 c1500 ROMANCE 7,659
cmthorn.mx4 c1440 HANDBOOK_MEDICINE 5,168
cmtrinit.mx1 a1225 HOMILY 41,844
cmvices1.m1 a1225_c1200 RELIG_TREATISE 27,677
cmvices4.m34 c1450_c1400 RELIG_TREATISE 7,061
cmwycser.m3 c1400 SERMON 56,402
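
The space-separated records can also be processed with a short script instead of a spreadsheet. The following Python sketch assumes that each record has the four fields of Table 3 (text, date, genre, wordcount) and that the period designation is the part of the file name after the last dot; the exact layout of WORDCOUNT-PPCME2 may differ, and the function names are illustrative. The sample records are copied from Table 3.

```python
from collections import defaultdict


def parse_record(line):
    """Split one record into its four fields; return None for non-data lines."""
    fields = line.split()
    if len(fields) != 4:
        return None
    name, date, genre, count = fields
    digits = count.replace(",", "")  # accept counts written as "17,960"
    if not digits.isdigit():         # e.g. a header line
        return None
    return {
        "text": name,
        "date": date,
        "genre": genre,
        # Assumption: the period follows the last dot of the file name.
        "period": name.rsplit(".", 1)[-1].upper(),
        "words": int(digits),
    }


def words_by_period(lines):
    """Sum the wordcounts of all data records, grouped by period."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_record(line)
        if record is not None:
            totals[record["period"]] += record["words"]
    return dict(totals)


SAMPLE = """\
Text Date Genre Wordcount
cmrollep.m24 a1450_?1348 RELIG_TREATISE 17,960
cmrolltr.m24 c1440_a1349 RELIG_TREATISE 17,631
cmthorn.mx4 c1440 HANDBOOK_MEDICINE 5,168
"""

print(words_by_period(SAMPLE.splitlines()))  # {'M24': 35591, 'MX4': 5168}
```

The two sums agree with the M24 and MX4 rows of Table 2, which suggests that grouping the complete file in this way would reproduce the per-period summary.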