This updated and augmented release adds two million words to the Modern British English corpus, for
a total of three million. It also includes a substantial number of
corrections to the other corpora in the series. In addition, a small number
of changes have been made to the annotation guidelines for the corpora. For
details see the online annotation manual.
PPCHE RELEASE NUMBER: 4
Penn Parsed Corpora of Historical English
The Penn Parsed Corpora of Historical English, including the Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2), the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), and the Penn Parsed Corpus of Modern British English, second edition (PPCMBE2), are running texts and text samples of British English prose across its history, from the earliest Middle English documents up to the First World War. The texts come in three forms: simple text, part-of-speech tagged text, and syntactically annotated text. The syntactic annotation (parsing) permits searching not only for words and word sequences, but also for syntactic structure. All of the annotation has been carefully reviewed by expert human annotators for accuracy and consistency. The corpora are designed for the use of students and scholars of the history of English, especially the historical syntax of the language, and they are publicly available to individuals, research groups, and libraries.
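To illustrate what searching for syntactic structure means, as opposed to searching for words, here is a minimal Python sketch that reads one labeled-bracket tree and retrieves constituents by their label. It is not CorpusSearch2 and does not use its query language. The sample sentence is invented rather than taken from the corpora, the labels (IP-MAT, NP-SBJ, NP-OB1) are used only for illustration and should be checked against the annotation manual, and the helper names parse_tree, find and words are hypothetical.

```python
def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def find(tree, label):
    """Yield every subtree whose label equals `label`."""
    if tree[0] == label:
        yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from find(child, label)


def words(tree):
    """Return the leaf words under a subtree, in order."""
    out = []
    for child in tree[1]:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


# Invented example sentence (an assumption, not corpus data).
SAMPLE = "(IP-MAT (NP-SBJ (PRO He)) (VBD saw) (NP-OB1 (D the) (N king)))"

tree = parse_tree(SAMPLE)
subjects = [" ".join(words(t)) for t in find(tree, "NP-SBJ")]
objects = [" ".join(words(t)) for t in find(tree, "NP-OB1")]
print(subjects, objects)
```

A plain word search could find "king" but could not tell whether it heads a subject or an object; matching on the constituent label is what makes that distinction possible.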
The search program included with the Penn Parsed Corpora of Historical English, CorpusSearch2, was written by Beth Randall and has been released as open source software. The latest version can always be downloaded from its SourceForge project web site.