Annotation manual for the Penn Parsed Corpora of Historical English and the Parsed Corpus of Early English Correspondence 2

Beatrice Santorini
(January 2022)

This annotation manual revises the earlier versions (2004, 2016). It is heavily indebted to the original document developed by Ann Taylor and Anthony Kroch for Middle English (Kroch and Taylor 2000) as well as to the spirit of the guidelines for the Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993).

The substance of the guidelines remains largely unchanged, but the annotation scheme has been streamlined (see Changes), and the differences between the annotation of Middle English and (Early) Modern English have been reduced. Certain differences remain, however, which are occasioned by the syntactic differences between Middle English and later stages of the language. Except where necessary, the examples in the body of the manual are from (Early) Modern English.

This version of the manual also contains guidelines concerning lemmatization, which has been carried out for PPCEME and PPCMBE2 using the Oxford English Dictionary (OED) as a lemma authority.

The present guidelines are in force for the following corpora:

The guidelines were developed for English, but they have served as a foundation for the annotation guidelines of parsed corpora of various other Germanic and Romance languages. The general idea is that the present guidelines apply by default, except where they are overruled by language-particular considerations set out in a corpus-particular manual.

Suggestions for improvement may be sent to Beatrice Santorini (beatrice DOT santorini AT gmail DOT com).

Acknowledgments

Thanks are due to the following institutions and individuals for support and assistance:

References