Annotation manual for the Penn Historical Corpora and the
York-Helsinki Corpus of Early English Correspondence
Beatrice Santorini
(April 2016)
This annotation manual is a revised version of the manual written in
connection with the first release of the Penn-Helsinki Parsed Corpus of
Early Modern English (Kroch, Santorini, and Delfs 2004). It is heavily
indebted to the annotation guidelines developed by Ann Taylor and Tony
Kroch for the second edition of the Penn-Helsinki Parsed Corpus of
Middle English (Kroch and Taylor 2000) as well as to the guidelines
developed for the Penn Treebank (Marcus, Santorini, and Marcinkiewicz
1993). The current version corrects typos and broken links and adds
some examples and clarifications; the substance of the guidelines
remains unchanged. The guidelines apply to the following corpora:
There are slight annotation differences among the above-named corpora
(notably, between the PPCME2 and the later corpora).
Acknowledgments
I would like to thank the following institutions and individuals for
their support and assistance:
- The National Endowment for the Humanities for financial support under NEH Grant PA 23382-99.
- The National Science Foundation for financial support under NSF Grant BCS 99-05488.
- The National Science Foundation for financial support under NSF Grants BCS 05-08731 and BCS 11-47499.
- The users of the Penn Historical Corpora for their financial
support in purchasing the corpora.
- Tony Kroch and Ann Taylor for many helpful discussions concerning
the original guidelines for the PPCME2 and their adaptation to modern
English.
References
- Kroch, Anthony, and Ann Taylor. 2000. The Penn-Helsinki Parsed Corpus
of Middle English (PPCME2). Department of Linguistics, University of
Pennsylvania. CD-ROM, second edition, release 4
(http://www.ling.upenn.edu/ppche-release-2016/PPCME2-RELEASE-4).
- Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004. The
Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Department
of Linguistics, University of Pennsylvania. CD-ROM, first edition,
release 3
(http://www.ling.upenn.edu/ppche-release-2016/PPCEME-RELEASE-3).
- Marcus, Mitchell, Beatrice Santorini, and Mary Ann Marcinkiewicz.
1993. Building a large annotated corpus of English: The Penn Treebank.
Computational linguistics 19, 313-330. Reprinted in Susan Armstrong,
ed., 1994, Using large corpora. Cambridge, MA: MIT Press. 273-290.