Penn Parsed Corpora of Historical Greek (PPCHiG)

I am currently constructing syntactically annotated corpora of various stages of the Greek language. They are modeled after the Penn Parsed Corpora of Historical English and use text files from the Perseus Digital Library, which are licensed under a Creative Commons ShareAlike 3.0 license. The corpora will eventually be publicly available and searchable with CorpusSearch.

Annotation Manual

The annotation manual for the Penn Parsed Corpora of Historical Greek (PPCHiG) is available here.
A tutorial that briefly introduces the annotation system used in the parsed corpora of historical Greek, and explains how to query the corpora with CorpusSearch 2, is available here.

Current Status

As of June 2013, I have released a public beta version of the parsed texts of Matthew and Mark from the Greek New Testament and an alpha version of the first three books of Herodotus' Histories. You can get them from my GitHub repository in one of three ways:

  1. Download just the parsed file, located in the PSD/ directory (right-click and choose Save Link As... to download).
  2. Download the whole repository as a zip file, which also includes various annotation and corpus manipulation tools, mostly written in Python.
  3. Clone the repository, which makes it easy to pull any further updates.

The Gortyn Law Code

For my final project in a seminar on Greek Dialects and Greek Historical Grammar at the Center for Hellenic Studies in Washington, D.C., I wrote a morphology- and syntax-focused commentary on the Gortyn Law Code of the 5th century B.C.E., one of the longest surviving Greek inscriptions. A partial version of the commentary is available for download.

Alongside the commentary, I produced a syntactically annotated version of the text in the style of the PPCHiG, which can also be downloaded. I hope eventually to release an XML version of the text that includes the syntactic and morphological markup from the commentary. Until then, a basic HTML version of the text is available here.