Wordcounts for the individual text samples, along with date and genre information, are contained in the file WORDCOUNT-PPCEME in the current directory. The wordcounts exclude punctuation and extralinguistic material such as page numbers or token ID numbers.
The file is plain text and can be imported into any spreadsheet
program; fields are separated by a space character.
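As an illustration only, a space-separated file like WORDCOUNT-PPCEME can also be read programmatically. The sketch below is a minimal Python example. The function names are hypothetical, and because the column layout is not described here, it makes no assumption about which field holds the filename, date, genre, or wordcount. It also assumes UTF-8 (or plain ASCII) encoding and that no field is empty, since runs of spaces are treated as a single separator.

```python
from pathlib import Path


def read_wordcounts(path, encoding="utf-8"):
    """Read a space-separated text file into a list of records.

    Each non-blank line becomes one record: a list of string fields.
    Runs of whitespace count as one separator (str.split() with no
    argument), so this assumes no field is empty.
    """
    records = []
    for line in Path(path).read_text(encoding=encoding).splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            records.append(fields)
    return records


def column_total(records, index):
    """Sum the integer values found in column `index`.

    Rows that are too short, or whose field is not an integer
    (a header line, for example), are skipped.
    """
    total = 0
    for fields in records:
        try:
            total += int(fields[index])
        except (IndexError, ValueError):
            continue
    return total
```

For example, `column_total(read_wordcounts("WORDCOUNT-PPCEME"), 1)` would sum the second column, provided that column holds the wordcounts; check this against the file itself before relying on it.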
Conventions governing filenames
General conventions
As in the Helsinki Corpus, the filenames for the texts contain an
indication of the time period to which they belong. See Philological
information for more details about the individual texts.
In addition, the filenames in the PPCEME contain an indication of which subcorpus they belong to.
A few examples:
In tripling the size of the samples from the Helsinki Corpus, we
have sometimes had to include texts by new authors (either because the
Helsinki Corpus sample for an author was itself already exhaustive, or
because we ran out of text in the course of tripling the sample size).
In what follows, we describe the conventions that we have followed in
assigning filenames to these new authors. Our general rule has been
to leave Helsinki Corpus filenames unchanged, but we have sometimes
slightly modified the original Helsinki
filenames for clarity and consistency. These modifications, together
with the PPCEME files that supplement each Helsinki Corpus file, are
set out in Table 1.
In the correspondence of important families (such as that of the
Barringtons, the Hattons, or the Plumptons), the Helsinki Corpus tends
to identify women by their birth name, and we retain those filenames.
So Anne Finch, countess of Nottingham, née Hatton, is identified as
anhatton (not finch).
Where the Helsinki Corpus distinguishes individuals with the same
name by means of "jr" and "sr", we retain that usage. However, in
texts added at Penn, we do not specially distinguish the father-son
relationship. We follow this convention in order to avoid having to
change Helsinki filenames. So the Valentine Pettit of the Helsinki
Corpus is identified as pettit, and his son of the same name, who is
not represented in the Helsinki Corpus, is identified as pettit2.
For clarity and consistency, we have adopted the convention that
Arabic numerals immediately following an author's name always indicate
distinct authors. This forces us to modify those Helsinki filenames in
which a numeral instead distinguishes samples by the same author:
hooker1 and hooker2 become hooker-a and hooker-b. In the case of the
wplumpt and stat files, we distinguish the individual files by decade,
as described directly below.
Accordingly, the wplumpt files are identified as wplumpt-1500,
wplumpt-1510, and wplumpt-1530, and the statute files appear as
stat-1500, stat-1510, and so on.
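The decade label can be read as the year of a sample rounded down to a multiple of ten. The Python sketch below illustrates that reading under two assumptions the text does not state: that a label such as 1500 covers the years 1500 to 1509, and that each sample carries a single year. The function names are hypothetical, and the years in the usage line are invented for illustration, not taken from the corpus.

```python
from collections import defaultdict


def decade_filename(base, year):
    """Return a decade-suffixed name such as 'wplumpt-1500'.

    Assumes the label is the year rounded down to a multiple of ten.
    """
    return f"{base}-{(year // 10) * 10}"


def group_by_decade(base, dated_samples):
    """Group (year, sample) pairs into decade-labelled files.

    Returns a dict mapping each decade filename to its samples,
    kept in the order the samples were given.
    """
    groups = defaultdict(list)
    for year, sample in dated_samples:
        groups[decade_filename(base, year)].append(sample)
    return dict(groups)


# Invented years, for illustration only.
print(group_by_decade("wplumpt", [(1503, "letter A"), (1512, "letter B"), (1508, "letter C")]))
# {'wplumpt-1500': ['letter A', 'letter C'], 'wplumpt-1510': ['letter B']}
```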
In one or two cases, material by a single author spans more than
a decade, but we identify the samples by some means other than
decade in order to avoid changing a Helsinki filename. For instance,
Brilliana Harley's letters to her husband (all but one of which are
part of the Helsinki Corpus) are identified as harley, and her
later letters to her son, which are not included in the Helsinki
Corpus, are identified as harleyedw. Thomas More's one personal
letter from the 1520s (included in the Helsinki Corpus) is
identified as morelet1, and his personal letters from the 1530s
(some from the Helsinki Corpus, and some added by us) are identified
as morelet2.
Name vs. title
Following the conventions of the Helsinki Corpus, authors are identified
by name rather than by title. Sovereigns of England are identified by
their given name. For instance, Charles II is identified as charles.
Other members of the nobility, including members of the royal family,
are identified by their surname. For instance, Thomas Howard, earl of
Surrey, 2nd duke of Norfolk, is identified as thoward (not norfolk), and
Mary Tudor (Henry VIII's sister, not to be confused with his daughter,
Mary I, who is not represented in the corpus) as mtudor.
In one or two cases, the Helsinki Corpus uses a title
rather than a surname as the basis for a filename. For instance,
Eleanor Clifford, countess of Cumberland, is identified as ecumberl
(not clifford). In such cases, we retain the Helsinki filename in
order to minimize confusion.
Women's names
As a general rule, women are identified by their surname at the time of
writing. Usually (though not always), this is a married name. In
order to minimize confusion, we do not change filenames to reflect a
later marriage. Two examples:
In one or two cases, a woman appears in the Helsinki
Corpus under her married name despite belonging to one of the
important correspondence families. For instance, Joan Everard and
Elizabeth Masham, both née Barrington, are identified as
everard (not jobarring) and masham (not ebarring). In such cases,
we use the Helsinki filenames in order to minimize confusion.
Modifications of Helsinki Corpus filenames
Under certain circumstances, we have modified the filenames in the
Helsinki Corpus for clarity and consistency. The conventions governing
these modifications are given here, and the correspondence between the
old and new filenames is set out in Table 1.
In one or two cases, it is not clear whether a
Helsinki Corpus file contains material spanning more than a decade
or not. In such cases, we have divided the file into its two
identifiable time chunks but have not labeled them by decade.
For instance, tillots is divided into tillots-a (from before 1671)
and tillots-b (from 1679).
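Taken together, these conventions make the suffix of an author filename informative: a bare numeral marks a distinct author (pettit2), a decade marks a decade sample (wplumpt-1510), and a single letter marks an undated chunk (tillots-b). The Python sketch below classifies names along those lines. It is a hypothetical helper, not part of the corpus distribution; it assumes the name has already been stripped of any period and subcorpus indication, and its exception list contains only morelet1 and morelet2, the retained Helsinki names discussed above. Other exceptions may exist.

```python
import re

# Retained Helsinki names whose numeral does NOT mark a distinct author
# (Thomas More's letters). Possibly incomplete.
RETAINED_HELSINKI = {"morelet1", "morelet2"}

DECADE = re.compile(r"(?P<base>[a-z]+)-(?P<decade>\d{3}0)")  # e.g. stat-1500
CHUNK = re.compile(r"(?P<base>[a-z]+)-(?P<chunk>[a-z])")      # e.g. hooker-a
AUTHOR = re.compile(r"(?P<base>[a-z]+)(?P<n>\d+)")           # e.g. pettit2


def classify(name):
    """Return (kind, base, detail) for an author filename."""
    if name in RETAINED_HELSINKI:
        return ("retained", name, None)
    if m := DECADE.fullmatch(name):
        return ("decade", m["base"], int(m["decade"]))
    if m := CHUNK.fullmatch(name):
        return ("chunk", m["base"], m["chunk"])
    if m := AUTHOR.fullmatch(name):
        return ("distinct author", m["base"], int(m["n"]))
    return ("plain", name, None)


for name in ["pettit", "pettit2", "hooker-a", "wplumpt-1510", "morelet1"]:
    print(name, classify(name))
```

The checks run from most to least specific, so a decade suffix is recognized before the single-letter and numeral patterns are tried.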
Table 1: Summary of filename modifications and PPCEME-Helsinki correspondences

Helsinki filename | PPCEME filename (if different from Helsinki) | Supplemented by |
---|---|---|
alhatton | --- | alhatton2, ehatton2 |
bedyll | --- | friar, russell |
boyle | --- | boylecol |
clowes | --- | clowesobs |
conway | --- | rich |
counc | --- | dell |
ebeaum | --- | mtudor-1510, mtudor-1520 |
ecumberl | --- | manners, delapole |
ehatton | --- | mhatton, montague |
eliz1, eliz2 | included in eliz-1590 | eliz-1560, eliz-1570, eliz-1580 |
eoxinden | included in eoxinden-1660 | dering, eoxinden-1650, eoxinden-1680, jackson, zouch |
essex | --- | essexstate |
everard | --- | jubarring |
fhatton | --- | mhatton |
harley | --- | harleyedw |
henry1, henry2 | included in henry-1520 | henry-1530 |
hooker1 | included in hooker-a | --- |
hooker2 | included in hooker-b | --- |
hoxinden | hoxinden-1660 | hoxinden-1640, hoxinden-1650 |
jetaylor | --- | jetaylormeas |
jpinney | --- | southard, part of jopinney |
knyvett | included in knyvett-1620 | knyvett-1630 |
kscrope | kscrope-1530 | grey, kscrope-1580, mhoward |
lords | --- | interview, marches, surety |
morelet1, morelet2 | --- | part of mroper (see Remarks therein) |
mowntayne | --- | underhill |
nhadd | included in nhadd-1700 | nhadd-1710 |
osborne | --- | conway2 |
pettit | --- | pettit2 |
peyton | --- | moxinden |
Plumpton correspondence | --- | abott, apoole, epoole, gascoigne, gpoole, nevill, rplumpt2, savill |
proud | proud-1620 | proud-1630 |
raleigh | --- | judall |
rferrar | --- | part of nferrar |
rhaddsr | included in rhaddsr-1670 and rhaddsr-1700 | rhaddsr-1650, rhaddsr-1710 |
roxinden | included in roxinden-1620 | roxinden-1600, roxinden2 |
somers | --- | drummond |
stat3 | stat-1500, included in stat-1540; see info for stat-period1-e1 | stat-1510, stat-1530, stat-1550, stat-1560 |
stat4 | stat-1590, included in stat-1600; see info for stat-period2-e2 | stat-1570, stat-1580, stat-1620, stat-1640 |
stat7 | included in stat-1690; see info for stat-period3-e3 | stat-1660 |
stevenso | --- | part of udall |
strype | --- | joxinden |
thoward | --- | dacre |
throckm | --- | thoward2 |
tillots | divided into tillots-a, tillots-b | tillots-c |
torkingt | --- | chaplain |
trincoll | --- | hatcher, talbot |
tunstall | --- | ambass |
turner | --- | turnerherb |
wcecil | included in wcecil-1580 | wcecil-1560 |
wpaston2 | --- | joxinden |
wplumpt1 | wplumpt-1500 | --- |
wplumpt2 | wplumpt-1510 | --- |
wplumpt3 | wplumpt-1530 | --- |
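For scripted work, the correspondences in Table 1 can be held as lookup tables. The Python sketch below encodes only an excerpt of the table, with entries copied from the rows above; the names RENAMED, SUPPLEMENTS, ppceme_names, and related_files are hypothetical. Because the excerpt is partial, a Helsinki name missing from RENAMED is reported as unchanged, which is correct for the "---" rows but wrong for omitted renamed rows such as henry1, so the dictionaries would need to be completed from the full table before use.

```python
# Excerpt of Table 1: Helsinki filename -> PPCEME file(s) holding its material.
# Rows whose PPCEME column is "---" need no entry.
RENAMED = {
    "hooker1": ["hooker-a"],
    "hooker2": ["hooker-b"],
    "rhaddsr": ["rhaddsr-1670", "rhaddsr-1700"],
    "stat3": ["stat-1500", "stat-1540"],
    "tillots": ["tillots-a", "tillots-b"],
    "wplumpt1": ["wplumpt-1500"],
}

# Excerpt of Table 1: Helsinki filename -> supplementing PPCEME files.
SUPPLEMENTS = {
    "harley": ["harleyedw"],
    "pettit": ["pettit2"],
    "rhaddsr": ["rhaddsr-1650", "rhaddsr-1710"],
    "stat3": ["stat-1510", "stat-1530", "stat-1550", "stat-1560"],
    "tillots": ["tillots-c"],
}


def ppceme_names(helsinki_name):
    """PPCEME file(s) containing a Helsinki file's material.

    A name absent from RENAMED is assumed to be unchanged.
    """
    return list(RENAMED.get(helsinki_name, [helsinki_name]))


def related_files(helsinki_name):
    """All PPCEME files for a Helsinki file plus its supplements, sorted."""
    return sorted(ppceme_names(helsinki_name) + SUPPLEMENTS.get(helsinki_name, []))
```

Sorting the combined list puts decade-labelled files in chronological order, since the decades share a common prefix and have the same number of digits.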