The second edition includes the 101 text samples from the first edition,
together with 174 further text samples. Errors in the samples from the
first edition that we have found ourselves or that have been reported to
us are corrected in this edition. As with the first edition, the genre
composition of the second edition has been kept as close as possible to
that of the Penn-Helsinki Parsed Corpus of Early Modern English
(PPCEME), except that we have added no further statutes to the second
edition. The samples from the first and the second editions are
distinguished by a trailing "-1" or "-2", respectively, in their
filenames.
Wordcount information
Wordcounts for the individual text samples, along with date and genre
information, are contained in the file WORDCOUNT-PPCMBE2 in the current
directory. The wordcounts exclude punctuation and extralinguistic
material such as page numbers or token ID numbers. WORDCOUNT-PPCMBE2 is
a plain text file that can be imported into any spreadsheet program;
the field separator is the space character.
Conventions governing filenames
The filenames for the texts include the year of composition or
publication. Texts that span several years within a decade contain "x"
instead of a last digit. Texts from separate decades are generally
given their own files. For instance, nightingale-188x and
nightingale-189x contain Florence Nightingale's letters from the 1880s
and 1890s, respectively. However, when the material from a decade is
not very extensive, it is subsumed in a file for an adjacent decade.
For instance, forster2-191x contains a few letters from 1909.
As noted above, the second edition of the PPCMBE contains more text
samples than the first edition. In some cases, the additional samples
extend the original sample by a given author; in other cases, the
additional samples are by authors not represented in the first edition.
In both cases, the filenames for the samples from the first edition
contain a trailing "-1"; the samples that have been added to the second
edition contain a trailing "-2".
The trial proceedings in frost-1840 were not available in time for the
first edition of the PPCMBE. The current sample is therefore divided
into two parts. The first (frost-1840-1) is the sample that would have
been included in the first edition, had it been available. The second
part (frost-1840-2) augments the first.
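The decade and edition conventions for filenames are regular enough to parse mechanically, which is convenient when combining them with the space-separated WORDCOUNT-PPCMBE2 file. The Python sketch below is a minimal illustration under stated assumptions: the pattern `<author>-<year or decade>[-<extra>][-<edition>]` is inferred from the examples in this document rather than taken from an official specification, and the function names `parse_sample_name` and `words_per_edition`, as well as the column order assumed for the wordcount file, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical filename shape, inferred from the conventions above:
#   <author>-<year or decade>[-<extra>][-<edition>]
_EDITION = re.compile(r"-([12])$")        # trailing "-1" or "-2"
_PERIOD = re.compile(r"(\d{3})([0-9x])")  # "1840" or "188x"


@dataclass
class Sample:
    author: str             # e.g. "forster2"
    first_year: int         # earliest year implied by the filename
    last_year: int          # latest year implied by the filename
    extra: Optional[str]    # e.g. "private-letters", else None
    edition: Optional[int]  # 1 or 2, or None if there is no edition suffix


def parse_sample_name(name: str) -> Sample:
    """Split a sample name such as "nightingale-188x-1" into its fields."""
    edition = None
    m = _EDITION.search(name)
    if m:
        edition = int(m.group(1))
        name = name[: m.start()]
    parts = name.split("-", 2)
    if len(parts) < 2:
        raise ValueError(f"no date field in {name!r}")
    author, period = parts[0], parts[1]
    extra = parts[2] if len(parts) == 3 else None
    p = _PERIOD.fullmatch(period)
    if p is None:
        raise ValueError(f"unrecognised date field {period!r}")
    stem, last_digit = p.groups()
    if last_digit == "x":
        # A decade label is nominal: a file may also hold a little material
        # from an adjacent decade (cf. the 1909 letters in forster2-191x).
        first_year, last_year = int(stem) * 10, int(stem) * 10 + 9
    else:
        first_year = last_year = int(period)
    return Sample(author, first_year, last_year, extra, edition)


def words_per_edition(lines: Iterable[str]) -> Dict[Optional[int], int]:
    """Sum wordcounts by edition from space-separated wordcount lines.

    ASSUMPTION: column 1 is the sample name and column 2 the wordcount;
    check the real column order in WORDCOUNT-PPCMBE2 before relying on this.
    """
    totals: Dict[Optional[int], int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank lines, headers, or malformed rows
        edition = parse_sample_name(fields[0]).edition
        totals[edition] = totals.get(edition, 0) + int(fields[1])
    return totals
```

For example, `parse_sample_name("nightingale-188x-1")` yields author "nightingale", the years 1880 to 1889, and edition 1, while `parse_sample_name("frost-1840-2")` yields the single year 1840 and edition 2. Because decade labels are nominal, the year range should be read as approximate.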
General conventions
The texts in the corpus are generally named after their author.
Different authors with the same surname are distinguished by appending
Arabic numerals to the shared surname (for instance, "turner1",
"turner2", "turner3"). Works by authors whose names are not known are
either named for some salient feature of the text, such as the
profession of the author ("midshipman", "officer") or the title of the
work ("erv", "grafting", "statutes"), or they are named "anon1",
"anon2", etc.
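Because disambiguating numerals are simply appended to the shared surname, the name and the numeral can be separated mechanically. A minimal Python sketch, assuming (hypothetically) that author fields consist of lowercase letters optionally followed by digits, as in every example given here; `split_author` is an illustrative name, not part of any corpus tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical assumption: author fields are lowercase letters optionally
# followed by a disambiguating Arabic numeral ("turner2", "anon1", "erv").
_AUTHOR = re.compile(r"([a-z]+)(\d*)")


def split_author(author: str) -> Tuple[str, Optional[int]]:
    """Return (base name, numeral), with None when no numeral is appended."""
    m = _AUTHOR.fullmatch(author)
    if m is None:
        raise ValueError(f"unexpected author field {author!r}")
    base, digits = m.groups()
    return base, (int(digits) if digits else None)
```

For instance, `split_author("turner3")` returns `("turner", 3)` and `split_author("midshipman")` returns `("midshipman", None)`. Note that the base of "anon1" is a placeholder rather than a surname, so such samples should not be grouped as the work of a single author.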
Special cases
The sample of Queen Victoria's private letters from the first edition
has been renamed from "victoria-186x" to "victoria-186x-private-letters"
in order to explicitly distinguish it from the contemporaneous sample of
official letters added in the second edition.
Name vs. title
Following the conventions of the Helsinki Corpus, authors are identified
by name rather than by title. Sovereigns of England are identified by
their given name. For instance, George III and Victoria are identified
as "george" and "victoria". Members of the nobility or gentry are
identified by their surname. For instance, John William Strutt, third
Baron Rayleigh, and Arthur Wellesley, first Duke of Wellington, appear
in the corpus as "strutt" and "wellesley", even though they are better
known as Lord Rayleigh and the Duke of Wellington.
Women's names
The confusing issues regarding women's names that came up in the PPCEME
do not arise in the present corpus.