Like the PPCEME, the PPCMBE spans roughly 210 years (1700-1914) and can thus be divided into three time periods of roughly 70 years each, analogous to the e1, e2, and e3 time periods of the PPCEME. Table 1 contains a wordcount summary by time period. All wordcounts exclude punctuation and extralinguistic material such as page numbers or token ID numbers.
Table 1: Wordcount summary by time period

| Period | Wordcount |
|---|---|
| 1700-1769 | 298,764 |
| 1770-1839 | 368,804 |
| 1840-1914 | 281,327 |
| Total | 948,895 |
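As a minimal sketch, the period boundaries listed in Table 1 can be used to assign a text's date to a time period. The function name `period_of` and the `PERIODS` constant are hypothetical illustrations, not part of the corpus or its tools; only the year ranges come from Table 1.

```python
from typing import Tuple

# Inclusive year ranges, copied from Table 1.
PERIODS = [(1700, 1769), (1770, 1839), (1840, 1914)]


def period_of(year: int) -> Tuple[int, int]:
    """Return the (start, end) range from Table 1 that contains `year`."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    raise ValueError(f"{year} falls outside the corpus range 1700-1914")
```

For example, `period_of(1776)` returns `(1770, 1839)`. Note that the last period runs to 1914, so it is slightly longer than the other two.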
Table 2: Wordcount summary by text genre

| Text genre | Number of words | Percentage |
|---|---|---|
| Bible | 52,909 | 5.6% |
| Biography, autobiography | 25,880 | 2.7% |
| Biography, other | 30,072 | 3.2% |
| Diary | 69,584 | 7.3% |
| Drama, comedy | 70,338 | 7.4% |
| Educational treatise | 64,839 | 6.8% |
| Fiction | 65,626 | 6.9% |
| Handbook, other | 63,557 | 6.7% |
| History | 61,621 | 6.5% |
| Law | 65,748 | 6.9% |
| Letters, non-private | 33,826 | 3.6% |
| Letters, private | 66,362 | 7.0% |
| Philosophy | 17,108 | 1.8% |
| Proceedings, trials | 58,973 | 6.2% |
| Science, medicine | 23,147 | 2.4% |
| Science, other | 53,449 | 5.6% |
| Sermon | 54,711 | 5.8% |
| Travelogue | 71,145 | 7.5% |
| Total | 948,895 | 100% |
Finally, Table 3 contains a wordcount summary by individual text.
Table 3: Wordcount summary by individual text

| Text | Date | Genre | Wordcount |
|---|---|---|---|
| albin-1736 | 1736 | SCIENCE_OTHER | 8,837 |
| anon-1711 | 1711 | EDUC_TREATISE | 6,092 |
| austen-180x | 1805-1808 | LETTERS_PRIV | 9,650 |
| bain-1878 | 1878 | EDUC_TREATISE | 9,095 |
| barclay-1743 | 1743 | EDUC_TREATISE | 9,422 |
| bardsley-1807 | 1807 | SCIENCE_MEDICINE | 7,694 |
| benson-1908 | 1908 | EDUC_TREATISE | 9,042 |
| benson-190x | 1905-1906 | DIARY | 9,986 |
| boethja-1897 | 1897 | PHILOSOPHY | 7,935 |
| boethri-1785 | 1785 | PHILOSOPHY | 9,173 |
| boswell-1776 | 1776 | DIARY | 9,887 |
| bradley-1905 | 1905 | TRAVELOGUE | 10,292 |
| brightland-1711 | 1711 | EDUC_TREATISE | 1,341 |
| brougham-1861 | 1861 | DRAMA_COMEDY | 10,049 |
| burton-1762 | 1762 | SERMON | 9,110 |
| butler-1726 | 1726 | SERMON | 9,099 |
| carlyle-1835 | 1835 | LETTERS_PRIV | 9,343 |
| carlyle-1837 | 1837 | HISTORY | 8,752 |
| chapman-1774 | 1774 | EDUC_TREATISE | 9,027 |
| cibber-1740 | 1740 | BIOGRAPHY_AUTO | 10,046 |
| collier-1835 | 1835 | DRAMA_COMEDY | 9,459 |
| colman-1805 | 1805 | DRAMA_COMEDY | 10,161 |
| cook-1776 | 1776 | TRAVELOGUE | 10,148 |
| cooke-1712 | 1712 | TRAVELOGUE | 10,027 |
| davys-1716 | 1716 | DRAMA_COMEDY | 10,294 |
| defoe-1719 | 1719 | FICTION | 9,378 |
| dickens-1837 | 1837 | FICTION | 9,437 |
| doddridge-1747 | 1747 | BIOGRAPHY_OTHER | 10,432 |
| drummond-1718 | 1718 | HANDBOOK_OTHER | 7,905 |
| erv-new-1881 | 1881 | BIBLE | 10,964 |
| erv-old-1885 | 1885 | BIBLE | 10,292 |
| faraday-1859 | 1859 | SCIENCE_OTHER | 8,821 |
| fayrer-1900 | 1900 | BIOGRAPHY_AUTO | 7,754 |
| fielding-1749 | 1749 | FICTION | 9,385 |
| fleming-1886 | 1886 | HANDBOOK_OTHER | 9,038 |
| froude-1830 | 1830 | SERMON | 9,254 |
| george-1763 | 1763 | LETTERS_NON-PRIV | 4,941 |
| gibbon-1776 | 1776 | HISTORY | 8,804 |
| gladstone-1873 | 1873 | LETTERS_NON-PRIV | 11,240 |
| godwin-1805 | 1805 | FICTION | 9,343 |
| goldsmith-1773 | 1773 | DRAMA_COMEDY | 10,385 |
| grafting-1780 | 1780 | HANDBOOK_OTHER | 9,130 |
| haydon-1808 | 1808 | DIARY | 10,015 |
| herschel-1797 | 1797 | SCIENCE_OTHER | 9,156 |
| hind-1707 | 1707 | HISTORY | 8,791 |
| holmes-letters-1749 | 1749 | LETTERS_NON-PRIV | 6,535 |
| holmes-trial-1749 | 1749 | PROCEEDINGS_TRIAL | 20,707 |
| johnson-1775 | 1775 | LETTERS_PRIV | 9,525 |
| kimber-1742 | 1742 | HISTORY | 8,829 |
| lancaster-1806 | 1806 | EDUC_TREATISE | 9,214 |
| lind-1753 | 1753 | SCIENCE_MEDICINE | 7,734 |
| long-1866 | 1866 | HISTORY | 8,851 |
| lyell-1830 | 1830 | SCIENCE_OTHER | 8,934 |
| maxwell-1747 | 1747 | HANDBOOK_OTHER | 10,271 |
| meredith-1895 | 1895 | FICTION | 9,322 |
| montagu-1718 | 1718 | LETTERS_PRIV | 9,344 |
| montefiore-1836 | 1836 | TRAVELOGUE | 10,195 |
| newcome-new-1796 | 1796 | BIBLE | 11,033 |
| nightingale-188x | 1888-1889 | LETTERS_PRIV | 3,302 |
| nightingale-189x | 1890 | LETTERS_PRIV | 6,201 |
| officer-1744 | 1744 | TRAVELOGUE | 10,032 |
| okeeffe-1826 | 1826 | BIOGRAPHY_AUTO | 8,080 |
| oman-1895 | 1895 | HISTORY | 8,851 |
| poore-1876 | 1876 | SCIENCE_MEDICINE | 7,719 |
| priestley-1769 | 1769 | SCIENCE_OTHER | 8,911 |
| purver-new-1764 | 1764 | BIBLE | 11,099 |
| purver-old-1764 | 1764 | BIBLE | 9,521 |
| pusey-186x | 1865-1866 | SERMON | 9,022 |
| reade-1863 | 1863 | TRAVELOGUE | 10,369 |
| reeve-1777 | 1777 | FICTION | 9,432 |
| ruskin-1835 | 1835 | DIARY | 9,882 |
| ryder-1716 | 1716 | DIARY | 9,916 |
| skeavington-184x | 184x | HANDBOOK_OTHER | 9,132 |
| southey-1813 | 1813 | BIOGRAPHY_OTHER | 9,829 |
| statutes-171x | 1715-1716 | LAW | 9,315 |
| statutes-1745 | 1745 | LAW | 9,320 |
| statutes-1775 | 1775 | LAW | 9,436 |
| statutes-1805 | 1805 | LAW | 9,440 |
| statutes-1835 | 1835 | LAW | 9,370 |
| statutes-1865 | 1865 | LAW | 9,456 |
| statutes-1895 | 1895 | LAW | 9,411 |
| stevens-1745 | 1745 | DRAMA_COMEDY | 10,277 |
| strutt-1890 | 1890 | SCIENCE_OTHER | 8,790 |
| talbot-1901 | 1901 | SERMON | 9,138 |
| thring-187x | 1870-1872 | DIARY | 9,997 |
| tindall-1814 | 1814 | HANDBOOK_OTHER | 9,044 |
| townley-1746 | 1746 | PROCEEDINGS_TRIAL | 9,995 |
| trollope-1882 | 1882 | BIOGRAPHY_OTHER | 9,811 |
| turner1-1799 | 1799 | HISTORY | 8,743 |
| turner2-1800 | 1800 | TRAVELOGUE | 10,082 |
| victoria-186x | 1863-1865 | LETTERS_PRIV | 9,368 |
| walpole-174x | 1740-1747 | LETTERS_PRIV | 9,629 |
| watson-1817 | 1817 | PROCEEDINGS_TRIAL | 28,271 |
| weathers-1913 | 1913 | HANDBOOK_OTHER | 9,037 |
| webster-1718 | 1718 | EDUC_TREATISE | 2,328 |
| wellesley-1815 | 1815 | LETTERS_NON-PRIV | 11,110 |
| wesley-174x | 1744-1745 | DIARY | 9,901 |
| whewell-1837 | 1837 | EDUC_TREATISE | 9,278 |
| wilde-1895 | 1895 | DRAMA_COMEDY | 9,713 |
| wollaston-1793 | 1793 | SERMON | 9,088 |
| yonge-1865 | 1865 | FICTION | 9,329 |
The texts in the corpus are generally named after their author. Multiple authors with the same surname are distinguished by appending Arabic numerals to the name (for instance, "turner1" vs. "turner2"). In cases of anonymous or multiple authors, the name for the text is based either on the title of the work, as in the case of the English Revised Version ("erv"), the statutes ("statutes"), and a manual on grafting ("grafting"), or on the profession of the author ("officer"). In one case, we call the text "anon". Should the need arise for extended versions of the corpus, distinct anonymous authors would be identified as "anon2", "anon3", and so on.

The filename for a text also includes the year of composition or publication. The filenames of texts that span several years within a decade contain "x" in place of the final digit. Texts from separate decades are given their own files. For instance, nightingale-188x and nightingale-189x contain Florence Nightingale's letters from the 1880s and 1890s, respectively.
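The naming convention can be illustrated with a short sketch that splits a text name into its name and date components. The names `TextId` and `parse_text_id` are hypothetical and not part of the corpus distribution. The sketch assumes that every text name ends in a hyphen followed by a four-character date, as the names in Table 3 do. For an "x" date it returns only the bounds of the decade, since the exact years covered are recorded in the Date column of Table 3, not in the filename.

```python
import re
from typing import NamedTuple


class TextId(NamedTuple):
    name: str        # author- or title-based label, e.g. "turner1" or "holmes-letters"
    date: str        # date component as written, e.g. "1776" or "188x"
    first_year: int  # earliest year compatible with the date component
    last_year: int   # latest year compatible with the date component


# Name, hyphen, then three digits followed by a fourth digit or "x".
_TEXT_ID = re.compile(r"^(?P<name>.+)-(?P<date>\d{3}[\dx])$")


def parse_text_id(text_id: str) -> TextId:
    """Split a text name such as "nightingale-188x" into name and date."""
    match = _TEXT_ID.match(text_id)
    if match is None:
        raise ValueError(f"unrecognized text name: {text_id!r}")
    date = match.group("date")
    if date.endswith("x"):
        # "x" marks a text spanning several years within one decade.
        first_year = int(date[:3] + "0")
        last_year = int(date[:3] + "9")
    else:
        first_year = last_year = int(date)
    return TextId(match.group("name"), date, first_year, last_year)
```

Because the pattern anchors the date at the end of the string, names that themselves contain hyphens, such as "holmes-letters-1749" or "erv-new-1881", keep the full label before the final hyphen as the name.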
Following the conventions of the Helsinki Corpus, authors are identified by name rather than by title. Sovereigns of England are identified by their given name. For instance, George III and Victoria are identified as "george" and "victoria". Members of the nobility or gentry are identified by their surname. For instance, John William Strutt, third Baron Rayleigh, and Arthur Wellesley, first duke of Wellington, appear in the corpus as "strutt" and "wellesley", even though they are better known as Lord Rayleigh and the Duke of Wellington.
The confusing issues regarding women's names that arose in the PPCEME do not arise in the present corpus.