Germanic Lexicon Project
The following is the project's internal standard for filenames. If you want to submit a scanned text to this collection, it would be very helpful if your filenames followed these guidelines.
Anatomy of a filename
Suppose that page xii of a book is scanned into PNG format. This file would be named a0012.png. Here is the system:
Letter a is for Roman-numeraled introduction pages (i, ii, iii, iv...).
Letter b is for the main body of the book (1, 2, 3, 4...).
Letters c and d are for the introduction and main body of volume II, if there is one.
The four digits (0012) are the page number from the book.
Roman numerals are converted to Arabic numerals (so xii would be written 12).
The page number is padded with initial zeroes to make it exactly four digits long.
The extension (.png) gives the file format. The project uses .png, .tiff, .html, .pdf, and .txt formats for individual pages.
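As an illustration only, the scheme above can be sketched as a small Python helper. The names `page_filename`, `SECTION_LETTERS`, and `PAGE_FORMATS` are hypothetical and not part of any project tooling; the sketch assumes the page number has already been converted to an Arabic integer.

```python
# Hypothetical helper (not project tooling): builds a page filename
# following the scheme described above.

SECTION_LETTERS = {"a", "b", "c", "d"}  # a/b = vol. I intro/body, c/d = vol. II intro/body
PAGE_FORMATS = {"png", "tiff", "html", "pdf", "txt"}


def page_filename(section: str, page: int, ext: str) -> str:
    """Return e.g. 'a0012.png' for section 'a', page 12, extension 'png'."""
    if section not in SECTION_LETTERS:
        raise ValueError(f"unknown section letter: {section!r}")
    if not 1 <= page <= 9999:
        raise ValueError(f"page number does not fit in four digits: {page}")
    if ext not in PAGE_FORMATS:
        raise ValueError(f"unsupported format: {ext!r}")
    # :04d pads the page number with initial zeroes to exactly four digits.
    return f"{section}{page:04d}.{ext}"


print(page_filename("a", 12, "png"))    # a0012.png
print(page_filename("b", 1033, "pdf"))  # b1033.pdf
```

Raising an error for an unknown letter, extension, or an oversized page number keeps a typo from silently producing a name that sorts out of place.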
Why pad the page number with zeroes?
[Figure: two file listings compared. Left: "What the computer usually gives you instead." Right: "Adding initial zeroes forces the correct order."]
The rationale for this whole numbering scheme is that the computer should list the pages in exactly the order in which they appear in the original paper book. This makes the online files MUCH easier to manage and use.
Many texts could get by with three digits (000-999), but a few require four. It is convenient for automated processing if the number of digits is uniform across texts, so we go ahead and use four digits for all texts.
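A quick way to see the problem is to sort some filenames as plain strings, character by character, which is what a simple directory listing typically does (some file managers use a "natural" sort instead, but many tools do not). The page names below are made up for illustration.

```python
# Illustrative page names (hypothetical, not taken from a real scan).
unpadded = ["b1.png", "b2.png", "b10.png", "b100.png"]
padded = ["b0001.png", "b0002.png", "b0010.png", "b0100.png"]

# sorted() compares strings character by character.
print(sorted(unpadded))  # ['b1.png', 'b10.png', 'b100.png', 'b2.png']
print(sorted(padded))    # ['b0001.png', 'b0002.png', 'b0010.png', 'b0100.png']
```

Without padding, "b100.png" lands before "b2.png" because '1' sorts before '2'; with every number four digits long, string order and page order agree.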
Roman numerals are converted to Arabic numerals, once again to force the correct order: putting Roman numerals into alphabetical order would give the wrong order (ix, for example, would sort before v).
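The conversion can be sketched as follows. `roman_to_int` is a hypothetical helper that assumes a well-formed numeral (it does not reject malformed input such as "iiii"); it applies the usual subtractive rule, where a smaller value written before a larger one is subtracted.

```python
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xii' (any case) to an integer (12)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: the 'i' in 'iv' or 'ix' is subtracted.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


romans = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"]

# Alphabetical order puts 'ix' before 'v': the wrong page order.
print(sorted(romans))
# ['i', 'ii', 'iii', 'iv', 'ix', 'v', 'vi', 'vii', 'viii', 'x']

# After conversion and padding, alphabetical order matches page order.
filenames = [f"a{roman_to_int(r):04d}.png" for r in romans]
print(sorted(filenames) == filenames)  # True
```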
Indicating which text a page belongs to
Some files begin with an extra two-letter code, followed by an underscore, showing which text they belong to. For example, bt_b1033.pdf is page 1033 of Bosworth/Toller, first volume. At the time of writing, the only codes in use are bt (Bosworth/Toller), cv (Cleasby/Vigfusson), and tp (Fick/Falk/Torp).
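For automated processing, a filename can be split back into its parts. The sketch below is hypothetical: the regular expression `FILENAME_RE` and the helper `parse_filename` are assumptions inferred from the rules and the bt_b1033.pdf example above, and an unrecognized two-letter code is passed through unchanged rather than rejected.

```python
import re

# Optional "xx_" text code, section letter a-d, four-digit page, extension.
FILENAME_RE = re.compile(
    r"^(?:(?P<text>[a-z]{2})_)?(?P<section>[a-d])(?P<page>\d{4})"
    r"\.(?P<ext>png|tiff|html|pdf|txt)$"
)

TEXT_CODES = {"bt": "Bosworth/Toller", "cv": "Cleasby/Vigfusson", "tp": "Fick/Falk/Torp"}


def parse_filename(name: str) -> dict:
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a project filename: {name!r}")
    code = match["text"]  # None when the file has no text prefix
    section = match["section"]
    return {
        "text": TEXT_CODES.get(code, code) if code else None,
        "volume": 1 if section in "ab" else 2,
        "part": "introduction" if section in "ac" else "main body",
        "page": int(match["page"]),
        "format": match["ext"],
    }


print(parse_filename("bt_b1033.pdf"))
# {'text': 'Bosworth/Toller', 'volume': 1, 'part': 'main body', 'page': 1033, 'format': 'pdf'}
print(parse_filename("a0012.png"))
# {'text': None, 'volume': 1, 'part': 'introduction', 'page': 12, 'format': 'png'}
```

Note that the page number returned for an introduction page is the Arabic form (12 for page xii), since the Roman numeral itself is not stored in the filename.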