Germanic Lexicon Project
Filenames

The following is the project's internal standard for filenames. If you want to submit a scanned text to this collection, it would be very helpful if your filenames followed these guidelines.

Anatomy of a filename

Suppose that page xii of a book is scanned into PNG format. This file would be named a0012.png. Here is the system:

a: The initial letter identifies the part of the book. Letter a is for the introduction pages numbered with Roman numerals (i, ii, iii, iv...). Letter b is for the main body of the book (1, 2, 3, 4...). Letters c and d are the introduction and main body of volume II, if there is one.
0012: This is the page number from the book. Roman numerals are converted to Arabic numerals (so xii would be written 12). The page number is padded with initial zeroes to make it exactly four digits long.
.png: The extension gives the file format. The project uses .png, .tiff, .html, .pdf, and .txt formats for individual pages.
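The naming rules above can be sketched in Python as follows. This is a minimal illustration, not a tool the project provides: the names SECTION_LETTERS and page_filename are hypothetical, and the sketch assumes the page number has already been converted from Roman to Arabic numerals.

```python
# Hypothetical mapping of (volume, part) to the section letters described above.
SECTION_LETTERS = {
    (1, "intro"): "a",
    (1, "body"): "b",
    (2, "intro"): "c",
    (2, "body"): "d",
}

# The page formats the project uses.
ALLOWED_EXTENSIONS = {"png", "tiff", "html", "pdf", "txt"}


def page_filename(volume: int, part: str, page: int, ext: str) -> str:
    """Build a filename such as 'a0012.png' (hypothetical helper).

    `page` must already be an Arabic number; convert Roman numerals first.
    """
    letter = SECTION_LETTERS[(volume, part)]
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext}")
    if not 0 <= page <= 9999:
        raise ValueError("page number must fit in four digits")
    # :04d pads the page number with initial zeroes to exactly four digits.
    return f"{letter}{page:04d}.{ext}"


print(page_filename(1, "intro", 12, "png"))  # a0012.png
```

Page xii of the introduction to volume I thus becomes a0012.png, matching the example at the top of this page.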

Why pad the page number with zeroes?

The order you expected (numeric order):
1 2 3 9 10 11 18 20 21 99 100 101 1000
What the computer usually gives you instead (alphabetical order):
1 10 100 1000 101 11 18 2 20 21 3 9 99
Adding initial zeroes forces the correct order:
0001 0002 0003 0009 0010 0011 0018 0020 0021 0099 0100 0101 1000
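The difference is easy to reproduce in Python, whose default string sort compares character by character just as a typical file listing does. This short sketch sorts the same page numbers with and without zero-padding.

```python
pages = [1, 2, 3, 9, 10, 11, 18, 20, 21, 99, 100, 101, 1000]

# Sorting the numbers as plain strings gives alphabetical order.
unpadded = sorted(str(p) for p in pages)

# Padding to four digits first makes alphabetical order equal numeric order.
padded = sorted(f"{p:04d}" for p in pages)

print(unpadded)
# ['1', '10', '100', '1000', '101', '11', '18', '2', '20', '21', '3', '9', '99']
print(padded)
# ['0001', '0002', '0003', '0009', '0010', '0011', '0018', '0020', '0021', '0099', '0100', '0101', '1000']
```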

The rationale for this whole numbering scheme is that the computer should list the pages in exactly the order in which they appear in the original paper book. This makes it MUCH easier to manage and use the online files.

Many texts could get by with three digits (000-999), but a few require four. It is convenient for automated processing if the number of digits is uniform across texts, so we go ahead and use four digits for all texts.

Roman numerals are converted to Arabic numerals, once again to force the correct order. Putting Roman numerals into alphabetical order would give the wrong order: for example, ix would sort before v.
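A minimal Python sketch of the conversion follows. The function name roman_to_int is hypothetical, and the sketch assumes well-formed numerals; it does not reject malformed input such as "iiii" or "vx".

```python
# Values of the individual Roman numeral letters.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'xii' to an int (hypothetical helper)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one (iv, ix, xl...).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


intro_pages = ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x", "xi", "xii"]

# Alphabetical order of the numerals themselves is wrong: ix lands before v.
print(sorted(intro_pages))

# Converting to zero-padded Arabic numbers restores the book's order.
print(sorted(f"{roman_to_int(p):04d}" for p in intro_pages))
```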

Indicating which text a page belongs to

Some files begin with a two-letter code and an underscore to show which text they belong to. For example, bt_b1033.pdf is page 1033 of Bosworth/Toller, first volume. At the time of writing, the only codes in use are bt (Bosworth/Toller), cv (Cleasby/Vigfusson), and tp (Fick/Falk/Torp).
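A filename following these conventions can be taken apart with a regular expression. This Python sketch is illustrative only: FILENAME_RE and parse_filename are hypothetical names, and the pattern assumes lowercase names, an underscore after the optional text code (as in bt_b1033.pdf), section letters a to d, and only the five extensions listed earlier.

```python
import re

# Optional two-letter text code plus underscore, section letter a-d,
# four-digit page number, and one of the project's page formats.
FILENAME_RE = re.compile(
    r"^(?:(?P<code>[a-z]{2})_)?(?P<section>[a-d])(?P<page>\d{4})"
    r"\.(?P<ext>png|tiff|html|pdf|txt)$"
)

# Text codes in use at the time of writing.
TEXT_CODES = {"bt": "Bosworth/Toller", "cv": "Cleasby/Vigfusson", "tp": "Fick/Falk/Torp"}


def parse_filename(name: str) -> dict:
    """Split a project filename into its parts (hypothetical helper)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a project filename: {name}")
    code = match.group("code")  # None when the file has no text prefix
    return {
        "code": code,
        "text": TEXT_CODES.get(code) if code else None,
        "section": match.group("section"),
        "page": int(match.group("page")),
        "ext": match.group("ext"),
    }


print(parse_filename("bt_b1033.pdf"))
print(parse_filename("a0012.png"))
```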