Germanic Lexicon Project
Character encoding

Following are the project-internal character encoding standards for the text documents in the Germanic Lexicon Project.

The encoding scheme for the base documents has been informed by two considerations:

With these considerations in mind, the following scheme has been adopted:

Rules for novel entity names

If the character is atomic (i.e., cannot be decomposed into a base character plus diacritics), we simply pick a suitable name that does not conflict with any existing standard character name, such as &hw; for the Gothic character ƕ. Hyphens may occur in an atomic entity name, as in &s-tall;, &r-runic; and &dash-uncertain;.

However, if the character is not atomic, the entity name consists of the base character (or, for non-ASCII base characters, the entity name of the base character, such as &aelig; for æ) followed by a list of diacritic names, separated by hyphens.
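As a rough illustration, the two naming rules can be turned into a parser that splits an entity name into its base and its diacritic list. This is a minimal Python sketch, not project code: `split_entity`, `DIACRITICS` and `ATOMIC_WITH_HYPHENS` are hypothetical names, and the atomic set holds only the hyphenated examples mentioned above, so a real checker would have to load the full database. Atomic names are checked first because a name such as &s-tall; would otherwise be misread as s plus a diacritic called "tall". The diacritic set mirrors the table of valid diacritic names below, so names used in the database but absent from that table (such as hachek) would be rejected by this sketch.

```python
# Diacritic names from the table of valid diacritic names.
DIACRITICS = {
    "acute", "bar", "cedil", "circ", "dasia", "diar", "hook", "long",
    "ocomma", "odot", "ohook", "oring", "oxia", "peri", "psili", "short",
    "slash", "tilde", "udot", "uml", "uring", "varia", "ypo",
}

# Hypothetical, incomplete set of atomic names that contain hyphens (only the
# examples given in the text); a real checker would load the full database.
ATOMIC_WITH_HYPHENS = {"s-tall", "r-runic", "dash-uncertain"}


def split_entity(name: str) -> tuple[str, list[str]]:
    """Split an entity name (without & and ;) into base and diacritic names."""
    if name in ATOMIC_WITH_HYPHENS or "-" not in name:
        # Atomic entity or bare base character: nothing to peel off.
        return name, []
    base, *marks = name.split("-")
    unknown = [m for m in marks if m not in DIACRITICS]
    if unknown:
        raise ValueError(f"unknown diacritic name(s) in &{name};: {unknown}")
    return base, marks


print(split_entity("e-circ-acute"))  # ('e', ['circ', 'acute'])
print(split_entity("aelig-acute"))   # ('aelig', ['acute'])
print(split_entity("s-tall"))        # ('s-tall', [])
```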

The order for the list of diacritics is as follows:

Thus, an imaginary character, o with slash, macron, tilde, acute and hook, would have this encoding: &o-slash-long-tilde-acute-hook;.

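The rules for the ordering of diacritics are essentially the same as the Unicode rules, so the conversion of these entities to Unicode is straightforward: the base character is followed by the combining diacritics in the order in which they appear in the entity name.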
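The conversion can be sketched in Python as follows. The names `COMBINING`, `BASES` and `entity_to_unicode` are illustrative assumptions, and both dictionaries hold only a subset of the project's names. The optional step uses the standard `unicodedata.normalize("NFC", ...)` to fold the sequence into a precomposed character where Unicode has one; note that NFC also applies Unicode's canonical reordering of combining marks.

```python
import unicodedata

# Combining code points for some diacritic names (subset of the table of
# valid diacritic names).
COMBINING = {
    "acute": "\u0301", "circ": "\u0302", "tilde": "\u0303", "long": "\u0304",
    "short": "\u0306", "odot": "\u0307", "uml": "\u0308", "oring": "\u030A",
    "udot": "\u0323", "hook": "\u0328", "slash": "\u0338",
}

# Entity names for non-ASCII base characters (illustrative subset).
BASES = {"aelig": "\u00E6", "oelig": "\u0153", "thorn": "\u00FE"}


def entity_to_unicode(name: str, compose: bool = False) -> str:
    """Convert a composite entity name (without & and ;) to Unicode text."""
    base, *marks = name.split("-")
    # One-letter bases are ASCII characters; longer ones are entity names.
    base_char = base if len(base) == 1 else BASES[base]
    text = base_char + "".join(COMBINING[m] for m in marks)
    # NFC folds base + marks into a precomposed character where one exists.
    return unicodedata.normalize("NFC", text) if compose else text


print([hex(ord(c)) for c in entity_to_unicode("a-acute-hook")])
# ['0x61', '0x301', '0x328']  (the database entry is U0061+U0301+U0328)
print(hex(ord(entity_to_unicode("a-circ-acute", compose=True))))  # 0x1ea5
```

The composed result for &a-circ-acute; agrees with the U1EA5 listed for that entity in the database below.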

The list of valid diacritic names is as follows:

Name Unicode combining diacritic code
acute U0301
bar U0336
cedil U0327
circ U0302
dasia U0314 (Greek only)
diar U0308 (Greek only)
hook U0328
long U0304
ocomma U0315
odot U0307
ohook U0313
oring U030A
oxia U0301 (Greek only)
peri U0342 (Greek only)
psili U0313 (Greek only)
short U0306
slash U0338
tilde U0303
udot U0323
uml U0308
uring U0325
varia U0300 (Greek only)
ypo U0345 (Greek only)
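One way to keep this table honest is to load it as data and check that every code really is a combining mark. The sketch below is illustrative only (`DIACRITIC_CODES` and `non_combining` are hypothetical names); it relies on the standard `unicodedata.combining`, which returns 0 for characters with no canonical combining class. It lists ypo as U+0345, the combining ypogegrammeni; the spacing form U+037A has combining class 0 and would be flagged.

```python
import unicodedata

# Diacritic names and their combining code points, transcribed from the table.
DIACRITIC_CODES = {
    "acute": 0x0301, "bar": 0x0336, "cedil": 0x0327, "circ": 0x0302,
    "dasia": 0x0314, "diar": 0x0308, "hook": 0x0328, "long": 0x0304,
    "ocomma": 0x0315, "odot": 0x0307, "ohook": 0x0313, "oring": 0x030A,
    "oxia": 0x0301, "peri": 0x0342, "psili": 0x0313, "short": 0x0306,
    "slash": 0x0338, "tilde": 0x0303, "udot": 0x0323, "uml": 0x0308,
    "uring": 0x0325, "varia": 0x0300, "ypo": 0x0345,
}


def non_combining(table: dict[str, int]) -> list[str]:
    """Return the names whose code point has canonical combining class 0."""
    return [name for name, cp in table.items()
            if unicodedata.combining(chr(cp)) == 0]


print(non_combining(DIACRITIC_CODES))   # []
print(non_combining({"ypo": 0x037A}))   # ['ypo']
```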

Most HTML/XML standard entities do not separate diacritics with hyphens (&aacute;, not &a-acute;). By contrast, the novel entities coined according to the scheme here do use hyphen separators (e.g. &a-circ-acute;, not &acircacute;). The reason for this variance is legibility, particularly in the case of characters with multiple diacritics.

Note that these standards are followed only in the base documents. In derived HTML documents, all pretense of consistency and elegance is dropped: the entities are simply mapped to whatever representations will allow the text to display reasonably correctly on a broad range of web browsers. Many of the less widely supported characters are displayed in the derived HTML files by means of embedded image files.
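A derived-HTML mapping of this kind might look like the following sketch. Everything in it is hypothetical: the `to_html` function, the sample `DATABASE` subset and the `images/<name>.gif` path scheme are invented for illustration, since the project's actual image file names are not given here. Characters with a Unicode conversion become numeric character references; those without one (the rows marked "-" in the database) fall back to an embedded image.

```python
# Hypothetical subset of the entity database: name -> list of Unicode code
# points, or None where the database gives no Unicode conversion ("-").
DATABASE = {
    "aelig": [0x00E6],
    "hw": [0x0195],
    "a-acute-hook": [0x0061, 0x0301, 0x0328],
    "dash-uncertain": None,
    "thorn-bar": None,
}


def to_html(name: str) -> str:
    """Map one base-document entity to an HTML fragment for display."""
    code_points = DATABASE[name]
    if code_points is None:
        # No Unicode equivalent: fall back to an embedded image.
        # The path scheme is an assumption, not the project's real layout.
        return f'<img src="images/{name}.gif" alt="&amp;{name};">'
    # Numeric references do not depend on browser support for named entities.
    return "".join(f"&#x{cp:04X};" for cp in code_points)


print(to_html("a-acute-hook"))  # &#x0061;&#x0301;&#x0328;
print(to_html("thorn-bar"))     # <img src="images/thorn-bar.gif" alt="&amp;thorn-bar;">
```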

Database of accepted entities

Following is the database of characters outside the ASCII range (whether encoded as ISO-8859-1 characters or as entities) that we recognize as valid within the base documents. Other entities are to be kicked out as errors, even if the entity is a standard one. For example, if the entity &yen; for the ¥ character occurs in one of our base documents, it is almost certainly an error, because the yen sign is very unlikely to be found in a text on the early Germanic languages. Additional characters will of course be added to the following list whenever a legitimate need arises to encode a previously unencountered character.
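A checker along these lines could scan a base document and report anything outside the accepted set. This is a minimal sketch with hypothetical names (`find_errors`, `ACCEPTED`, `ACCEPTED_CHARS`, `ENTITY_RE`); both sets are small samples taken from the database below, whereas a real checker would load the whole list.

```python
import re

# Small sample of accepted entity names (a real checker loads the full list).
ACCEPTED = {"quot", "amp", "lt", "gt", "aelig", "thorn", "eth", "hw",
            "e-circ-acute"}
# Sample of literal ISO-8859-1 characters accepted as-is.
ACCEPTED_CHARS = set("ÆæÐðØø")

# Entity references: & + letter + letters/digits/hyphens + ;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")


def find_errors(text: str) -> list[str]:
    """List unaccepted entities, then unaccepted literal non-ASCII characters."""
    errors = [f"&{m.group(1)};" for m in ENTITY_RE.finditer(text)
              if m.group(1) not in ACCEPTED]
    errors += [ch for ch in text if ord(ch) > 0x7F and ch not in ACCEPTED_CHARS]
    return errors


print(find_errors("&thorn;e &yen; æ ¥"))  # ['&yen;', '¥']
```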

Character Entity name Unicode conversion Is entity name standard?
" &quot; U0022 Y
& &amp; U0026 Y
< &lt; U003C Y
> &gt; U003E Y
~ &tilde; U007E N
§ &sect; U00A7 Y
« &laquo; U00AB Y
¶ &para; U00B6 Y
» &raquo; U00BB Y
À &Agrave; U00C0 Y
Á &Aacute; U00C1 Y
Â &Acirc; U00C2 Y
Ã &Atilde; U00C3 Y
Ä &Auml; U00C4 Y
Å &Aring; U00C5 Y
Æ &AElig; U00C6 Y
Ç &Ccedil; U00C7 Y
È &Egrave; U00C8 Y
É &Eacute; U00C9 Y
Ê &Ecirc; U00CA Y
Ë &Euml; U00CB Y
Ì &Igrave; U00CC Y
Í &Iacute; U00CD Y
Î &Icirc; U00CE Y
Ï &Iuml; U00CF Y
Ð &ETH; U00D0 Y
Ñ &Ntilde; U00D1 Y
Ò &Ograve; U00D2 Y
Ó &Oacute; U00D3 Y
Ô &Ocirc; U00D4 Y
Õ &Otilde; U00D5 Y
Ö &Ouml; U00D6 Y
Ø &Oslash; U00D8 Y
Ù &Ugrave; U00D9 Y
Ú &Uacute; U00DA Y
Û &Ucirc; U00DB Y
Ü &Uuml; U00DC Y
Ý &Yacute; U00DD Y
ß &szlig; U00DF Y
à &agrave; U00E0 Y
á &aacute; U00E1 Y
â &acirc; U00E2 Y
ã &atilde; U00E3 Y
ä &auml; U00E4 Y
å &aring; U00E5 Y
æ &aelig; U00E6 Y
ç &ccedil; U00E7 Y
è &egrave; U00E8 Y
é &eacute; U00E9 Y
ê &ecirc; U00EA Y
ë &euml; U00EB Y
ì &igrave; U00EC Y
í &iacute; U00ED Y
î &icirc; U00EE Y
ï &iuml; U00EF Y
ð &eth; U00F0 Y
ñ &ntilde; U00F1 Y
ò &ograve; U00F2 Y
ó &oacute; U00F3 Y
ô &ocirc; U00F4 Y
õ &otilde; U00F5 Y
ö &ouml; U00F6 Y
ø &oslash; U00F8 Y
ù &ugrave; U00F9 Y
ú &uacute; U00FA Y
û &ucirc; U00FB Y
ü &uuml; U00FC Y
ý &yacute; U00FD Y
þ &thorn; U00FE Y
ÿ &yuml; U00FF Y
&dash-uncertain; - N
&e-sub; - N
&u-super; - N
&aolig; - N
&aolig-acute; - N
&thorn-bar; - N
&a-acute-hook; U0061+U0301+U0328 N
&a-long-acute; U0061+U0304+U0301 N
&a-long-short; U0061+U0304+U0306 N
&a-odot-acute; U0061+U0307+U0301 N
&a-uml-circ; U0061+U0308+U0302 N
&a-ohook; U0061+U0313 N
&c-hachek-udot; U0063+U030C+U0323 N
&c-tilde; U0063+U0303 N
&c-udot; U0063+U0323 N
&e-acute-hook; U0065+U0301+U0328 N
&e-circ-acute; U0065+U0302+U0301 N
&e-tilde-hook; U0065+U0303+U0328 N
&e-long-short; U0065+U0304+U0306 N
&e-long-hook; U0065+U0304+U0328 N
&e-odot-acute; U0065+U0307+U0301 N
&e-odot-tilde; U0065+U0307+U0303 N
&e-uml-acute; U0065+U0308+U0301 N
&e-uml-tilde; U0065+U0308+U0303 N
&e-ohook; U0065+U0313 N
&g-ocomma; U0067+U0315 N
&i-circ-acute; U0069+U0302+U0301 N
&i-tilde-hook; U0069+U0303+U0328 N
&i-long-acute; U0069+U0304+U0301 N
&i-long-short; U0069+U0304+U0306 N
&i-oring; U0069+U030A N
&i-oring-acute; U0069+U030A+U0301 N
&i-oring-tilde; U0069+U030A+U0303 N
&i-nonsyllabic; U0069+U032F N
&k-circ; U006B+U0302 N
&k-ocomma; U006B+U0315 N
&l-tilde; U006C+U0303 N
&l-ocomma; U006C+U0315 N
&l-uring; U006C+U0325 N
&m-tilde; U006D+U0303 N
&m-uring; U006D+U0325 N
&n-ocomma; U006E+U0315 N
&n-uring; U006E+U0325 N
&o-acute-hook; U006F+U0301+U0328 N
&o-circ-acute; U006F+U0302+U0301 N
&o-circ-hook; U006F+U0302+U0328 N
&o-long-acute; U006F+U0304+U0301 N
&o-long-short; U006F+U0304+U0306 N
&o-uml-circ; U006F+U0308+U0302 N
&q-bar; U0071+U0336 N
&r-acute-udot; U0072+U0301+U0323 N
&r-tilde; U0072+U0303 N
&r-long; U0072+U0304 N
&r-uring; U0072+U0325 N
&s-ocomma; U0073+U0315 N
&t-ocomma; U0074+U0315 N
&u-circ-acute; U0075+U0302+U0301 N
&u-long-acute; U0075+U0304+U0301 N
&u-long-short; U0075+U0304+U0306 N
&u-odot; U0075+U0307 N
&u-uml-circ; U0075+U0308+U0302 N
&u-oring-acute; U0075+U030A+U0301 N
&u-oring-tilde; U0075+U030A+U0303 N
&u-nonsyllabic; U0075+U032F N
&v-long; U0076+U0304 N
&y-short; U0079+U0306 N
&z-odot; U007A+U0307 N
&O-slash-long; U00D8+U0304 N
&a-tilde; U00E3 N
&aelig-circ; U00E6+U0302 N
&o-slash-long; U00F8+U0304 N
&A-long; U0100 N
&a-long; U0101 N
&A-short; U0102 N
&a-short; U0103 N
&a-hook; U0105 N
&c-acute; U0107 N
&c-hachek; U010D N
&d-bar; U0111 N
&E-long; U0112 N
&e-long; U0113 N
&e-short; U0115 N
&e-odot; U0117 N
&e-hook; U0119 N
&e-hachek; U011B N
&g-circ; U011D N
&i-tilde; U0129 N
&I-long; U012A N
&i-long; U012B N
&i-short; U012D N
&i-hook; U012F N
&k-cedil; U0137 N
&l-bar; U0142 N
&n-acute; U0144 N
&O-long; U014C N
&o-long; U014D N
&o-short; U014F N
&OElig; U0152 Y
&oelig; U0153 Y
&oelig-acute; U0153+U0301 N
&r-hachek; U0159 N
&s-acute; U015B N
&s-hachek; U0161 N
&u-tilde; U0169 N
&U-long; U016A N
&u-long; U016B N
&u-short; U016D N
&u-oring; U016F N
&w-circ; U0175 N
&y-circ; U0177 N
&z-hachek; U017E N
&s-tall; U017F N
&b-bar; U0180 N
&hw; U0195 N
&wynn; U01BF N
&a-hachek; U01CE N
&u-hachek; U01D4 N
&aelig-long; U01E3 N
&O-hook; U01EA N
&o-hook; U01EB N
&j-hachek; U01F0 N
&g-acute; U01F5 N
&AElig-acute; U01FC N
&aelig-acute; U01FD N
&YOGH; U021C N
&yogh; U021D N
&z-tail; U0225 N
&a-odot; U0227 N
&Y-long; U0232 N
&y-long; U0233 N
&schwa; U0259 N
&r-runic; U0280 N
&Alpha; U0391 Y
&Beta; U0392 Y
&Gamma; U0393 Y
&Delta; U0394 Y
&Epsilon; U0395 Y
&Zeta; U0396 Y
&Eta; U0397 Y
&Theta; U0398 Y
&Iota; U0399 Y
&Kappa; U039A Y
&Lambda; U039B Y
&Mu; U039C Y
&Nu; U039D Y
&Xi; U039E Y
&Omicron; U039F Y
&Pi; U03A0 Y
&Rho; U03A1 Y
&Sigma; U03A3 Y
&Tau; U03A4 Y
&Upsilon; U03A5 Y
&Phi; U03A6 Y
&Chi; U03A7 Y
&Psi; U03A8 Y
&Omega; U03A9 Y
&alpha; U03B1 Y
&beta; U03B2 Y
&gamma; U03B3 Y
&delta; U03B4 Y
&epsilon; U03B5 Y
&epsilon-long; U03B5+U0304 N
&zeta; U03B6 Y
&eta; U03B7 Y
&theta; U03B8 Y
&iota; U03B9 Y
&kappa; U03BA Y
&lambda; U03BB Y
&mu; U03BC Y
&nu; U03BD Y
&xi; U03BE Y
&omicron; U03BF Y
&pi; U03C0 Y
&rho; U03C1 Y
&sigmaf; U03C2 Y
&sigma; U03C3 Y
&tau; U03C4 Y
&upsilon; U03C5 Y
&phi; U03C6 Y
&chi; U03C7 Y
&psi; U03C8 Y
&omega; U03C9 Y
&iota-diar; U03CA N
&left-half-ring; U0559 N
&d-udot; U1E0D N
&h-udot; U1E25 N
&l-udot; U1E37 N
&m-odot; U1E41 N
&m-udot; U1E43 N
&n-odot; U1E45 N
&n-udot; U1E47 N
&r-odot; U1E59 N
&r-udot; U1E5B N
&s-udot; U1E63 N
&t-udot; U1E6D N
&v-udot; U1E7F N
&a-udot; U1EA1 N
&a-circ-acute; U1EA5 N
&e-udot; U1EB9 N
&e-tilde; U1EBD N
&y-tilde; U1EF9 N
&alpha-psili; U1F00 N
&alpha-dasia; U1F01 N
&alpha-psili-oxia; U1F04 N
&alpha-dasia-oxia; U1F05 N
&Alpha-psili; U1F08 N
&Alpha-dasia; U1F09 N
&epsilon-psili; U1F10 N
&epsilon-dasia; U1F11 N
&epsilon-psili-oxia; U1F14 N
&epsilon-dasia-oxia; U1F15 N
&eta-psili; U1F20 N
&eta-psili-oxia; U1F24 N
&eta-dasia-oxia; U1F25 N
&eta-psili-peri; U1F26 N
&eta-dasia-peri; U1F27 N
&iota-psili; U1F30 N
&iota-dasia; U1F31 N
&iota-psili-oxia; U1F34 N
&iota-dasia-oxia; U1F35 N
&iota-psili-peri; U1F36 N
&iota-dasia-peri; U1F37 N
&omicron-psili; U1F40 N
&omicron-psili-peri; U1F40+U0342 N
&omicron-dasia; U1F41 N
&omicron-psili-oxia; U1F44 N
&omicron-dasia-oxia; U1F45 N
&upsilon-psili; U1F50 N
&upsilon-dasia; U1F51 N
&upsilon-psili-oxia; U1F54 N
&upsilon-dasia-oxia; U1F55 N
&upsilon-psili-peri; U1F56 N
&omega-psili; U1F60 N
&omega-psili-ypo; U1F60+U0345 N
&omega-psili-oxia; U1F64 N
&omega-dasia-oxia; U1F65 N
&omega-psili-peri; U1F66 N
&omega-dasia-peri; U1F67 N
&alpha-oxia; U1F71 N
&epsilon-oxia; U1F73 N
&eta-dasia; U1F74 N
&eta-oxia; U1F75 N
&iota-oxia; U1F77 N
&omicron-varia; U1F78 N
&omicron-oxia; U1F79 N
&upsilon-oxia; U1F7B N
&omega-oxia; U1F7D N
&alpha-long; U1FB1 N
&alpha-long-oxia; U1FB1+U0301 N
&alpha-long-psili-oxia; U1FB1+U0313+U0301 N
&alpha-peri; U1FB6 N
&eta-ypo; U1FC3 N
&eta-peri; U1FC6 N
&iota-long; U1FD1 N
&iota-long-oxia; U1FD1+U0301 N
&iota-long-psili; U1FD1+U0313 N
&iota-diar-oxia; U1FD3 N
&iota-peri; U1FD6 N
&iota-psili; U1FD6 N
&upsilon-long; U1FE1 N
&upsilon-long-oxia; U1FE1+U0301 N
&upsilon-long-dasia; U1FE1+U0314 N
&upsilon-diar-oxia; U1FE3 N
&rho-dasia; U1FE5 N
&upsilon-peri; U1FE6 N
&omega-ypo; U1FF3 N
&omega-peri; U1FF6 N
&omega-peri-ypo; U1FF7 N
&ndash; U2013 Y
&mdash; U2014 Y
&dash-acute; U2014+U0301 N
&highquote; U2018 N
&lowquote; U201A N
&bull; U2022 Y
&sup4; U2074 N
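Because each row has a fixed shape, the database itself can be loaded mechanically and checked for internal consistency. The sketch below (hypothetical names `ROWS`, `parse_unicode_field` and `load`) turns the Unicode-conversion column into actual strings and reports entity names that are listed more than once with different conversions. The sample rows are copied from the table above; &iota-psili; does appear in it twice, once as U1F30 and once as U1FD6.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Sample rows copied from the table. The leading character column is present
# only on some rows, so the parser takes the last three fields.
ROWS = """\
ã &atilde; U00E3 Y
&a-tilde; U00E3 N
&a-acute-hook; U0061+U0301+U0328 N
&a-circ-acute; U1EA5 N
&iota-psili; U1F30 N
&iota-psili; U1FD6 N
&thorn-bar; - N"""


def parse_unicode_field(field: str) -> Optional[str]:
    """Turn a field such as 'U0061+U0301' into text; '-' means no mapping."""
    if field == "-":
        return None
    return "".join(chr(int(part[1:], 16)) for part in field.split("+"))


def load(rows: str) -> Dict[str, List[Optional[str]]]:
    """Collect every conversion listed for each entity name."""
    table: Dict[str, List[Optional[str]]] = defaultdict(list)
    for line in rows.splitlines():
        name, code, _standard = line.split()[-3:]
        table[name.strip("&;")].append(parse_unicode_field(code))
    return dict(table)


table = load(ROWS)
conflicts = {n: convs for n, convs in table.items() if len(set(convs)) > 1}
print(sorted(conflicts))  # ['iota-psili']
```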

This page was last updated on 09 Jan 2006.
