How to Re-encode a TrueType Font as a Unicode Font


It often happens that a font is available but does not have the desired encoding. With the increasing use of Unicode, we often want a Unicode encoding of a font originally created with some other encoding. Fonts can be re-encoded using pfaedit, a free outline font editor. The examples here were generated on a GNU/Linux system, but pfaedit can also be used on other POSIX-compliant systems and on Mac OS X.

The first step is to open the font in pfaedit. An easy way to do this is to invoke pfaedit with the name of the font file as its argument. pfaedit will display a map of the font, as in the window in the upper right in the screenshot below. One character is highlighted (has a dark background) because it has been selected by left-clicking on it. The window to its left is a page from the Unicode standard displayed by xpdf.



Right-clicking on the selected character brings up a menu. Select Char Info from this menu.



This brings up the dialog box shown below. Notice the box labelled Unicode value, which currently contains the value U+0076. What we want to do is change this value.



Change the value by placing the cursor to the right of the current value, left-clicking, and backspacing to erase the current value. Then type the new value desired. The result is shown below. The new value is U+10030, which is the Unicode value for the selected Linear B character.



Next, left-click on the button Set from Value. In this case, pfaedit requests confirmation since it was designed to deal with the basic plane and the value entered lies outside it.



When you are finished with this character, close the dialog box by left-clicking on the OK button. In our example, this produces another request for confirmation. If you close the dialog by clicking on the Done button instead, your changes will not be saved.



This process is repeated for each character that you wish to assign to a new codepoint, which will usually be all of them. When you are done, it is time to generate a new font file.
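Since the same steps are repeated for many characters, it can help to write down the intended reassignments before starting. The following Python sketch is only a planning aid and does not drive pfaedit in any way. The names REMAP, u_plus, and check_remap are made up for this illustration, and the single entry in the table is the U+0076 to U+10030 pair from the walkthrough above.

```python
# Hypothetical planning table: current codepoint -> desired codepoint.
# Only this first pair comes from the walkthrough; extend it as needed.
REMAP = {
    0x0076: 0x10030,
}


def u_plus(codepoint):
    """Format a codepoint in the U+XXXX notation used in the dialog box."""
    return f"U+{codepoint:04X}"


def check_remap(remap):
    """Raise ValueError if two characters are sent to the same codepoint."""
    seen = {}
    for old, new in remap.items():
        if new in seen:
            raise ValueError(
                f"{u_plus(old)} and {u_plus(seen[new])} both map to {u_plus(new)}"
            )
        seen[new] = old


check_remap(REMAP)
for old, new in sorted(REMAP.items()):
    print(f"{u_plus(old)} -> {u_plus(new)}")
```

Checking for duplicate targets up front catches the most likely slip when a long list of codepoints is typed in by hand.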

If the characters you are encoding lie within Unicode plane 0 (that is, have codes less than or equal to 0xFFFF), you can generate the font immediately. However, since we have chosen an example in which we are using codepoints beyond the basic plane, we must first tell pfaedit about it. To do this, go to the Element menu and select Font Info. In the dialog box that pops up, left-click on the tab Encoding. You will see something like this:


Following the word Encoding are the words ISO 10646-1 (Unicode, BMP). This means that the encoding is for Unicode plane 0. If you click on the button containing these words, a menu like the following will appear:



As indicated, select ISO 10646-1 (Unicode, Full). Then exit the dialog box by clicking OK.
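Whether this change of encoding is needed depends only on whether any assigned codepoint is greater than 0xFFFF. As a rough illustration, the test can be written in a few lines of Python. The function names plane and needs_full_encoding are invented for this sketch and are not part of pfaedit.

```python
def plane(codepoint):
    """Return the Unicode plane number; plane 0 is the basic plane (BMP)."""
    return codepoint >> 16


def needs_full_encoding(codepoints):
    """True if any codepoint lies beyond plane 0, that is, above 0xFFFF."""
    return any(cp > 0xFFFF for cp in codepoints)


# The original value and the new value from the walkthrough.
for cp in (0x0076, 0x10030):
    print(f"U+{cp:04X} is in plane {plane(cp)}")

print(needs_full_encoding([0x0076, 0x10030]))
```

For the Linear B example the check reports that the Full encoding is required, since U+10030 lies in plane 1.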

To generate a font file, click on the File menu and select Generate Fonts.



This will bring up the dialog box shown below. You can use the filename shown or enter another. If you have not made a backup of your font file, you should consider entering another name rather than overwriting your original font file. When you are ready, left-click on the Save button.



Note that the Save and Save As entries in the File menu do not write out a new font file as you might expect. To create a new font file, you must use Generate Fonts as just explained. Save and Save As write out the font information in pfaedit's .sfd format, which is a human-readable text file. pfaedit can read an .sfd file and generate a font from it just as it can from an actual font file. Because it is plain text, an .sfd file can also be edited with a text editor.

Editing the .sfd file provides another way of changing the encoding. The information about each character begins with a line consisting of StartChar, a colon, a space, and then the name of the character. In the screenshot below we see the Linear B .sfd file being edited in emacs.



The line following StartChar is tagged Encoding. The second number on this line is the Unicode codepoint, expressed in decimal rather than the customary hexadecimal. For example, U+10030 appears as 65584. Under UNIX, you can use bc to convert between decimal and hexadecimal.
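The conversion between hexadecimal and decimal can also be done in Python, and the same script can list the codepoint recorded for each character in an .sfd file. The sketch below assumes only the layout described above: a StartChar line followed by an Encoding line whose second number is the decimal codepoint. The sample text, the glyph name v, and the function names are made up for illustration; a real .sfd file contains many more lines, which the sketch ignores.

```python
def hex_to_decimal(text):
    """Convert 'U+10030' or '10030' (hexadecimal) to a decimal integer."""
    cleaned = text.strip().upper()
    if cleaned.startswith("U+"):
        cleaned = cleaned[2:]
    return int(cleaned, 16)


def decimal_to_u_plus(value):
    """Convert a decimal codepoint to U+XXXX notation."""
    return f"U+{value:04X}"


# Hypothetical fragment laid out as described in the text.
SAMPLE_SFD = """StartChar: v
Encoding: 118 65584
"""


def list_encodings(sfd_text):
    """Return (name, codepoint) pairs read from StartChar/Encoding lines."""
    results = []
    name = None
    for line in sfd_text.splitlines():
        if line.startswith("StartChar: "):
            name = line[len("StartChar: "):]
        elif line.startswith("Encoding: ") and name is not None:
            # Encoding lines seen before any StartChar are skipped.
            fields = line.split()
            results.append((name, int(fields[2])))  # second number
            name = None
    return results


print(hex_to_decimal("U+10030"))
for glyph_name, codepoint in list_encodings(SAMPLE_SFD):
    print(glyph_name, codepoint, decimal_to_u_plus(codepoint))
```

To check a real file, read its contents with open(path).read() and pass the result to list_encodings in place of the sample text.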