Lexicon mode reference

This document is a rewrite of documentation originally written by Ann Taylor.

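Depending on your keyboard, META maps to ALT, Command, or ESC. If you use ESC, press and release it before typing the next key; for example, META-p is ESC followed by p.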

Invoking lexicon-mode

When editing POS-tagged files, we do not want to run the risk of changing the original text. For analyzing tagger errors, it is also convenient to leave the automatically assigned original tags intact (adding the correct tags to any incorrect ones rather than simply replacing the incorrect tags). Ordinary emacs does not meet these desiderata, but a special mode of emacs called lexicon-mode does. To invoke lexicon-mode emacs, it is convenient to define (1) a directory variable for the directory where the code that turns emacs into lexicon-mode emacs is stored and (2) an alias for the special emacs mode. To do this, add two commands along the following lines to your ~/.cshrc file. The lines beginning with pound signs are optional comments.
# directory variable for lexicon-mode code
setenv ELISP	'/Users/shelby/elisp/'
# alias for lexicon-mode emacs
alias lex-emacs	'emacs -font 8x16 -geometry 80x54 -l $ELISP/color.el -l $ELISP/legal-tags.el \!* &'

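The .el files contain the hard-coded path /Users/beatrice/elisp, which you will need to change to your own $ELISP directory. To find the relevant lines, change to that directory and paste the following commands into your terminal window (the second grep skips lines that are already commented out, that is, lines whose text begins with a semicolon):

cd $ELISP
grep -H "/Users/beatrice/elisp" *.el | grep -v '^[^:]*:[[:space:]]*;'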
Wherever it occurs, you'll need to replace the path
/Users/beatrice/elisp
with your own path (for instance, the path /Users/shelby/elisp from the example above).
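If you prefer not to make the replacement by hand, the following POSIX sh function is a minimal sketch of doing it with sed. The function name replace_elisp_path is hypothetical; it assumes the .el files sit directly in one directory and that neither path contains the | character (used here as the sed delimiter). Each file is backed up as FILE.el.bak first. Run it from sh or bash, since csh has no shell functions.

```shell
#!/bin/sh
# Hypothetical helper: replace a hard-coded elisp path in every .el file
# in DIR. Assumes neither path contains "|" (the sed delimiter used below).
replace_elisp_path() {
  dir=$1 old=$2 new=$3
  for f in "$dir"/*.el; do
    [ -f "$f" ] || continue            # glob matched nothing
    cp "$f" "$f.bak" || return 1       # keep the original as FILE.el.bak
    # Write back to the original name via redirection; this avoids the
    # differences between GNU and BSD "sed -i".
    sed "s|$old|$new|g" "$f.bak" > "$f" || return 1
  done
}
```

For the example above, you would say: replace_elisp_path /Users/shelby/elisp /Users/beatrice/elisp /Users/shelby/elisp. Commented-out occurrences are rewritten too, which is harmless.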

You should also review $ELISP/legal-tags.el and make any necessary changes (in ordinary emacs) to accommodate your own tag set.

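In order for lexicon-mode emacs to work properly, the file being corrected needs to have the extension .lex. Once the alias is defined (open a new terminal window, or say source ~/.cshrc), you start lexicon-mode emacs on a file by saying, for instance, lex-emacs franklin.lex.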
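If your tagged files come out of the tagger with some other extension, a small sh function along these lines can make .lex copies while leaving the originals untouched. This is only a sketch: the name to_lex and the example extension .pos are assumptions, not part of the lexicon-mode code.

```shell
#!/bin/sh
# Hypothetical helper: for each FILE argument, copy it into the same
# directory under the same name, with its last extension replaced by .lex.
to_lex() {
  for f in "$@"; do
    case $f in
      *.lex) continue ;;                 # already usable by lexicon-mode
    esac
    dir=$(dirname "$f")
    base=$(basename "$f")
    stem=${base%.*}                      # strip the last extension, if any
    cp "$f" "$dir/$stem.lex" || return 1 # copy, so the original stays as it was
  done
}
```

For instance, to_lex franklin.pos creates franklin.lex next to it. A copy made this way contains only what the original contained; if lexicon-mode then reports a file mode specification error, see that entry under Troubleshooting.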

If lexicon-mode emacs calls up a window that is split in two, you can fix the problem by positioning your cursor in the desired window (the one with the text to be corrected) and saying C-x 1 (control-x one).

To avoid the problem entirely, add the following line to your .emacs file.

(setq inhibit-startup-message 1)

Correcting tags

The lexicon-mode interface works on pretagged text in the format word/tag. Labels are added to the text, but the text itself cannot be changed, nor can the tags that are generated by the POS tagger. Incorrect tags are corrected by appending (rather than substituting) correct tags. Move through the file by pressing SPACE. This causes the cursor to jump from word to word. To correct a tag, position the cursor anywhere on the word or associated tag. Type the correct tag. What you type will appear in the command line at the bottom of the Emacs window. When typing, you can use lowercase letters; they will echo as uppercase in the command line. Then press SPACE (not, as you might expect, RETURN). The new tag is added to the end of the word/tag sequence and is set off by an asterisk. The cursor will jump to the end of the next word.

original: house/VB

corrected: house/VB*/N
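Because corrections are appended after an asterisk, the tag that counts for a word is the last one in its word/tag sequence. The sh/awk sketch below makes this concrete by printing each word with its last tag. It assumes whitespace-separated tokens, words that contain no slash, and no comments in the input; the name effective_tags is hypothetical, and this is not the project's own postprocessing.

```shell
#!/bin/sh
# Hypothetical helper: read word/tag text on stdin and print, for every
# token, the word and its last (most recently appended) tag.
effective_tags() {
  awk '{
    for (i = 1; i <= NF; i++) {
      n = split($i, parts, "/")        # "house/VB*/N" -> house, VB*, N
      if (n < 2) continue              # not a word/tag token
      printf "%s\t%s\n", parts[1], parts[n]
    }
  }'
}
```

For example, printf 'the/D house/VB*/N\n' | effective_tags prints "the" with D and "house" with N, tab-separated. A token such as house/VB*/N*/V (a wrong correction followed by a further one) yields V.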

If you type a sequence of illegal keystrokes, your input will not be added to the text, and the following message will appear in the command line:

Bad tag sequence

To get rid of the message and to continue tagging, type C-g, check the location of the cursor, and type a legal tag.

If you append an incorrect tag, you can remove it using BACKSPACE. Alternatively, you can append a further, correct tag.

Saving and restarting after saving

Auto-save does not work reliably in lexicon-mode. It is therefore a good idea to save regularly, using the ordinary Emacs command C-x C-s. Saving kicks you out of lexicon-mode (though not out of emacs itself), and you will have to restart lexicon-mode after saving by typing META-R (that is, META-SHIFT-r).

Disregard any error message of the form:

Symbol's function definition is void: buffer-flush-undo

After saving and exiting your file, you can return to where you left off correcting by typing META-p. This must be the first command you type on reopening the file.

Adding and deleting comments

Comments can be added in lexicon-mode by typing < (that is, SHIFT-,). This opens a pair of comment brackets of the following form and positions the cursor in the proper place for you to start typing your comment.

<+ +>

You can modify or delete the content of a comment (but not its brackets) using BACKSPACE.

Do not nest comments within comments, as it is not possible to delete nested comments.

If you want to entirely eliminate a comment (that is, the text along with its brackets), position the cursor anywhere inside the comment brackets and type META-: (that is, META-SHIFT-;).

In order to streamline the correction and review process for POS-tagged files, the PPCMBE project has decided to use the following codes in comments:

<+ ANS ... +> Answers to queries. These may change into REF.
<+ COM:sic +> Indicates that an odd-looking feature of the text should not be corrected and need not be rechecked. QTXT will sometimes turn into COM.
<+ EOS +> Indicates end of sentence without sentence-final punctuation.
<+ FIX +> Indicates material that needs to be fixed.
<+ NTS ... +> Private notes by taggers. Should eventually be eliminated.
<+ QPOS ... +> Query concerning proper POS tagging. Should eventually be eliminated.
<+ QTXT ... +> Query concerning the original text (possible typos, etc.). Should eventually be eliminated.
<+ REF ... +> Particularly useful answers to queries may turn into REF.
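During review it can help to see which of these codes are still present in a file. The sh sketch below counts comments by code with grep. It relies on the "<+ CODE" spacing shown above, it scans the whole file (including the blurb, which is invisible in lexicon-mode), and the name count_comment_codes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count lexicon-mode comments by their code
# (ANS, COM, EOS, FIX, ...) across the .lex files given as arguments.
count_comment_codes() {
  # -o prints each match on its own line; -h drops the file names.
  grep -oh '<+ [[:upper:]][[:upper:]]*' "$@" |
    sed 's/^<+ //' |                   # keep just the code, e.g. COM from "<+ COM:sic"
    sort | uniq -c
}
```

For instance, count_comment_codes franklin.lex prints one line per code with its number of occurrences.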

Comments are temporary notes and must be removed before the tagged text is sent to the parser.
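Before a file goes to the parser, a quick check along these lines can confirm that no comment brackets are left. It is a sketch under simple assumptions: it looks for the opening bracket <+ anywhere in the file (including the invisible blurb), so it would also flag that character sequence if it occurred in the text itself. The name check_no_comments is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list any remaining "<+" comment openers in each
# file and return non-zero if at least one file still contains comments.
check_no_comments() {
  rc=0
  for f in "$@"; do
    if grep -n '<+' "$f"; then         # prints line number and line of each hit
      echo "$f: comments remain" >&2
      rc=1
    fi
  done
  return $rc
}
```

For instance, check_no_comments franklin.lex prints nothing and succeeds when the file is clean.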

Complex tags

Words can be tagged with simple tags or with complex tags, which consist of simple tags joined by plus (+). See Splitting and joining for details on when complex tags are linguistically appropriate.

What follows focuses on implementing complex tags in lexicon-mode.

If a word is incorrectly tagged with a complex tag, and the correct tag is a simple tag, lexicon-mode allows the correction, and there is no problem.

If a word is incorrectly tagged (whether with a simple or complex tag), and the correct tag is complex, you currently can't type the plus sign to join the tags. Instead, join them with pipe (that is, SHIFT-\). You need to use the shift key to get pipe, even though you don't need to use it to capitalize the POS tags themselves. Any number of tags can be joined in this way.

Pipe is converted to plus in postprocessing.
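The project's postprocessing step itself is not described here, but the conversion can be sketched with sed. The sketch below (the function name pipes_to_plus is hypothetical) rewrites pipes only in the part of a token that follows an appended "*/" correction, so the original word/tag pairs are left alone; it assumes tags contain no spaces.

```shell
#!/bin/sh
# Hypothetical sketch of the pipe-to-plus conversion: inside each token,
# turn "|" into "+" only in the part that follows an appended "*/" tag.
pipes_to_plus() {
  # :a / ta loop: replace one pipe per pass until no pipe is left after "*/".
  sed -e ':a' \
      -e 's/\(\*\/[^[:space:]|]*\)|/\1+/' \
      -e 'ta'
}
```

For example, the line the/D house/VB*/ADJ|N comes out as the/D house/VB*/ADJ+N. The loop is needed because a single substitution only handles one pipe per token.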

Idling in lexicon-mode

If you want to keep track of the time it takes to annotate, press TAB whenever you are about to interrupt an annotation session for more than a few minutes. The following message will appear in the command line of the window:

Begin idling

To resume annotating, type any key, and the following message will appear in the command line.

End idling

Time information is stored at the bottom of the file, but is not visible in lexicon-mode.

Troubleshooting

Bad tag sequence

Problem: I type a legal label (I checked; the label is in the appropriate elisp files), but I still get a "Bad tag sequence" error message.

Solution: The problem is probably that one of the keys in the legal label has been given a special meaning in parser-mode-hacks.el. Disable the relevant lines in that file by putting a semicolon at the beginning of each. For instance, if "Z" leads to the error message, disable these lines:


(define-key parser-mode-map "z" 'hack-tag-string-to-NP)
(define-key parser-mode-map "Z" 'hack-tag-string-to-NP)
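If you would rather comment these lines out from the shell than in ordinary emacs, here is a sed sketch. It assumes the lines look exactly like the two above (one define-key per line, key in double quotes, at the start of the line); the function name disable_parser_key is hypothetical, and a backup is written to FILE.bak.

```shell
#!/bin/sh
# Hypothetical helper: comment out the define-key lines that bind KEY
# (in both lower and upper case) in parser-mode-map.
disable_parser_key() {
  file=$1 key=$2
  lc=$(printf '%s' "$key" | tr '[:upper:]' '[:lower:]')
  uc=$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')
  cp "$file" "$file.bak" || return 1    # keep the original
  # Prefix matching lines with ";; " so emacs treats them as comments.
  sed "s/^\((define-key parser-mode-map \"[$lc$uc]\"\)/;; \1/" \
    "$file.bak" > "$file"
}
```

For the example above: disable_parser_key $ELISP/parser-mode-hacks.el Z.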

Error message after restart

Problem: I restart a file after saving, and there's an error message:
Symbol's function definition is void: buffer-flush-undo

Solution: Disregard the error message; it seems to be harmless.

File mode specification error

Problem: I try to edit a file, but the minibuffer says:
File mode specification error: (error "File not in correct format. Do not edit.")

Solution: The tagger and parser correction modes depend on a "blurb" at the end of the file (the blurb is invisible in the correction modes, but is visible in ordinary emacs and to Unix commands like tail). The error message probably means that the file is missing the blurb. You can check by saying something like:

prompt> tail franklin.lex
If the result is just the end of your text, you need to add the blurb. You can do this by saying:
prompt> /home/migration/other/MIDENG/PPCMBE/ut/add-blurb franklin.lex

File is read-only when I open it

Problem: I try to edit or save a file, but the minibuffer says, for instance:
Buffer is read-only: #<buffer franklin.lex>

Solution: Two things might be wrong. First, the "blurb" might be missing (see File mode specification error), but for some reason the error message isn't showing. Use the tail command to see if there is a blurb.

Second, the permissions on the file might, for some reason, be set wrong. Outwit the system as follows. Exit the file, and then make a copy of it. Give the copy a mnemonic name (not something like TMP) and give it the extension .lex, or the correction software won't work.

prompt> cp franklin.lex franklin-x.lex
Now work on your copy of the troublesome file.

Send Beatrice a message about what you did, so that she can sort out the situation.

File is still read-only when I open it

Problem: I did what you just said, but the file is still read-only.

Solution: The permissions of the original file were set in a very screwy way and are preserved in your copy. You can see the permissions by saying:

prompt> ls -l franklin-x.lex
What you want to see is 'rw-' (or 'rwx') in the first triplet after the leading hyphen. If you are the owner of the file, you can change the permissions of the copy using the chmod command. Enter:
prompt> chmod 660 franklin-x.lex
This gives the owner (you) and the mideng-own group read and write permission on your copy of the troublesome file, and gives everyone else no access. For more details on chmod, enter:
prompt> man chmod
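The copy-and-chmod steps above can be combined in a small sh function, sketched below under the assumption that you run it in a directory where you can create files. In the mode 660, each digit is a sum of 4 (read), 2 (write), and 1 (execute) for owner, group, and others in turn: 6 = 4 + 2 gives read and write to owner and group, and 0 gives others nothing, so ls -l shows -rw-rw----. The function name writable_copy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST (which must end in .lex), give the
# copy read/write permission for owner and group only, and show the result.
writable_copy() {
  src=$1 dst=$2
  case $dst in
    *.lex) ;;
    *) echo "writable_copy: $dst must end in .lex" >&2; return 1 ;;
  esac
  cp "$src" "$dst" || return 1
  chmod 660 "$dst" || return 1   # rw- for owner, rw- for group, --- for others
  ls -l "$dst"
}
```

For the example above: writable_copy franklin.lex franklin-x.lex.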

If you aren't the owner, send Beatrice a message describing the problem.

File is read-only after I save it

Problem: Everything was fine. Then I saved the file, and now all of a sudden the file is read-only.

Solution: See Saving and restarting after saving.

Legal tag can't be added

Problem: I'm trying to add a tag that I know is legal, but I can't.

Solution: For some reason, it is currently impossible to add certain legal tags (CODE, LS). Add a comment to the text, and send a mail message to Beatrice.

Stuck in minibuffer

Problem: There's an error message in the minibuffer. I want to get back to the text, but I can't leave the minibuffer.

Solution: Type C-g, possibly more than once.

The mark is not set now

Problem: I try to add a legal tag, and there's an error message in the minibuffer "The mark is not set now."

Solution: Check if the caps lock key is on. If so, turn it off.

If the problem is not due to the caps lock key, check your .emacs file. The file is in your home directory, and you can edit it in emacs just like any other text file. Because its name begins with a dot, ls does not list it unless you say ls -a. If your .emacs file contains a command that sets (turns on) transient mark mode, delete the command, or disable it by putting a semicolon at the beginning of its line.
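To find such commands without reading through the whole file, a grep along the following lines lists the lines of your .emacs that mention transient-mark-mode and are not already commented out. It is a sketch: the function name find_transient_mark is hypothetical, and it only looks for that exact symbol name.

```shell
#!/bin/sh
# Hypothetical helper: show, with line numbers, the lines in an emacs init
# file (default ~/.emacs) that mention transient-mark-mode and are not
# already commented out with a leading semicolon.
find_transient_mark() {
  file=${1:-$HOME/.emacs}
  grep -n 'transient-mark-mode' "$file" |
    grep -v '^[0-9]*:[[:space:]]*;'    # drop lines whose text starts with ";"
}
```

Each line of output starts with a line number, so you can go straight to that line in emacs and add the semicolon.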

Two windows

Problem: I call up the file with the tagged text, and it comes up, but there's a second unwanted window.

Solution: Position the cursor in the wanted window. Type C-x 1.

The permanent solution is to add the following line to your .emacs file.

(setq inhibit-startup-message 1)