CS reformat.q FILE.psd
mv FILE.psd.fmt FILE.psd
The script keeps a backup copy (FILE.psd.old).
The re-ref script assumes that the text contains page breaks of the following form:
( (CODE <pb%%%n="1"$$>))
Some texts (for instance, Roland) contain laisse breaks rather than page breaks:
( (CODE <laisse="1">))
For such texts, comment out the line for page breaks in the re-ref script and uncomment the line for laisse breaks. Restore the script to its original form when done.
Of course, you can also create several different variants of the re-ref script (re-ref-pb, re-ref-laisse, etc.) for the various sorts of files.
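If you are unsure which variant a file needs, a quick count of the break markers can help. The following is only a sketch in Python, not part of the re-ref script: the function names are hypothetical, and it assumes that break markers appear as CODE tokens beginning with <pb or <laisse, as in the examples above.

```python
def count_breaks(text):
    # Count CODE nodes that look like page breaks or laisse breaks.
    return {
        "pb": text.count("(CODE <pb"),
        "laisse": text.count("(CODE <laisse"),
    }

def variant_for(text):
    # Suggest a script variant (names taken from the examples above);
    # return None when the file has neither or both kinds of break.
    counts = count_breaks(text)
    if counts["pb"] and not counts["laisse"]:
        return "re-ref-pb"
    if counts["laisse"] and not counts["pb"]:
        return "re-ref-laisse"
    return None
```

A result of None means the file should be inspected by hand before running either variant.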
grep -c '^( (' FILE.psd
More precisely, the last running number in the file should match the output of the grep command. All tokens are assigned a running number, but tokens that are exhaustively dominated by CODE don't get an ID node (since they would never be cited as examples). So if the last token in the file is exhaustively dominated by CODE, then the last ID number in the file won't match the output of the grep command. This isn't a problem, though, since it is easy to figure out what the last running number would be by looking at the last few tokens. For instance, in the following case, the grep command should return 7260.
( (IP-MAT (PP (P devant) (NP (DZ Nostre) (NCS Dame) (PP (P de@) (NP (D @l) (NPRS Val))))) (CODE
) (VJ est) (NP-SBJ (DZ ses) (NCS ostex)) (PP (Q tot) (P a) (NP (NCS estal))) (PONFP .)) (ID 1170-YVAIN,207.7257))
( (CODE <$$p>))
( (CODE <$$div>))
( (CODE <$$back>))
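The check described above can also be done mechanically. The sketch below is in Python and is not one of the corpus scripts; the function names are hypothetical. It assumes that every token starts at the beginning of a line with "( (", that ID nodes end in a period followed by the running number, and that any tokens after the last ID node are exhaustively dominated by CODE.

```python
import re

def count_tokens(text):
    # Same count as `grep -c '^( ('`: lines that open a new token.
    return sum(1 for line in text.splitlines() if line.startswith("( ("))

def last_id_number(text):
    # Running number of the last ID node, e.g. 7257 in (ID 1170-YVAIN,207.7257).
    numbers = re.findall(r"\(ID [^\s()]*\.(\d+)\)", text)
    return int(numbers[-1]) if numbers else None

def expected_last_number(text):
    # Last ID number plus the tokens that follow it; these are assumed to be
    # CODE-only tokens, which get a running number but no ID node.
    last_id = last_id_number(text)
    if last_id is None:
        return None
    tail = text[text.rfind("(ID "):]
    return last_id + count_tokens(tail)
```

For a correctly numbered file, expected_last_number(text) should equal count_tokens(text). In the example above, the last ID is 7257 and three CODE tokens follow it, which gives 7260.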
The output is FILE.pos. As its name indicates, the psd-to-pos script deletes syntactic information, leaving only terminals and preterminals. In addition, the (tag word) format of the parsed file is switched to word/tag.
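To illustrate the conversion, here is a minimal sketch in Python. It is not the psd-to-pos script itself, and the function name is hypothetical. It assumes that a terminal is any bracketed pair of the form (tag word), and that ID and CODE nodes are dropped from the output; the real script may treat them differently.

```python
import re

# A leaf is "(TAG word)": neither part contains whitespace or parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def psd_to_pos(token, skip=("ID", "CODE")):
    # Keep only terminals and preterminals, rewritten as word/tag.
    # Dropping ID and CODE is an assumption made for this sketch.
    pairs = LEAF.findall(token)
    return " ".join(f"{word}/{tag}" for tag, word in pairs if tag not in skip)

tree = ("( (IP-MAT (VJ est) (NP-SBJ (DZ ses) (NCS ostex)) (PONFP .)) "
        "(ID 1170-YVAIN,207.7257))")
print(psd_to_pos(tree))  # est/VJ ses/DZ ostex/NCS ./PONFP
```

Phrase labels such as IP-MAT and NP-SBJ disappear because they are followed by another bracket rather than by a word, so the pattern never matches them.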