Linguistics 300, F12, Assignment 3


Assignment

As discussed in Coded corpora, CorpusSearch coding queries encode the results of many ordinary CorpusSearch queries as a single string of characters. For this assignment, you will download a file containing all of the coding strings from the coded versions of the Penn Parsed Corpora of Historical English and the Parsed Corpus of Early English Correspondence. The meaning of the various symbols is described in the coding conventions. To analyze the coding strings, you will use a Linux/Unix command called grep (an acronym based on 'global regular expression print'), so, not surprisingly, you will need to learn a bit about regular expressions. The tutorial on grep summarizes the regular expressions that you will use.
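
As a rough illustration of how grep and regular expressions apply to coding strings, here is a minimal sketch. It assumes, for illustration only, that each line of the file is one coding string whose fields are separated by colons; the file name sample_codes.txt and the three invented strings are hypothetical stand-ins, not the real file or the real coding conventions.

```shell
#!/bin/sh
# Hypothetical stand-in for the downloaded file: one invented coding string
# per line, fields separated by colons (an assumption, not the real conventions).
CODES=sample_codes.txt
printf '%s\n' 'a:1:x' 'b:1:y' 'a:2:x' > "$CODES"

# Count the lines whose first field is "a" (^ anchors the match at line start).
grep -c '^a:' "$CODES"                # prints 2

# Count the lines whose second field is "1": [^:]* skips over the first field.
grep -c '^[^:]*:1:' "$CODES"          # prints 2

# Extended regular expression (-E): first field is "a" or "b", second is "2".
grep -E '^(a|b):2:' "$CODES"          # prints a:2:x
```

With the real file, you would substitute its name for sample_codes.txt and build the patterns from the symbols listed in the coding conventions.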

Once you're somewhat familiar with the coding conventions and grep, start working on the exercises below.

As you'll soon see, the grep searches that you need for these exercises are quite repetitive. To save time and avoid typos in your command-line input, it is convenient to run the searches in batches rather than one at a time. The tutorial on shell scripts provides two sample scripts; once you understand how they work, you will be able to edit them to complete the exercises.
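
The idea of running searches in batches can be sketched roughly as follows. This is not one of the tutorial's sample scripts; it is a minimal sketch under the same assumptions as before (a hypothetical file sample_codes.txt of colon-separated, invented coding strings). The function name batch_counts is likewise hypothetical. It loops over a list of patterns and prints each pattern with its number of matching lines, so that a whole set of searches runs from a single command.

```shell
#!/bin/sh
# Hypothetical input file: invented coding strings, colon-separated.
CODES=sample_codes.txt
printf '%s\n' 'a:1:x' 'b:1:y' 'a:2:x' 'b:2:x' > "$CODES"

# batch_counts (hypothetical name): run one grep per pattern argument and
# print "pattern<TAB>count". The -- stops grep from reading a pattern that
# begins with "-" as an option.
batch_counts() {
    for pattern in "$@"; do
        printf '%s\t%s\n' "$pattern" "$(grep -c -- "$pattern" "$CODES")"
    done
}

# One command runs the whole batch of searches.
batch_counts '^a:' '^b:' ':1:' ':x$'
```

Adding a search then means adding one more pattern to the argument list rather than retyping a full grep command.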


Troubleshooting