Linguistics 300, F13, Assignment 3


As discussed in Coded corpora, CorpusSearch coding queries encode the results of many ordinary CorpusSearch queries as a single string of characters. For this assignment, you will download a file containing coding strings from coded versions of the Penn Parsed Corpora of Historical English and the Parsed Corpus of Early English Correspondence. The meaning of the various symbols is described in the coding conventions. To analyze the coding strings, you will use a Linux/Unix command called grep (an acronym based on 'global regular expression print'). Not surprisingly, you will need to learn a bit about regular expressions in order to use it. The tutorial on grep summarizes the regular expressions that you will use.
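
To give a feel for the kind of search involved, here is a minimal sketch of two grep commands. The file name sample.cod, the colon separator, and the three coding strings are all made up for illustration; the real symbols and column layout are the ones described in the coding conventions.

```shell
# Hypothetical data: three made-up coding strings, one per line.
# The real symbols and layout are defined in the coding conventions.
printf '%s\n' 'a:b:1500' 'a:c:1550' 'x:b:1600' > sample.cod

# Print every line whose first column is "a".
# ^ anchors the match at the start of the line.
grep '^a:' sample.cod

# Count (rather than print) the lines whose second column is "b".
# [^:]* matches any run of characters other than the colon separator.
grep -c '^[^:]*:b:' sample.cod
```

The first command prints the matching lines themselves; the -c option in the second makes grep report only how many lines matched, which is usually what you want when tallying tokens.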

As you'll soon see, the grep searches that you need for this and the following assignments are quite repetitive. To save time and to avoid typos in your command-line input, it is convenient to run the searches in batches rather than one at a time. The tutorial on shell scripts provides a sample script called batchGrep that you can edit for the first part of the assignment below. To complete the second part, you will need to edit the sample script called spreadsheetGenerator.
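
The idea of running searches in a batch can be sketched as a short shell loop. This is not the course's batchGrep or spreadsheetGenerator script, only an illustration of the general approach; the file names, the patterns, and the made-up coding strings are all assumptions.

```shell
#!/bin/sh
# Illustrative sketch only; not the course's batchGrep script.
# Hypothetical input file with three made-up coding strings.
infile=sample.cod
printf '%s\n' 'a:b:1500' 'a:c:1550' 'x:b:1600' > "$infile"

# Run one grep per pattern and write "pattern<TAB>count" for each.
for pattern in '^a:' '^x:' ':b:'; do
    count=$(grep -c "$pattern" "$infile")
    printf '%s\t%s\n' "$pattern" "$count"
done > counts.tsv

# Show the collected results.
cat counts.tsv
```

Because each pattern is typed once in the loop header, a typo has to be fixed in only one place, and the tab-separated output can be opened directly in a spreadsheet program.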


Assignment


Troubleshooting