As discussed in Coded corpora, CorpusSearch coding queries encode the results of many ordinary CorpusSearch queries as a single string of characters. For this assignment, you will download a file containing coding strings from coded versions of the Penn Parsed Corpora of Historical English and the Parsed Corpus of Early English Correspondence. The meaning of the various symbols is described in the coding conventions. To analyze the coding strings, you will use a Linux/Unix command called grep (an acronym based on 'global regular expression print'), and, not surprisingly, you will need to learn a bit about regular expressions in order to use it. The tutorial on grep summarizes the regular expressions that you will use.
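To get a feel for how the regular-expression pieces behave, here is a minimal sketch on a few made-up coding strings. It assumes, as the sanity-check examples in this assignment suggest, that each column holds a single character and that columns are separated by colons. The strings themselves are hypothetical, so consult the coding conventions for what the real symbols mean.

```shell
#!/bin/bash
# Hypothetical coding strings; the real ones come from the downloaded file.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'B:b:x:_' 'V:v:y:_' 'B:v:x:q' > "$tmp"

# Without -c, grep prints every line that matches the pattern.
# ^ anchors the pattern at the start of the line, so '^B:' means "column 1 is B".
grep '^B:' "$tmp"

# With -c, grep prints only the number of matching lines.
# . matches any single character; [^_] matches any single character except _.
n_y=$(grep -c '^.:.:y:' "$tmp")      # column 3 is y
n_q=$(grep -c '^.:.:.:[^_]' "$tmp")  # column 4 is anything but _
echo "column 3 is y: $n_y"
echo "column 4 filled in: $n_q"
```

The first grep prints the two strings beginning with B:, and each counting search finds exactly one string.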
As you'll soon see, the grep searches that you need for this and following assignments are quite repetitive. In order to save time and to avoid making typos in your command-line input, it is very convenient to run the searches in batches rather than on a one-by-one basis. The tutorial on shell scripts provides you with a sample script called batchGrep that you can edit for the first part of the assignment below. In order to complete the second part, you will need to edit the sample script called spreadsheetGenerator.
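As a rough illustration of running searches in a batch, here is a hypothetical sketch. It is not the tutorial's batchGrep, whose layout may differ. Each array entry holds a description and a pattern separated by |, and the made-up function run_checks prints each description followed by the grep -c count for a file named on the command line (ling300.cod.ooo by default).

```shell
#!/bin/bash
# Hypothetical batch runner (not the tutorial's batchGrep).
# run_checks FILE: for each entry in "checks", print the description,
# then the number of lines in FILE that match the pattern.
run_checks() {
  local file=$1 check
  for check in "${checks[@]}"; do
    echo "${check%%|*}"                        # text before the first |
    grep -c -- "${check#*|}" "$file"           # pattern after the first |
  done
}

# Each entry is "description|regular expression".
checks=(
  "negative declarative can't be a question and vice versa|^[^_]:.:.:[^_]"
  "main verb BE in column 1 implies the same in column 2|^B:[^b]"
)

# Use the file named on the command line, or ling300.cod.ooo by default.
target=${1:-ling300.cod.ooo}
if [ -r "$target" ]; then
  run_checks "$target"
else
  echo "cannot read $target" >&2
fi
```

With this layout, adding a check means adding one line to the array. Splitting on the first | assumes that the descriptions themselves contain no | character.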
Based on the information in the coding conventions, come up with sanity checks of this sort and implement them in a shell script, with appropriate comments for each line, as in the examples below.
echo "negative declarative can't be a question and vice versa"
grep -c "^[^_]:.:.:[^_]" ling300.cod.ooo
echo "main verb BE in column 1 implies the same in column 2"
grep -c "^B:[^b]" ling300.cod.ooo
Submit the shell script, together with the results you obtain from running it. The name of your shell script should contain your last name (e.g., batchGrep-chomsky).
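Trying the two example checks on a few made-up strings shows what a nonzero count looks like. In this sketch, the data is hypothetical and not taken from ling300.cod.ooo. One string violates each check, and grep's -n option then prints the offending lines with their line numbers so they can be inspected.

```shell
#!/bin/bash
# Made-up coding strings for illustration only (not from ling300.cod.ooo).
# Line 3 violates the first check; line 4 violates the second.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'B:b:_:_' '_:v:_:q' 'n:v:_:q' 'B:v:_:_' > "$tmp"

# The two example checks; a count above 0 flags strings that break the rule.
echo "negative declarative can't be a question and vice versa"
bad1=$(grep -c "^[^_]:.:.:[^_]" "$tmp")
echo "$bad1"
echo "main verb BE in column 1 implies the same in column 2"
bad2=$(grep -c "^B:[^b]" "$tmp")
echo "$bad2"

# -n prints each matching line prefixed with its line number.
grep -n "^[^_]:.:.:[^_]" "$tmp"
grep -n "^B:[^b]" "$tmp"
```

To save your own script's output for submission, you can redirect it to a file, e.g. bash batchGrep-chomsky > batchGrep-chomsky.txt. The output file name here is only a suggestion.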
In replicating the results, make sure that your script is consistent with the following criteria:
To check which shell you are using, type:
echo $SHELL
If typing just the name of a shell script to run it gives an error message, chances are that the bash shell doesn't know how or where to find your command. Help it out by prefacing the command you are trying to run with ./ (which explicitly tells the shell to look in the current directory) or with bash.
For example, instead of typing
batchGrep
try one of the following two options:
./batchGrep
bash batchGrep
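There is no space in the first option, but there is a space after bash in the second option. The first option also requires the script to be executable (you can make it so with chmod u+x batchGrep); the second option does not.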
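Here is a small throwaway demo of the two options. It creates a one-line stand-in for batchGrep in a temporary directory. The stand-in is hypothetical and is not the tutorial's script. The demo runs the stand-in with bash, then makes it executable and runs it with ./.

```shell
#!/bin/bash
# Hypothetical demo in a temporary directory; the one-line script below
# stands in for batchGrep and is not the tutorial's version.
orig=$PWD
dir=$(mktemp -d "$PWD/demo.XXXXXX") || exit 1
cd "$dir" || exit 1
printf '%s\n' '#!/bin/bash' 'echo "script ran"' > batchGrep

# bash reads the file itself, so the file needs no execute permission.
out1=$(bash batchGrep)

# ./ names the file in the current directory and runs it as a program,
# which requires execute permission (otherwise: Permission denied).
chmod u+x batchGrep
out2=$(./batchGrep)

echo "bash batchGrep -> $out1"
echo "./batchGrep    -> $out2"

# Return to the starting directory and remove the temporary one.
cd "$orig" && rm -rf "$dir"
```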