The following instructions are geared towards the search requirements for this course. Grep is line-based. Since each coding string is on a separate line, grep searches each coding string separately, which is just what we want.
The general format of a grep command is:
grep search-pattern file(s)-to-be-searched
The search pattern (but not the input files to be searched) needs to be enclosed in single or double quotes, like this:
grep "v:k:i:_:_:_:_:_:_:_:1680" ling300.cod.ooo
Given the coding conventions for the coding strings, the search string above matches the coding strings for negative declaratives with do support containing an intransitive verb from the know class written by an author born in 1680.
In addition to containing literal characters as above, search patterns in grep can also contain so-called regular expressions. The following table contains the regular expressions that you will need in order to search the coding strings for the syntax project.
| Regular expression | Explanation |
|---|---|
| . | A period stands for any single character (including a literal period). |
| [aeiouy] | Square brackets enclose alternatives. The expression on the left matches any single one of the English vowel letters a, e, i, o, u, y. |
| [a-e] | For digits and letters, alternatives can be specified as ranges of characters. The expression on the left is another way of writing [abcde]. |
| [0-9] [a-z] [A-Z] | Commonly used alternatives can be specified as ranges of characters. The expressions on the left match, respectively, a single digit, a single lowercase letter, and a single uppercase letter. |
| [0-9a-z] [0-9a-zA-Z] [a-cg-im-os-t] | Ranges can be combined. The first expression matches a single digit or lowercase letter. The second expression matches a single digit or any letter regardless of case. As the third expression shows, the ranges that are combined can be any well-formed ranges. A hyphen right after the opening bracket or right before the closing bracket is interpreted literally; examples follow the table. |
| ^ | The caret has two different meanings, depending on where it occurs in a search string. As the first character of a search string, it anchors the search to the beginning of an input line. Immediately after an opening square bracket, it negates the bracketed alternatives. Both uses are illustrated after the table. |
| * | An asterisk after an expression matches zero or more instances of that expression; in other words, the expression is optional and may also be repeated. For example, .* matches any sequence of characters, including the empty sequence. |
Because a hyphen right after the opening bracket or right before the closing bracket is interpreted literally, the following searches are equivalent:
grep '^[a-c-]' ling300.cod.ooo
grep '^[abc-]' ling300.cod.ooo
grep '^[-a-c]' ling300.cod.ooo
grep '^[-abc]' ling300.cod.ooo
A caret as the first character of a search string "anchors" the search string to the beginning of an input line. In other words, there is a difference between the following two commands:
grep 'D' ling300.cod.ooo
grep '^D' ling300.cod.ooo
The first command finds lines with D anywhere in the coding string. Given the coding conventions for the coding strings, this would match negative declaratives with main verb do, questions with main verb do, any coding string from a private diary, and a number of other sentence types: not a linguistically meaningful result! The second command finds lines with D as the first character on the input line. Given the coding conventions, this would match negative declaratives with main verb do.
In order to find tokens from private diaries (regardless of their other properties), you would say:
grep '^.:.:.:.:.:.:.:.:.:.:....:....:.:..:.:.::D' ling300.cod.ooo
When a caret immediately follows an opening square bracket, it has an entirely different meaning: it negates the material in the square brackets, so that the expression matches any single character that is not listed. For instance, given the coding conventions, all of the following searches are equivalent:
grep '^[DHK]' ling300.cod.ooo
grep '^[^BVbdhkv-]' ling300.cod.ooo
grep '^[^BVa-z-]' ling300.cod.ooo
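To try these expressions without the course file, the following sketch runs them over a few made-up lines. The input is toy data invented for illustration, not real coding strings, and the variable name sample is an arbitrary choice. Each grep -c prints the number of matching lines.

```shell
#!/bin/sh
# Toy input, made up for illustration (NOT real coding strings).
# printf '%s\n' prints the string; grep sees each line separately.
sample='D:a
v:D
-:b
K:c'

# D anywhere on the line: "D:a" and "v:D".
printf '%s\n' "$sample" | grep -c 'D'            # prints 2
# Caret as anchor: only "D:a" starts with D.
printf '%s\n' "$sample" | grep -c '^D'           # prints 1
# Bracketed alternatives with a literal hyphen at the end: "D:a", "-:b", "K:c".
printf '%s\n' "$sample" | grep -c '^[DK-]'       # prints 3
# Caret right after "[" negates the set: only "v:D" starts with something else.
printf '%s\n' "$sample" | grep -c '^[^DK-]'      # prints 1
# Asterisk: ".*" matches any run of characters, here the empty one in "v:D".
printf '%s\n' "$sample" | grep -c '^[a-z]:.*D'   # prints 1
```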
Once we have ascertained that there are no errors in the coding strings, we generally don't care about the strings themselves; we're just interested in the number of times that strings of a particular form occur. In order to count matches, grep allows you to use a switch (= option) called -c, like this:
grep -c "v:k:i:_:_:_:_:_:_:_:1680" ling300.cod.ooo
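Note that grep -c counts matching lines, not individual occurrences of the pattern; because each coding string is on its own line, this equals the number of matching tokens. The sketch below checks this on a temporary toy file whose shortened lines are made up for illustration (they are not real coding strings), comparing grep -c with counting grep's printed lines via wc -l.

```shell
#!/bin/sh
# Temporary toy file with shortened, made-up lines (NOT real coding strings).
tmp=$(mktemp)
printf '%s\n' 'v:k:i:1680' 'v:k:t:1680' 'v:k:i:1700' 'D:D:x' > "$tmp"

# grep -c reports the number of matching lines directly.
count_c=$(grep -c 'v:k:i' "$tmp")
# The same number, obtained by printing the matching lines and counting them;
# tr strips the padding that some wc implementations add.
count_wc=$(grep 'v:k:i' "$tmp" | wc -l | tr -d ' ')
echo "grep -c: $count_c"          # 2
echo "wc -l:   $count_wc"         # 2

# A line is counted once even though D occurs in it twice.
count_d=$(grep -c 'D' "$tmp")
echo "lines with D: $count_d"     # 1

rm -f "$tmp"
```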
The output of grep -c can be entered into spreadsheets for further quantitative analysis. This could be done by hand, but don't do it: copying counts by hand is both time-consuming and error-prone. Instead, see Shell scripts for saving searches for how to save your searches in a form that allows you to import the results into your spreadsheet program.
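For orientation only, here is a minimal sketch of the idea; it is not the script described on that page. A shell function runs several grep -c searches and prints one tab-separated label/count line per search, a format that spreadsheet programs can import. The function name count_searches, the labels, and the output file name counts.tsv are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the course materials): prints one
# "label<TAB>count" line per search over the file given as $1.
count_searches() {
    infile=$1
    # Lines whose first character is D.
    printf '%s\t%s\n' 'D-initial' "$(grep -c '^D' "$infile")"
    # Lines containing the literal search string used earlier in this section.
    printf '%s\t%s\n' 'v-k-i-1680' "$(grep -c 'v:k:i:_:_:_:_:_:_:_:1680' "$infile")"
}

# Usage (hypothetical output file name):
#   count_searches ling300.cod.ooo > counts.tsv
```

Adding a search then means adding one printf line with its own label, so the row labels in the spreadsheet stay tied to the patterns that produced them.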