Using Regular Expressions in the Query Language

contents of this chapter:

apache regular expressions
your .cshrc file
your query file

apache regular expressions

When regular expressions are enabled, CorpusSearch uses the regular expression matcher provided by Apache (the Jakarta Regexp package).

To learn more about the Apache regular expression matcher, see:

apache regular expressions

your .cshrc file

These are the relevant lines in my .cshrc file:

setenv CLASSPATH /home/brandall/clean_search:/nldb5/tides/software/src/jakarta-regexp-1.1/jakarta-regexp-1.1.jar
setenv PATH /pkg/j/java-1.2ea6/bin:${PATH}
alias CS 'java -server CorpusSearch'

Notice that I have appended

:/nldb5/tides/software/src/jakarta-regexp-1.1/jakarta-regexp-1.1.jar

to my CLASSPATH variable. This lets CorpusSearch find the Apache regular expression matcher.
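If CorpusSearch cannot find the matcher, it may help to confirm that the jar is really visible to Java. The following is a minimal sketch, not part of CorpusSearch: the class name ClasspathCheck is hypothetical, and the sketch uses multi-catch syntax, so it assumes a much newer Java than the one described here. It relies on org.apache.regexp.RE being the matcher class in the Jakarta Regexp jar.

```java
// Hypothetical helper, not part of CorpusSearch: reports whether a class
// can be loaded from the current CLASSPATH.
class ClasspathCheck {
    // Returns true if the named class can be loaded, false otherwise.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the Jakarta Regexp matcher class; any other class
        // name can be passed as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.regexp.RE";
        String verdict = isAvailable(name) ? " found" : " NOT found on CLASSPATH";
        System.out.println(name + verdict);
    }
}
```

Compile it with javac and run "java ClasspathCheck" in the same shell that you use for CorpusSearch, so that it sees the same CLASSPATH setting.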

My PATH variable refers to JDK 1.3. This is necessary because this version of CorpusSearch uses parts of the Java library that did not exist in 1.2.

Without the last line, I would have to run CorpusSearch as "java CorpusSearch", and searches would run much more slowly. The -server option speeds up Java 1.3.

your query file

To use regular expressions, put this line in your query file, before any instructions referring to labels or text (like node: or query:):

reg_exps: t

Then write the search patterns in your query as regular expressions, each enclosed in slashes. For instance, here's a query written using CorpusSearch's standard "StarMatch":

query:  (((IP-SUB* iDoms NP-SBJ*)
AND (NP-SBJ* precedes *MD|*HVP|*HVD))
AND (NP-SBJ* precedes VB|VAN|VBN))

Here's the same query using regular expressions:

query: (((/IP-SUB(.*)/ iDoms /NP-SBJ(.*)/)
AND (/NP-SBJ(.*)/ precedes /(.*)MD|(.*)HVP|(.*)HVD/))
AND (/NP-SBJ(.*)/ precedes /^VB$|^VAN$|^VBN$/))
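The anchors in the last line matter. The sketch below illustrates why, under two stated assumptions: it uses java.util.regex as a stand-in for the Apache matcher, and it assumes that a pattern may match anywhere inside a label unless it is anchored with ^ and $. The class name LabelMatchDemo and the label NEG+MD are hypothetical, chosen only for illustration.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for the Apache matcher,
// and unanchored (substring) search is an assumption.
class LabelMatchDemo {
    // True if the pattern matches anywhere in the label.
    static boolean matches(String regex, String label) {
        return Pattern.compile(regex).matcher(label).find();
    }

    public static void main(String[] args) {
        String[] labels = {"VB", "VBD", "VAN", "NP-SBJ-1", "MD", "NEG+MD"};
        String[] patterns = {
            "VB",                      // unanchored: also hits VBD
            "^VB$|^VAN$|^VBN$",        // anchored: whole-label matches only
            "NP-SBJ(.*)",              // like StarMatch NP-SBJ*
            "(.*)MD|(.*)HVP|(.*)HVD"   // like StarMatch *MD|*HVP|*HVD
        };
        // Print one line per pattern/label pair.
        for (String p : patterns) {
            for (String l : labels) {
                System.out.println("/" + p + "/ vs " + l + ": " + matches(p, l));
            }
        }
    }
}
```

Under these assumptions, an unanchored /VB/ would also match VBD, which the StarMatch pattern VB does not, so the regular expression version spells out ^VB$ to keep the two queries equivalent.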
