Before the query is interpreted, we must be sure it has the correct form, as described in "The Query Language". This is done by the query parser (Parse.java), which is a recursive descent parser. If the query does not parse, the search is aborted with an error message describing the exact point in the query where the parser choked.
The function-calls vector records all the information needed to make the appropriate search-function calls as described in the query, in the order that CorpusSearch will make the calls.
For instance, where the query contains
the function-calls vector will contain
This is because the AND function takes as its input two result vectors, in this case
Similarly, where the query contains
the function-calls vector will contain
Here, the input to the second call of AND is
For every search function call appended after Call 3, the function-calls vector will add first (Call n) and then AND. In general, if the query is:
(...((Call 1) AND (Call 2)) AND (Call 3)) ... AND (Call n))
then the function-calls vector will be:
Suppose we have this query:
In English, this query would be: "find sentences that contain two different ADJP|ADVP-TMP nodes. One ADJP|ADVP-TMP node immediately dominates an ADV*|ADJ* node, which does not dominate a word ending in "er". The other ADJP|ADVP-TMP node immediately precedes an NP-SBJ node."
The query has the form mentioned above:
Here's the function-calls vector CorpusSearch builds from the query:
Here, the numbers following the search-function names, as in "iDominates, 18" have no intrinsic meaning. Each search-function is assigned a number simply so that they can be identified using a switch statement.
Notice that two of the arguments in the query, 1ADJP|ADVP-TMP and 2ADJP|ADVP-TMP, had prefix indices ("1" and "2"), showing that two different instances of ADJP|ADVP-TMP are required. On first glance, it may seem as though this information has been lost in the function-calls vector, since the prefix indices have been stripped from the arguments to the search functions. But in fact the information has been recorded in exactly the form that CorpusSearch will use it, namely in the last call to AND:
The [false, 0, 4] vector that will be sent to the last call to AND requires that the index-0 argument (1ADJP|ADVP-TMP) and the index-4 argument (2ADJP|ADVP-TMP) must not coincide (for more, see and.)
To build the function-calls vector, we have a function-calls vector (begun empty) and a call-stack (begun empty). When an open parenthesis is found, a new vector is declared and placed on the call-stack. The information about the next function call is read and recorded in the vector. When a close parenthesis is found, the call-stack is popped and the vector taken from it is put in the function-calls vector.
I'll walk through the algorithm for producing the function-calls vector shown above (for simplicity's sake, I'll abbreviate the search-function calls to "Call 1", "Call 2" and "Call 3"):
(((Call 1) AND (Call 2)) AND (Call 3))
previously read text: <empty>
current text: <empty>
| <empty> | <empty> |
| call stack | function-calls vector |
previously read text: <empty>
current text: (
| [] | |
| call stack | function-calls vector |
previously read text: (
current text: (
| [] | |
| [] | |
| call stack | function-calls vector |
previously read text: ( (
current text: (
| [] | |
| [] | |
| [] | |
| call stack | function-calls vector |
previously read text: ( ( (
current text: Call 1
| [Call 1] | |
| [] | |
| [] | |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1
current text: )
| [] | |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 )
current text: AND
| [AND] | |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND
current text: (
| [] | |
| [AND] | |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND (
current text: Call 2
| [Call 2] | |
| [AND] | |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2
current text: )
| 1.) | |
| [AND] | 1.) Call 2 |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 )
current text: )
| 2.) AND | |
| 1.) Call 2 | |
| [] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 ) )
current text: AND
| 2.) AND | |
| 1.) Call 2 | |
| [AND] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND
current text: (
| 2.) AND | |
| [] | 1.) Call 2 |
| [AND] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND (
current text: Call 3
| 2.) AND | |
| [Call 3] | 1.) Call 2 |
| [AND] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND ( Call 3
current text: )
| 3.) Call 3 | |
| 2.) AND | |
| 1.) Call 2 | |
| [AND] | 0.) Call 1 |
| call stack | function-calls vector |
previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND ( Call 3 )
current text: )
| 4.) AND | |
| 3.) Call 3 | |
| 2.) AND | |
| 1.) Call 2 | |
| 0.) Call 1 |
and the function-calls vector is complete.