the Function-Calls Vector

contents of this chapter:

parsing the query
the function-calls vector
an example
building the function-calls vector

parsing the query

Before the query is interpreted, we must be sure it has the correct form, as described in "The Query Language". This is done by the query parser (Parse.java), which is a recursive descent parser. If the query does not parse, the search is aborted with an error message describing the exact point in the query where the parser choked.

the function-calls vector

The function-calls vector records all the information needed to make the appropriate search-function calls as described in the query, in the order that CorpusSearch will make the calls.

For instance, where the query contains

((Call 1) AND (Call 2))

the function-calls vector will contain

0.) (Call 1)
1.) (Call 2)
2.) AND

This is because the AND function takes as its input two result vectors, in this case

1.) the result vector output from Call 1
2.) the result vector output from Call 2

Similarly, where the query contains

(((Call 1) AND (Call 2)) AND (Call 3))

the function-calls vector will contain

0.) (Call 1)
1.) (Call 2)
2.) AND
3.) (Call 3)
4.) AND

Here, the input to the second call of AND is

1.) the result vector output from the first call of AND
2.) the result vector output from Call 3.
(For more about result vectors, see search functions.)

For every search function call appended after Call 3, the function-calls vector will add first (Call n) and then AND. In general, if the query is:

(...((Call 1) AND (Call 2)) AND (Call 3)) ... AND (Call n))

then the function-calls vector will be:

0.) (Call 1)
1.) (Call 2)
2.) AND
3.) (Call 3)
4.) AND
.
.
.
2n-3.) (Call n)
2n-2.) AND

an example

Suppose we have this query:

(( (1ADJP|ADVP-TMP iDominates ADV*|ADJ*)
AND
(ADV*|ADJ* dominates !*er))
AND
(2ADJP|ADVP-TMP iPrecedes NP-SBJ))

In English, this query would be: "find sentences that contain two different ADJP|ADVP-TMP nodes. One ADJP|ADVP-TMP node immediately dominates an ADV*|ADJ* node, which does not dominate a word ending in "er". The other ADJP|ADVP-TMP node immediately precedes an NP-SBJ node."

The query has the form mentioned above:

(((Call 1) AND (Call 2)) AND (Call 3))

Here's the function-calls vector CorpusSearch builds from the query:

0.) [iDominates, 18, [ADJP, ADVP-TMP], [ADV*, ADJ*]]
1.) [dominates_not_y, 17, [ADV*, ADJ*], [!*er]]
2.) [AND, 1, [[true, 1, 2]]]
3.) [iPrecedes, 9, [ADJP, ADVP-TMP], [NP-SBJ]]
4.) [AND, 1, [[false, 0, 4]]]

Here, the numbers following the search-function names, as in "iDominates, 18" have no intrinsic meaning. Each search-function is assigned a number simply so that they can be identified using a switch statement.

Notice that two of the arguments in the query, 1ADJP|ADVP-TMP and 2ADJP|ADVP-TMP, had prefix indices ("1" and "2"), showing that two different instances of ADJP|ADVP-TMP are required. On first glance, it may seem as though this information has been lost in the function-calls vector, since the prefix indices have been stripped from the arguments to the search functions. But in fact the information has been recorded in exactly the form that CorpusSearch will use it, namely in the last call to AND:

4.) [AND, 1, [[false, 0, 4]]]

The [false, 0, 4] vector that will be sent to the last call to AND requires that the index-0 argument (1ADJP|ADVP-TMP) and the index-4 argument (2ADJP|ADVP-TMP) must not coincide (for more, see and.)

building the function-calls vector

To build the function-calls vector, we have a function-calls vector (begun empty) and a call-stack (begun empty). When an open parenthesis is found, a new vector is declared and placed on the call-stack. The information about the next function call is read and recorded in the vector. When a close parenthesis is found, the call-stack is popped and the vector taken from it is put in the function-calls vector.

I'll walk through the algorithm for producing the function-calls vector shown above (for simplicity's sake, I'll abbreviate the search-function calls to "Call 1", "Call 2" and "Call 3"):

(((Call 1) AND (Call 2)) AND (Call 3))

previously read text: <empty>
current text: <empty>

<empty> <empty>
call stack function-calls vector

previously read text: <empty>
current text: (

[]
call stack function-calls vector

previously read text: (
current text: (

[]
[]
call stack function-calls vector

previously read text: ( (
current text: (

[]
[]
[]
call stack function-calls vector

previously read text: ( ( (
current text: Call 1

[Call 1]
[]
[]
call stack function-calls vector

previously read text: ( ( ( Call 1
current text: )

[]
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 )
current text: AND

[AND]
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND
current text: (

[]
[AND]
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND (
current text: Call 2

[Call 2]
[AND]
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2
current text: )

1.)
[AND] 1.) Call 2
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 )
current text: )

2.) AND
1.) Call 2
[] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 ) )
current text: AND

2.) AND
1.) Call 2
[AND] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND
current text: (

2.) AND
[] 1.) Call 2
[AND] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND (
current text: Call 3

2.) AND
[Call 3]1.) Call 2
[AND] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND ( Call 3
current text: )

3.) Call 3
2.) AND
1.) Call 2
[AND] 0.) Call 1
call stack function-calls vector

previously read text: ( ( ( Call 1 ) AND ( Call 2 ) ) AND ( Call 3 )
current text: )

4.) AND
3.) Call 3
2.) AND
1.) Call 2
0.) Call 1

and the function-calls vector is complete.