Search Functions

contents of this chapter:

  • an example of a sentence vector
  • input to search functions
  • output of search functions
  • general algorithm
  • an example of a sentence vector

    In this chapter, I'll use the vector form of this sentence:

    Thenne quene Igrayne waxid dayly gretter and gretter.

    This sentence was first described in Handling Input Sentences. Here it is in vector form:

    [[0],
    [[1, IP-MAT],
    [[2, ADVP-TMP],
    [[3, ADV, Thenne]]],
    [[4, NP-SBJ],
    [[5, NPR, quene]],
    [[6, NPR, Igrayne]]],
    [[7, VBD, waxid]],
    [[8, ADVP-TMP],
    [[9, ADV, dayly]]],
    [[10, ADJP],
    [[11, ADJR, gretter]],
    [[12, CONJ, and]],
    [[13, ADJR, gretter]]],
    [[14, E_S, .]]],
    [[15, ID, CMMALORY,5.120]]]

    input to search functions

    In general, search functions take the following as their input:

    where the sentence vector is the current input sentence in vector form, the x_List is the first list of arguments in vector form, and the y_List is the second list of arguments in vector form.

    For example, if this is the search-function call:

    (ADJP|ADVP-TMP iDominates ADV*|ADJ*)

    then these items will be passed to the search function "iDominates" (immediately dominates):

    output of search functions

    The output of a search function is a "result vector", as described below.

    If a searched-for structure is found in an input sentence, the result vector records which nodes of the sentence make up the structure. Every node is a sub-vector of the sentence, either a phrase-node with this form:

    [index, label]

    for instance, [1, IP-MAT], or a word-node with this form:

    [index, label, text]

    for instance, [9, ADV, dayly].

    If an example of the searched-for structure is discovered in the input sentence, the structure is recorded in a vector of this form:

    1. boundary node
    2. x-node
    3. y-node

    Of course, there may be multiple instances of the searched-for structure in the input sentence. Every instance is represented by a structure-vector which is one sub-vector of the result vector.

    For example: I'll run this query

    (ADJP|ADVP-TMP iDominates ADV*|ADJ*)

    on the above sentence. This is the result vector:

    [[[1, IP-MAT], [2, ADVP-TMP], [3, ADV, Thenne]],
    [[1, IP-MAT], [8, ADVP-TMP], [9, ADV, dayly]],
    [[1, IP-MAT], [10, ADJP], [11, ADJR, gretter]],
    [[1, IP-MAT], [10, ADJP], [13, ADJR, gretter]]

    The result vector is ready to be input to the AND function or to various printing functions.

    general algorithm

    It is beyond the scope of this thesis to describe the algorithm of every search function. In general, however, the search functions traverse the sentence vector using the traverse algorithm outlined in "Handling Input Sentences." On the way, they look for sub-vectors whose first element is a node boundary, as described in the "node: " line of the command file. Searches for linguistic structures are then conducted within this sub-vector, which represents a sub-tree whose root is the node boundary. When an entire structure is found, the information representing it is added to the results vector.