"Labels" are the all upper-case tags inserted by the linguists who prepared the corpus (e.g., "IP", "CONJ", "N"). "Words" refers to the mostly lower-case original words of the text (e.g., "so", "hit"). Every node in the sentence tree has a label, and the leaf nodes also have words. CorpusSearch can search on labels or on words. In practice, the vast majority of searches look for labels only.
CorpusSearch uses case-sensitive, character-by-character string matching to match search-function arguments to strings found in the input. Therefore, spelling and upper-case/lower-case variations must be listed explicitly (usually with an argument list of alternatives separated by "|"). For instance, this query searches for a complementizer whose associated text is "that" or "That":
(C iDominates that|That)
and finds sentences such as this:
/~*
and he shalle do yow remedy, that youre herte shal be pleasyd. '
(CMMALORY,3.47)
*~/
/*
12 CP-ADV: 13 C that
*/
(NODE
(12 CP-ADV (13 C that)
(14 IP-SUB
(15 NP-SBJ (16 PRO$ youre) (17 N herte))
(18 MD shal)
(19 BE be)
(20 VAN pleasyd)))
(ID CMMALORY,3.47))
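The effect of case-sensitive matching against an argument list can be illustrated with a minimal Python sketch. This is not CorpusSearch's implementation: `matches` is a hypothetical helper, and it assumes an argument list is just a set of literal alternatives separated by "|", ignoring any other pattern syntax CorpusSearch may support.

```python
def matches(argument: str, text: str) -> bool:
    """Return True if text equals one of the '|'-separated alternatives.

    Hypothetical helper for illustration only; plain string equality
    is case-sensitive and compares character by character.
    """
    return text in argument.split("|")


# A single lower-case argument does not match the capitalized word.
print(matches("that", "That"))       # False
# Listing both spellings explicitly matches either one.
print(matches("that|That", "That"))  # True
print(matches("that|That", "that"))  # True
```

Under these assumptions, a variant such as "THAT" would still be missed unless it were added to the list as well.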
For the purposes of dominance, a word and its associated node label are considered separate objects. Thus, in the sentence below, "PRO" dominates "hit". For the purposes of precedence, a word and its associated label are considered to be one object. Thus, "that" sister-precedes "rocche" in this sentence, because the labels associated with "that" and "rocche" (D and N) are sisters.
/~*
and so hit londid undir that rocche.
(CMMALORY,667.4861)
*~/
/*
1 IP-MAT: 11 D that, 12 N rocche
*/
(0
(1 IP-MAT (2 CONJ and)
(3 ADVP (4 ADV so))
(5 NP-SBJ (6 PRO hit))
(7 VBD londid)
(8 PP (9 P undir)
(10 NP (11 D that) (12 N rocche)))
(13 E_S .))
(ID CMMALORY,667.4861))
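The two conventions can be modelled in a short Python sketch. Everything here is an illustrative assumption rather than CorpusSearch code: the `Node` class and the helpers `i_dominates` and `sister_precedes` are hypothetical, and sister precedence is taken to mean "is an earlier sister of", not necessarily an adjacent one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    word: Optional[str] = None  # only leaf nodes carry a word
    children: List["Node"] = field(default_factory=list)


def i_dominates(node: Node, target: str) -> bool:
    """Dominance: the word is a separate object sitting below its label."""
    if node.word == target:
        return True
    return any(child.label == target for child in node.children)


def names(node: Node) -> Set[str]:
    """Precedence: label and word count as one object, so either names it."""
    return {n for n in (node.label, node.word) if n is not None}


def sister_precedes(parent: Node, first: str, second: str) -> bool:
    """True if a daughter named `first` comes before a daughter named `second`."""
    kids = parent.children
    return any(
        first in names(a) and any(second in names(b) for b in kids[i + 1:])
        for i, a in enumerate(kids)
    )


# Fragments of the tree for (CMMALORY,667.4861).
pro = Node("PRO", "hit")
np = Node("NP", children=[Node("D", "that"), Node("N", "rocche")])

print(i_dominates(pro, "hit"))                # True: PRO dominates its word
print(i_dominates(np, "that"))                # False: NP immediately dominates D, not "that"
print(sister_precedes(np, "that", "rocche"))  # True: D and N are sisters
```

In this model the word "that" can stand in for its label D in a precedence test, while a dominance test still has to pass through D to reach the word.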