Issues to Be Dealt with in Post-Processing
Punchlist
- edit queries in $DARPA/HOLDING
- mismatched indices with wh- traces and other empty categories
- Possessive pronouns should not be treated as heads of NPs.
??
- The treatment of comparative clauses may require re-evaluation.
Comparative clauses in Mark need operators and operator traces; the
other files should be checked, but ought to have most of the structures
correct.
probably best checked by hand (S-CMP, than, as ... as)
- The dash-tag -CLR is likely to have been both under- and over-applied.
cf. http://www.ling.upenn.edu/~diertani/SupplementaryIndices.html#clrVerbs
missing-clr-about.q, missing-clr-to.q
needs more queries and queries need to be edited
run all missing-clr-*.q queries
then look for PP-CLR (without ZZZ)
these will either need to be added to the queries or be treated as overapplications of -CLR
- Check to be sure the empty category * is not being used incorrectly in
lieu of *PRO*, and vice versa.
- Check subjects of degree infinitives (i.e. complements of enough, too)
- The dash-tag -PRP is likely to have been underapplied.
- -DIR and -LOC have probably been confused with each other multiple
times, particularly PP-DIR and PP-LOC.
- Punctuation must be attached high.
- Codes must be attached high.
- The policies on the differences between RB and JJ are not
clear, but there may be sentence-level modifiers tagged JJ which should be RB.
- The following items need to be moved to the documentation.
- The individual components that comprise compound adjectives
(with or without hyphens) ought to be tagged as they would be tagged in
isolation, not simply JJ.
- The individual components of verb-particle modifiers should also be tagged as they would
be in isolation.
- The individual components of nominals, including
verb-particle nominals, are each tagged NN.
- The individual components of compound verbs (e.g. fine-tuning)
should be tagged as they would be in isolation.
- Adjectival NP heads should be tagged JJ if they are clearly adjectives.
- When ambiguous, heads of NPs should be tagged NN, and
modifiers should be tagged JJ.
- When there is ambiguity between JJ and VBG/VBN as the
correct tag, the default is JJ if it can be the complement of SEEM and/or
if it is gradable (modifiable by MORE, MOST, SO, TOO, and VERY).
Otherwise the correct tag is VBG/VBN.
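
The JJ versus VBG/VBN default above can be written out as a small decision function. This is only a sketch: the function name and its boolean parameters are hypothetical, and the two linguistic judgements (whether the word can be the complement of SEEM, whether it is gradable) are assumed to be supplied by an annotator rather than computed.

```python
def resolve_jj_vs_participle(participle_tag, can_complement_seem, is_gradable):
    """Apply the default rule for words ambiguous between JJ and VBG/VBN.

    participle_tag      -- the competing verbal tag, "VBG" or "VBN"
    can_complement_seem -- annotator judgement: can the word follow SEEM?
    is_gradable         -- annotator judgement: modifiable by MORE, MOST,
                           SO, TOO, or VERY?
    """
    if participle_tag not in ("VBG", "VBN"):
        raise ValueError("expected VBG or VBN, got %r" % participle_tag)
    # Either test passing is enough to default to JJ ("and/or" in the rule).
    if can_complement_seem or is_gradable:
        return "JJ"
    # Otherwise the verbal tag stands.
    return participle_tag


# Hypothetical judgements for two ambiguous tokens.
print(resolve_jj_vs_participle("VBG", can_complement_seem=True, is_gradable=True))
print(resolve_jj_vs_participle("VBN", can_complement_seem=False, is_gradable=False))
```

The function only encodes the tie-breaking default; it says nothing about how to decide whether a given token is ambiguous in the first place.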
Resolved
Syntactic issues - resolved
- Check to be sure all (direct) questions have the correct number of VPs.
bad-vp-in-sq.q
- Clause-initial adjuncts belonging to the matrix clause are attached
high, as sisters to the subject and VP, rather than treated as topicalised
elements. Topicalisation is used only if the adjunct is clearly extracted
from an embedded clause.
bad-tpc.q
- Some complex NPs may lack the mandatory recursive structure.
missing-np-with-pp.q, missing-np-with-sbar.q
- The same may be true of other kinds of phrases (e.g. ADVPs and ADJPs).
missing-adjp-with-pp.q, missing-adjp-with-sbar.q
missing-advp-with-pp.q
- Postnominal parenthetical nouns should be tagged NP only, but do require
the requisite recursive structure.
sanity-check-other, missing-np-with-pp.q, missing-np-with-sbar.q
- Post-VP modifiers should be attached low, as daughter to the lowest VP.
bad-high-vp-constituent.q
- The predication dash-tag -PRD is likely to have been underused.
missing-prd.q
- Traces should be located in the position in which they are assumed to
have originated, rather than first in the clause by default.
bad-initial-trace.q
- All imperative clauses (S-IMP) must have a *PRO* subject.
missing-subject.q
- Not all questions have the correct label on the S level.
missing-sq.q
- Some complement clauses may be missing the SBAR level.
missing-sbar.q
- All single-word interjection phrases receive the phrasal tag INTJ,
not INTJP.
bad-intjp.q
- There may be some confusion between ADVPs and PRTs.
bad-particle-1.q, bad-particle-2.q
- There is likely to be inconsistency in constructions involving a
series of particles/prepositions. These will sometimes be annotated as
a PRT projection followed by a PP, and sometimes as complex PPs.
bad-low-particle.q, z-about-etc.q, z-out-of.q
- Pre-verbal ADVPs should be attached high, as sister to VP, rather than
inside the relevant VP.
bad-low-advp.q
- Focus particles are attached low.
bad-high-fp.q
- Unlike in the historical corpora, sequences of adpositions
(like in to or up to) should be tagged as recursive PPs,
rather than with the first one tagged as a particle.
bad-low-particle.q
- Eliminate all nodes labelled DELETE, along with all of their daughters.
post-correction-1, sanity-check-other
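
The last item in this list, removing every node labelled DELETE together with its daughters, is simple to sketch outside the query scripts. The code below is an illustration only, not the contents of post-correction-1 or sanity-check-other. It assumes Penn-style bracketed trees, an exact match on the label DELETE, and a root that is not itself DELETE; the helper names are hypothetical.

```python
import re


def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...].

    Leaves (words) are plain strings. A node with no label, such as the
    outer wrapper in "( (S ...))", gets the empty string as its label.
    """
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return [label] + children

    return node()


def prune(tree, label="DELETE"):
    """Return a copy of the tree without nodes carrying `label`."""
    if isinstance(tree, str):
        return tree
    kept = [prune(child, label) for child in tree[1:]
            if isinstance(child, str) or child[0] != label]
    return [tree[0]] + kept


def render(tree):
    """Write the nested-list tree back out in bracketed form."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [render(c) for c in tree[1:]]) + ")"


source = "(S (NP-SBJ (PRP I)) (VP (VBD left) (DELETE (UH uh) (NP (NN thing)))) (. .))"
print(render(prune(parse(source))))
```

A parent whose only daughter was a DELETE node is left empty by this sketch; whether such parents should also be removed is a policy question the list does not settle.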
POS-tagging issues - resolved
- All children of an ADVP should be tagged some flavour of RB.
bad-advp-child.q
- All empty categories ought to have a POS-tag -NONE-; they shouldn't be
only a syntactic tag plus terminal as in the historical corpora.
sanity-check-other
- Some imperatives have probably been tagged VBI, as per the historical
corpus. This should be changed to VB.
sanity-check-pos
- Eliminate X.
sanity-check-pos
- Eliminate HYPH?
sanity-check-pos
- about, when used adverbially (should be RB).
z-about-or-over.q
- according is tagged VBG even when head of PP.
z-according-etc.q, sanity-check-pos
- during is tagged IN.
z-during.q, sanity-check-pos
- including is tagged VBG even when head of PP.
z-according-etc.q, sanity-check-pos
- near should be IN when clearly heading a PP, but JJ
when used directionally.
z-near.q
- over, when used adverbially (should be IN).
z-about-etc.q
- that can be tagged DT (when a determiner), WDT (in
relative clauses), and IN (in CP-THTs, &c.), and there are likely to be
mistaggings of all combinations.
z-that-dt.q, z-that-wdt.q, z-that-in.q, z-that-rb.q, z-that-other.q, sanity-check-pos
- to can be tagged either IN (when a preposition) or TO
(with infinitives), and there are likely to be some mistaggings in either
direction.
z-to-TO.q, z-to-IN.q
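
Several checks in this list are mechanical. As an example, the first one (every child of an ADVP tagged some flavour of RB) might be sketched as follows. This is independent of bad-advp-child.q, whose contents are not shown here. The assumptions are labelled in the code: only word-level children are checked, "some flavour of RB" is read as any tag beginning with RB, and empty categories tagged -NONE- are skipped. All helper names are hypothetical.

```python
import re


def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return [label] + children

    return node()


def bad_advp_children(tree):
    """List (advp_label, pos_tag, word) for suspicious ADVP children."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return
        label = node[0]
        # Assumption: dash-tags and indices (ADVP-TMP, ADVP-LOC-1) still count as ADVP.
        is_advp = label.split("-")[0] == "ADVP"
        for child in node[1:]:
            if isinstance(child, str):
                continue
            # A word-level child is a POS tag dominating exactly one word.
            is_word_level = len(child) == 2 and isinstance(child[1], str)
            if is_advp and is_word_level:
                tag = child[0]
                # Assumption: RB, RBR, RBS pass; -NONE- is skipped.
                if tag != "-NONE-" and not tag.startswith("RB"):
                    found.append((label, tag, child[1]))
            walk(child)

    walk(tree)
    return found


tree = parse("(S (ADVP-TMP (JJ late)) (NP-SBJ (PRP we)) (VP (VBD left) (ADVP (RB quickly))))")
print(bad_advp_children(tree))
```

Phrasal children of an ADVP, such as a PP inside it, are deliberately ignored here, since the item concerns POS tags.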
DARPA conventions - resolved