Minor categories

List of tags

C complementizer
CONJ coordinating conjunction
D determiner
FP focus particle
FW foreign word
INTJ interjection
NEG negation
NUM cardinal number (except ONE)
RP adverbial particle
X unknown POS
YY known but unimplementable POS

Adverbial particles (RP)

The criteria for distinguishing adverbial particles (RP) from other adverbs (ADV) are difficult to make explicit in every case. Following the Brown Corpus, we tag the following words RP when they do not take a complement:
ABOUT, ACROSS, BY, DOWN, FRO, IN, OFF, ON, OUT, OVER, THROUGH, TO, UP
and_CONJ rode_VBD on_RP more_QR than_P a_D paas_N

sir_NPR Ector_NPR assayed_VBD to_TO pulle_VB oute_RP the_D swerd_N

And_CONJ Sir_NPR Rauf_NPR of_P Beeston_NPR +gaf_VBD vp_RP the_D castel_N
to_P the_D Kyng_N

and_CONJ therwith_ADV+P he_PRO yelde_VBD up_RP the_D ghost_N
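
The examples in these guidelines use a word_TAG notation. As a minimal sketch (not part of any corpus tooling), the following Python splits a tagged line into (word, tag) pairs and lists the words tagged RP. The helper names split_token and particles are hypothetical. The sketch splits on the last underscore, so punctuation tokens such as ,_, and tags such as ADV+P are handled; tokens without an underscore are returned with no tag.

```python
def split_token(token):
    """Split 'word_TAG' on the last underscore into (word, tag)."""
    if "_" not in token:
        # Untagged token: keep the word, report no tag.
        return token, None
    word, _, tag = token.rpartition("_")
    return word, tag


def particles(line):
    """Return the words tagged RP in a whitespace-separated tagged line."""
    pairs = (split_token(tok) for tok in line.split())
    return [word for word, tag in pairs if tag == "RP"]


example = "and_CONJ therwith_ADV+P he_PRO yelde_VBD up_RP the_D ghost_N"
print(particles(example))  # ['up']
```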

Items from the above list that modify a prepositional phrase continue to be tagged as particles as long as they are spelled as separate words (notably IN TO and UP ON, but not INTO, UPON or ADOWN, APON, UNTO).

out_RP of_P the_D way_N

me_MAN droh_VBD hire_PRO +tus_ADV in_RP to_P dorkest_ADJS wan_N

knelyng_VAG doun_RP oppon_P his_PRO$ knees_NS

oute_RP of_P the_D byggest_ADJS castell_N doune_RP to_P the_D erthe_N

+tai_PRO were_BED exilede_VAN oute_RP of_P Spaygne_NPR

&_CONJ earnin_VB him_PRO crune_N up_RP o_P crune_N

&_CONJ healden_VB hit_PRO se_ADVR wal_ADV hat_ADJ hehe_ADJ up_RP on_P
hire_PRO$ heaued_N

When an item from the above list combines with -WARD, it continues to be tagged RP.

Cardinal numbers except ONE (NUM)

When overtly marked for plural (DOZENS, SCORES, HUNDREDS, THOUSANDS, MILLIONS, etc.), number words are tagged NS. Often, such forms take PP complements (nine millions of subjects), which clearly shows the status of the number word as a nominal head. In a very few cases, where a plural number word is immediately followed by another number word and would in modern usage be replaced by the singular (as in nine millions three-hundred thousand), the number word is tagged as NUM.

Singular number words (DOZEN, SCORE, and sometimes HUNDRED, THOUSAND, MILLION) are treated as N when followed by a PP complement.

Otherwise, unless used as list markers (LS), all cardinal numbers except for ONE are tagged NUM, whether spelled out, in numeral form, or in some combination of the two.

This_D Joyous_ADJ trouth_N conteyneth_VBP in_P itself_PRO two_NUM
partyes_NS

+ter_EX schal_MD com_VB befor_ADV xv_NUM dayes_NS of_P gret_ADJ drede_N

and_CONJ all_D men_NS and_CONJ woymen_NS and_CONJ childyrne_NS schull_MD
aryse_VB vp_RP yn_P +te_D age_N of_P xxx=ti=_NUM +gere_N
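
The number-word rules above can be read as a small decision procedure. The following Python is a hedged sketch of that reading, not an official implementation: the function name tag_number_word, its boolean parameters, and the NOMINAL_NUMBERS set are all assumptions introduced for illustration, and each flag stands for a judgment the annotator has already made. Note that the guidelines say HUNDRED, THOUSAND, and MILLION are only "sometimes" treated as N, so the sketch cannot replace that judgment.

```python
# Lemmas the guidelines list as potentially nominal (assumed set for this sketch).
NOMINAL_NUMBERS = {"DOZEN", "SCORE", "HUNDRED", "THOUSAND", "MILLION"}


def tag_number_word(lemma, plural=False, next_is_number=False,
                    pp_complement=False, list_marker=False):
    """Return NS, N, LS, or NUM for a cardinal number word other than ONE."""
    if lemma == "ONE":
        raise ValueError("ONE is excluded from the NUM rules")
    if plural:
        # Overt plural: NS, unless immediately followed by another number
        # word where modern usage would have the singular.
        return "NUM" if next_is_number else "NS"
    if lemma in NOMINAL_NUMBERS and pp_complement:
        # Singular DOZEN, SCORE, etc. heading a PP complement.
        return "N"
    if list_marker:
        return "LS"
    return "NUM"


print(tag_number_word("MILLION", plural=True, pp_complement=True))   # NS
print(tag_number_word("MILLION", plural=True, next_is_number=True))  # NUM
print(tag_number_word("DOZEN", pp_complement=True))                  # N
print(tag_number_word("TWO"))                                        # NUM
```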

In the unlemmatized corpora, compound numbers are treated as written. For AND in number sequences, see AND.

In the lemmatized corpora, compound numbers are joined together as single words; see the lemmatization guidelines for details.

For numbers in foreign language sequences, see foreign words.

DOUBLE, TREBLE, TRIPLE, etc., as well as TWICE, THRICE, and ONCE (when analogous in meaning to TWICE and THRICE), are always tagged NUM.

double_NUM manere_N of_P money_N

+te_D sale_N of_P +tynges_NS was_BED of_P double_NUM price_N

and_CONJ twyse_NUM I_PRO smote_VBD hym_PRO downe_RP

his_PRO$ horse_N turned_VBD twyse_NUM abowte_RP

Numbers used where an ordinal might be expected, but lacking overt ordinal marking, are treated as cardinal numbers and tagged NUM.

The_D ij._NUM day_N
the_D .ix._NUM chapytre_N

Otherwise, ordinal numbers are tagged ADJ.

Complementizers (C)

THAT, +TE, and variants introducing any kind of subordinate clause are tagged C. See also AS (complementizer), AS, SO, THAN (preposition), IF.
and_CONJ sei+t_VBP +tat_C it_PRO was_BED ano+ter_D+OTHER body_N

and_CONJ was_BED i-schore_VAN monk_N in_P an_D abbay_N
+tat_C he_PRO hym_PRO self_N bulde_VBD

dohter_N he_PRO cleope+d_VBP hire_PRO ._, for-+ti_P+D
+tt_C ha_PRO understonde_VBP ._, +tt_C he_PRO hire_PRO
luueliche_ADJ liues_N$ luue_N leare+d_VBP ._, as_P feader_N
ah_MD his_PRO$ dohter_N ._.

Al_Q swo_ADV he_PRO de+d_DOP +to_D men_NS +de_C sennen_NS
habbe+d_HVP forhaten_VBN te_TO laten_VB

Coordinating conjunctions (CONJ)

The following items are tagged CONJ when used as coordinating conjunctions:
AND, NE, NEITHER, NOR, OR, OTHER (= OR)

It is possible for two coordinating conjunctions to be adjacent.

And_CONJ nor_CONJ is this the right answer .

But_CONJ neither_CONJ is this the right answer .

In instances of correlative conjunction, each of the correlative conjunctions is tagged CONJ.

BOTH ... AND, EITHER ... OR, (EITHER) +GE ... +GE, NEITHER ... NOR
both_CONJ you and_CONJ I

either_CONJ you or_CONJ I

ai+der_CONJ +ge_CONJ hodede_VAN +ge_CONJ leawede_ADJ

neither_CONJ you nor_CONJ I

Determiners (D)

The following words are tagged D when used as determiners:
A(N), THAT, THE, THESE, THIS, THOSE, YON, YONDER

Demonstratives are always tagged D, regardless of whether they precede a noun. Note the difference between ordinary determiners and wh- words in this regard.

and_CONJ cristened_VBD hym_PRO at_P +te_D citee_N Dortik_NPR ,_, +tat_D
is_BEP Dorchestre_NPR

+tat_D is_BEP Friday_NPR

+Tis_D +tat_C is_BEP i-seide_VAN in_P +te_D comyn_ADJ table_N

For_FOR to_TO brynge_VB +tis_D aboute_RP Machometus_NPR norsched_VBD
and_CONJ fedde_VBD a_D faire_ADJ camel_N

For cases ambiguous between AN (D) and ONE (ONE), the default is D.

Focus particles (FP)

The following words, all of which also have other uses, are tagged FP when used as focus particles:
all periods: ALONE, BUT, EVEN, ONLY
only Middle English: FORTH, ONE, YET
Singuler_ADJ lufe_N es_BEP bot_FP of_P Jhesu_NPR Cryste_NPR alane_FP

+tan_ADV wil_MD +te_PRO liste_VB stele_VB by_P +te_PRO alane_FP

there_EX was_BED but_FP fewe_Q folk_NS at_P that_D tyme_N that_C
beleved_VBD perfitely_ADV

ye_PRO were_BED never_ADV but_FP my_PRO$ servaunte_N syn_P ye_PRO
resseyved_VBD the_D omayge_N of_P oure_PRO$ Lorde_N Jesu_NPR Cryste_NPR

they_PRO shold_MD come_VB by_P Crystmasse_NPR even_FP unto_P London_NPR

and_CONJ kut_VBD thorow_P the_D trappoure_N of_P stele_N and_CONJ the_D
horse_N evyn_FP in_P two_NUM pecis_NS

+tat_C hie_PRO ne_NEG biholden_VBP non_Q iuel_N ne_CONJ non_Q
un-nut_ADJ ne_CONJ for+den_FP idel_ADJ

hie_PRO bie+d_BEP ut-iworpen_VAN +durh_P dieules_NPR$ lare_N ,_,
naht_NEG for_P hem_PRO seluen_N ane_FP

+De_D mann_N ne_NEG leue+d_VBP naht_NEG $be_P {TEXT:he}_CODE
bread_N ane_FP

he_PRO axede_VBD no+ting_Q+N wi+t_P here_PRO ,_, but_P oneliche_FP
heir_PRO$ clo+ting_N and_CONJ oneliche_FP heir_PRO$ body_N

hwi_WADV wi+d_VBP21 dra+gest_VBP22 +tu_PRO +tin_PRO$ hont_N ._,
&_CONJ +get_FP +tin_PRO$ king_N hond_N of_P midde_N +tine_PRO$
bosme_N

Foreign words (FW)

If a potential foreign word has an entry in the OED, it is not tagged as FW. This is true even for nouns that retain a foreign plural.

Abbreviations of foreign terms do not count as words in the above sense and are tagged as FW. Some examples include:

B.A., e.g., etc., i.e., M.A.

Foreign names and certain common Latin liturgical texts are treated as proper nouns.

Foreign language titles are generally tagged FW.

In_P that_D Alcoranum_FW it_PRO is_BEP i-wrete_VAN

in_P the_D prologe_N on_P Regum_FW

iij._NUM stories_NS of_P the_D ij._NUM book_N of_P Paralypomynon_NPR
and_CONJ of_P Regum_FW

In foreign language sequences, everything (words, symbols, numbers, etc.) except punctuation is labelled FW.

libro_FW 5=o=_FW ,_, capitulo_FW 24=o=_FW ._.
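
The rule for foreign language sequences can be sketched in Python as follows. This is an illustrative assumption, not the project's own tooling: the function names is_punctuation and tag_foreign_sequence are hypothetical, the input is assumed to be (word, tag) pairs that already carry provisional tags, and a token counts as punctuation only if every character is an ASCII punctuation mark. Punctuation keeps whatever tag it already has; everything else is relabelled FW.

```python
import string


def is_punctuation(word):
    """True if the token consists only of ASCII punctuation characters."""
    return bool(word) and all(ch in string.punctuation for ch in word)


def tag_foreign_sequence(pairs):
    """Relabel every non-punctuation token in a foreign sequence as FW."""
    return [(w, t if is_punctuation(w) else "FW") for w, t in pairs]


# Provisional tags below are made up for the illustration.
sequence = [("libro", "N"), ("5=o=", "NUM"), (",", ","),
            ("capitulo", "N"), ("24=o=", "NUM"), (".", ".")]
print(" ".join(f"{w}_{t}" for w, t in tag_foreign_sequence(sequence)))
# libro_FW 5=o=_FW ,_, capitulo_FW 24=o=_FW ._.
```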

Interjections (INTJ)

INTJ is only used to tag words that are difficult or impossible to tag any other way, like the following:
AH, ALAS, AMEN, AYE, FAREWELL, FIE, GAR (< God), GOOD-BYE, GRAMERCY, HA, HULLO, LA, LO, NAY, NO, OH, PARDEE, POOF, WASSAIL, WELAWEI, YEA, YES, WITECRIST
But_CONJ +te_D chanouns_NS of_P Dorchestre_NPR sei+t_VBP nay_INTJ

"_" Nay_INTJ ,_, "_" quo+t_VBD +te_D aungel_N ,_.

+Te_D kyng_N and_CONJ his_PRO$ fautoures_NS seide_VBD "_" +gis_INTJ al_Q
at_P +te_D fulle_ADJ ._. "_"

Loo_INTJ ,_, "_" quod_VBD Mahometus_NPR

Oo_INTJ Kyng_N of_P bliss_N

MARY, MARRY and spelling variants are tagged NPR at the word level, but surrounded by INTJP brackets at the clausal level. See Interjection phrase (INTJP).

NO is tagged INTJ only when parallel to YES.

PRAY is never tagged as an interjection.

Items like FORSOOTH or wh- words are not tagged at the POS level as INTJ, even when used as interjections. Their function is sometimes indicated at the phrasal level; see Interjection phrase (INTJP).

Negation (NEG)

The negative particles NE and NOT are tagged NEG, as are NO and NONE in WHETHER OR NOT clauses. NE is also used as a coordinating conjunction (CONJ), and NOT is also used as a quantifier (Q).
non_Q senne_N ne_NEG mai_MD bien_BE idon_DAN bute_P +durh_P
unhersumnesse_N

for_CONJ I_PRO wille_MD not_NEG be_BE long_ADJ behynde_ADV

hit_PRO ne_NEG derue+d_VBP ham_PRO nawt_NEG

me_MAN ne_NEG net_VBP me_PRO noht_NEG te_TO forsweri+gen_VB

wheither_WQ it_PRO oghte_MD nedes_N be_BE doon_DAN or_CONJ noon_NEG

wheither_WQ he_PRO wol_MD doon_DO or_CONJ no_NEG

When negation cliticizes to the beginning of verbs and modals, the resulting combination is treated as a compound.

When negation cliticizes to the end of modals, the resulting combination is split.

Unknown POS (X)

Words with unknown POS are tagged X.
(NP-OB1 (NUM C) (N myle)
        (X li))

Unimplementable POS (YY)

Words where the correct tag is clear, but unimplementable in lexicon mode, are tagged YY.