Linguistics 001 Fall 2005 Homework 4 Due Mo 10/10
1. The first part of the assignment requires you to estimate the size of someone's "mental lexicon," by obtaining a sample of 100 words from a medium-sized English dictionary, and checking what fraction of that sample your subject knows.
Each time you access this link you'll get a random sample of 100 headwords from a recent Collegiate dictionary, sampled from a list with 78,984 entries overall. If you access this link instead, you'll get the sample as a plain text file rather than an HTML page (for easier cutting and pasting into some word processors).
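For the curious, here is a minimal sketch of how such a sample could be drawn. It assumes simple random sampling without replacement; the actual method behind the link is not described here. The function name `sample_headwords` and the placeholder word list are hypothetical, standing in for the real 78,984-entry list.

```python
import random

def sample_headwords(headwords, k=100, seed=None):
    """Draw k distinct headwords uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the draw repeatable; omit it for a fresh sample
    return sorted(rng.sample(headwords, k))

# Placeholder entries, NOT real dictionary headwords: an assumption for illustration.
headwords = [f"entry{i:05d}" for i in range(78_984)]

sample = sample_headwords(headwords, seed=0)
print(len(sample), "headwords drawn")
```

Because every entry is equally likely to be drawn, the fraction of the sample you know is a fair estimate of the fraction of the whole list you know.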
How to do it
If you are a native speaker of English, you can use yourself as the subject. If you are not, then find a friend or acquaintance who is a native English speaker, and get them to do it for you.
Print out your sample of 100 words. For each one, answer yes ("I definitely know this word"), no ("I have no clue whatsoever about this word"), or maybe ("I have some sort of idea about this word, and could try to use it, but I might be at least partly wrong").
Note that some of the "entries" are words with internal spaces, such as time capsule or press box. You can treat these just like the solid entries -- it's no harder to figure out if you know what Ponzi scheme means than it is to see if you know the meaning of androcentric or microcode.
Of course, it isn't easy in any case -- without looking at the whole dictionary entry, you may sometimes be unsure whether your belief about a word's meaning is in fact correct. In such cases, you can look the word up in a regular paper dictionary, or on-line here:
In many cases, with or without checking the word in the dictionary, it's clear that you have partial knowledge -- for instance, you might know that exocrine is the opposite of endocrine, and has something to do with hormones and stuff, but be a little vague on exactly what it means beyond that. In this sort of case, you should score the word as "maybe."
If you total your "yes", "maybe" and "no" items separately, you'll wind up with a range of estimates, such as "I know between 65 (count of 'yes' answers) and 78 (count of 'yes' and 'maybe' answers) of the items on my list."
Count up the answers in the three categories, and give the totals. What vocabulary size (relative to the contents of the overall dictionary) does this result suggest?
Note that you are not going to be graded on the size of your vocabulary! To get full credit, you just need to turn in your worksheet and the resulting counts. You will not get a higher grade if the answer is higher -- in fact we might get suspicious if the answer is too high. In order to match the estimated average word stock of U.S. high school graduates, you only need to "know" 51 of 100.
How to interpret your results
If you "know" 70 out of your sample of 100 entries, you can estimate that you know roughly 70/100 of the entire 78,984 entries, or 55,289. This is surely an underestimate of your "mental lexicon", since this wordlist doesn't include acronyms such as UN, ASAP, DWI, fubar; company names like IBM, Coca Cola, Microsoft; institutional names like Harvard, Penn, Louvre; place names like Philadelphia, Albany, Katmandu; proper names like Clinton, Elvis, Socrates; and so on. Among these categories, you surely know several thousands (probably tens of thousands) of additional items.
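The arithmetic above can be sketched as a short calculation. This is only an illustration: the function name `estimate_vocabulary` is hypothetical, and it does nothing more than scale your sample counts up to the 78,984-entry list, giving a low estimate from the "yes" answers alone and a high estimate from "yes" plus "maybe".

```python
DICTIONARY_SIZE = 78_984  # entries in the word list, as stated in the assignment

def estimate_vocabulary(yes, maybe, sample_size=100, dictionary_size=DICTIONARY_SIZE):
    """Scale sample counts up to the full list; returns (low, high) estimates."""
    if yes < 0 or maybe < 0 or yes + maybe > sample_size:
        raise ValueError("counts must be non-negative and fit within the sample")
    low = round(yes * dictionary_size / sample_size)             # "yes" only
    high = round((yes + maybe) * dictionary_size / sample_size)  # "yes" + "maybe"
    return low, high

# The example counts used earlier in this assignment: 65 "yes" and 13 "maybe".
low, high = estimate_vocabulary(yes=65, maybe=13)
print(f"Estimated entries known: between {low:,} and {high:,}")
```

With 70 "yes" answers and no "maybe" answers, both ends of the range come out to 55,289, the figure given above.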
After everyone turns in the homework, we'll give you some graphs summarizing the answers from the class as a whole.
2. Here is the second part of the assignment (inspired by an exercise in Farmer & Demers' A Linguistics Workbook):
In his novel A Clockwork Orange, Anthony Burgess invents a slang vocabulary based mostly on borrowings from Russian. In the early 1960s, when this novel was written, it was actually plausible that the direction of cultural influence might result in large-scale borrowings from Russian into English. If you're curious about why this might have been, read this.
A short passage from the beginning of Burgess' book is quoted below:
Match each of the words in boldface with one of the glosses given below.
You should be able to do this fairly easily without cheating (that is, without looking the words up in an online glossary of nadsat).
In each case, give a morphological analysis that breaks the word down into its parts (if any). For each part, indicate whether it is a root, a derivational affix, or an inflectional affix.
For example, the word "Baghdadis" would be analyzed as
Also in each case, describe briefly how the context gives you clues about the form and meaning of the word. For example, in figuring out the word "cistus", in the following sentence from Patrick O'Brian's Master and Commander
you might observe that its phrasal context shows that it is a noun ("a __ whose name Stephen did not know"), that it occurs in a list whose previous member is "a few ... caper-bushes", and that in this situation it seems like a word for a type of plant, since it would work well to substitute things like "cactus" or "evergreen". And you would be right, as the OED would tell you -- cistus is
Finally, do you know any Russian? If so, (optionally) indicate what the Russian source is for (some of) the words in the paragraph above.