The most intimate and habitual things in life are sometimes hard to see objectively. For instance, it seems natural and inevitable for human communication to depend on spoken words, even though this requires tens of thousands of arbitrary connections between noises and concepts. It seems natural and inevitable for human children to learn these connections over the course of a decade or so, even though this requires analyzing the behavior of adults emitting concatenations of noises referring to logically-structured combinations of (often very abstract) properties of experiences. These facts are so obvious and familiar that it is not easy for us to appreciate what a strange and wonderful achievement they represent, and how far from obvious it is that this is how a system for communication among intelligent creatures should be designed.
Human vocal communication has some characteristics that are not logically necessary, and are none the less remarkable for being so ordinary that they are hardly ever noticed:
This situation might have been different. For instance, Cheney and Seyfarth (1990) indicate that the vervet monkey's repertory of meaningful cries is genetically fixed to a substantial degree, although the number of different classes of cries identified in this case is only about ten. Some philosophers of language have argued that there is a fixed stock of basic human concepts, and so our species might have saved us all from international communications barriers--not to speak of language requirements--by expanding a system like the vervet monkey's. We might have tried to get along with a few hundred such built-in vocal symbols, together with our ability to combine words into phrases whose meaning is determinable from the meaning of their parts, and our skill in pointing, making air pictures with our hands, and so on.
However, this is not the path that human evolution has taken. Instead, each human language develops its own rather large set of essentially arbitrary vocal signs, roughly what we normally call ``words.'' Exactly how large this set is depends on how you define its members: we do not need to count cats if we already have cat, but does building add to the count if build is already counted? What about red herring if we already have red and herring? The size of this set may also depend on the language (or perhaps the culture) examined. Nevertheless, if we define ``word'' as a piece of language whose conventional form and meaning cannot be predicted by composing smaller units contained in it, then it seems likely that normal speakers know something in the range of 10,000 to 100,000 words of their native language. The cognitive (and social!) problem of establishing such a large number of essentially arbitrary sound-meaning correspondences, and ensuring that individual children learn them in such a way as to end up able to communicate with one another across as well as within family groups, is a daunting one.
How do we do it?
Well, some might say that we don't. Perhaps human lexical accomplishments have been exaggerated a bit. After all, most talk is on some identifiable topic or another, restricting the likely vocabulary to a few thousand words. Within and across topics, we often use highly stereotyped word sequences, so that the actual average uncertainty about what the next word will be is typically equivalent to a choice among fewer than a thousand alternatives, and may often be equivalent to a choice among fewer than a hundred. Even so, we suffer plenty of mishearings and other communication failures.
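One way to make this notion of "uncertainty about what the next word will be" concrete is perplexity: two raised to the Shannon entropy of the next-word distribution, which can be read as the size of an equally uncertain choice among equiprobable words. The sketch below is only an illustration under that assumption. The function name and the two toy distributions are invented for the example and are not estimates drawn from real speech.

```python
import math


def perplexity(probs):
    """Return 2**H, where H is the Shannon entropy (in bits) of `probs`.

    `probs` is a sequence of next-word probabilities summing to 1.
    """
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy_bits


# Hypothetical case 1: 1,000 candidate words, all equally likely.
uniform_1000 = [1 / 1000] * 1000

# Hypothetical case 2: the same 1,000 candidates, but one stereotyped
# continuation takes half of the probability mass.
skewed_1000 = [0.5] + [0.5 / 999] * 999

print(perplexity(uniform_1000))
print(perplexity(skewed_1000))
```

With equally likely candidates the perplexity is simply the number of candidates, here a thousand. When one continuation is strongly favored, the same thousand candidates amount to a choice among fewer than a hundred, which is the sense in which stereotyped word sequences reduce the listener's effective uncertainty.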
Although these caveats have some validity, they do not change the earlier estimates of vocabulary size, learning rate and accuracy in word transmission under good conditions. The difficulty and interest of the problem are increased by the fact that a lot more than words goes into the noises someone makes when talking. Speech sounds are modulated in a way that also communicates who you are, how you feel, what kind of impression you want to make, the structure of your message, and the process by which you compose it. Furthermore, what your addressee hears will be considerably changed by where you are talking (in a shower stall, an office, a cathedral), by noises around you (running water, other voices, traffic, music), and perhaps by various unnatural means of transmission such as a telephone or public address system.
Somehow, listeners usually manage to hear the words that were spoken, despite (or more precisely, in addition to) all these other things, both the extra-lexical modulations added by the speakers, and the distortions imposed by the environment. This ability further emphasizes the non-trivial character of the central question: how do humans efficiently learn, remember and use such a large number of distinct equivalence-classes of vocal noises, which we call words?
The answer is phonology--or more precisely, the linguistic sound structures that phonology studies. The rest of this paper is devoted to explaining what these sound structures are like.
The basic principle of phonology is that the notion ``possible word of language X,'' from the point of view of pronunciation, is defined in terms of structured combinations of a small number of basic meaningless elements. Phonologists have developed a variety of theories about what these elements and structures are. The simplest idea is to suppose that the basic phonological elements are like the letters of an alphabet, each of which represents a type of vocal gesture or sound. Then phonological structures are simply strings of such letters (usually called phonemes), with sequence in the string representing succession in time. In fact, there is a special alphabet, known as the International Phonetic Alphabet (IPA), which aims to provide enough symbols to represent all of the crucial distinctions in all of the languages of the world (even though the sounds associated with an IPA symbol typically vary somewhat from language to language).
This alphabetic approach is enough to provide a basic description of the phonological system of any given language. It is possible to provide an inventory of phonemic symbols for that language, so that all its words can have the distinctive aspects of their sound defined in terms of strings of those symbols, and it is also possible to provide an account of the articulatory and acoustic meaning of such strings of symbols. Traditionally, the study of the distribution of phonemic symbols is the domain of the field of phonology, while the physical interpretation of those symbols is the domain of the field of phonetics.
This simple theory is enough to clarify how humans can learn, remember and use such a large number of different words. Words are not arbitrarily distinct classes of vocal noises. Instead, a word's claims on sound are spelled out in terms of the phonological system of some particular language. This splits the problem, conceptually speaking, into two parts. Learning a given language requires figuring out what its phonological system as a whole is, and how structures of phonological elements are related to vocal noises, independent of any consideration of particular words. Learning the sound of any particular word then requires only figuring out how to spell it phonologically. This phonological spelling is a relatively small amount of information, which might be inferred from a couple of hearings. A phonological spelling nevertheless suffices to predict the wide range of ways in which the word might be spoken by different speakers on different occasions in different contexts, because it is the phonological system as a whole that is anchored to sound. Therefore the learner's knowledge of the physical grounding of a phonological system is based on every bit of experience he or she has ever had in talking or listening in the language in question, and all of this experience can be brought to bear in processing any particular word.
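The claim that a phonological spelling is "a relatively small amount of information" can be illustrated with some rough arithmetic. The sketch below assumes a hypothetical inventory of 40 phonemes (a round number chosen for illustration, not a figure from any particular language) and, unrealistically, treats every string of phonemes as a possible word; real languages permit far fewer combinations. The function names are invented for the example.

```python
import math


def count_strings(inventory_size, max_length):
    """Number of distinct phoneme strings of length 1..max_length."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))


def spelling_bits(inventory_size, length):
    """Upper bound, in bits, on the information in one spelling of `length` phonemes."""
    return length * math.log2(inventory_size)


INVENTORY_SIZE = 40  # hypothetical inventory size, assumed for illustration

for max_length in range(1, 6):
    print(max_length, count_strings(INVENTORY_SIZE, max_length))

print(spelling_bits(INVENTORY_SIZE, 4))
```

Under these assumptions, strings of up to three phonemes number 65,640, and strings of up to four number 2,625,640, already well beyond a vocabulary of 100,000 words. A four-phoneme spelling carries at most about 21 bits, so a small set of meaningless elements suffices to keep a very large set of words distinct.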
Careful consideration of the facts of individual languages suggests that it is more insightful--scientifically more interesting--to decompose phonemic letters into smaller pieces (called features), organized into simple phonological structures of which syllables are the most familiar example. On this view, phonemic segments--as represented by the letters of the IPA--are actually just convenient names for pieces of these phonological structures. Hypotheses about this type of phonological organization arise naturally out of efforts to define the sound structure of words in particular languages, and to model the phonetic interpretation of these structures. Such investigations indicate that the theory of phonemic letters, taken literally, is not a very good model for any particular phonological system, and also tends to make the phonology of different languages look much more variable than it really is. When we replace a rigidly alphabetic theory of phonology with one based on structured combinations of phonological features, we get a more insightful analysis of individual languages, and we also find that the same sorts of phonological features and structures arise in the analysis of widely separated and apparently unrelated languages.
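As a rough illustration of what it means to decompose phonemic letters into features, the sketch below describes six English consonants with three simplified features: voicing, nasality, and place of articulation. This tiny feature set is an assumption made for the example and is not a feature system proposed in this paper; the names `FEATURES` and `natural_class` are likewise hypothetical.

```python
# Each phonemic "letter" is treated as a name for a bundle of feature values.
FEATURES = {
    "p": {"voiced": False, "nasal": False, "place": "bilabial"},
    "b": {"voiced": True, "nasal": False, "place": "bilabial"},
    "m": {"voiced": True, "nasal": True, "place": "bilabial"},
    "t": {"voiced": False, "nasal": False, "place": "alveolar"},
    "d": {"voiced": True, "nasal": False, "place": "alveolar"},
    "n": {"voiced": True, "nasal": True, "place": "alveolar"},
}


def natural_class(**wanted):
    """Return the set of symbols whose features match every requested value."""
    return {
        symbol
        for symbol, feats in FEATURES.items()
        if all(feats[name] == value for name, value in wanted.items())
    }


print(sorted(natural_class(place="bilabial")))
print(sorted(natural_class(voiced=True, nasal=False)))
print(sorted(natural_class(nasal=True)))
```

In a purely alphabetic description, p, b and m are three unrelated symbols. Once the letters are treated as names for feature bundles, groupings such as "the bilabial consonants" or "the nasals" can be picked out by a single feature value, which is one reason a feature-based description can state sound patterns more compactly.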
The human species has not evolved a fixed set of words. We have not even evolved a fixed set of phonemes or syllables out of which to make words. What we have evolved, it seems, is a set of basic phonological features and structures capable of specifying a wide variety of phonological systems, each of which in turn is capable of specifying the sound patterns of a set of words. This amounts to a sort of parts inventory and tool kit for designing and building the pronunciation systems of languages.
When a group of humans use this tool kit to set up a phonological system, and to define the sound of a set of words in terms of it, they are ordinarily not at all aware of what they are doing, despite the considerable complexity of the problem. Doing phonology, in this practical sense, is something that just comes naturally. Each of us participates especially actively in this process during the first few years of our life, and indeed most of us are unable to achieve fully native abilities in a phonological system that we encounter later in life, no matter how carefully we study and how much we practice.
Of course, the process of creating a phonological system is never really carried out starting from nothing. Instead, an existing system is learned again by each individual, while new words are constantly added, old words die out or change their nature, and the phonological system as a whole is gradually redefined in an evolutionary process. Throughout this process of constant renewal and change, the sound system as a whole remains relatively consistent across the speech community, and retains its coherence as a system for each individual, despite the fact that speech communities throughout human history have not had official bodies such as the Académie Française to act as language police. Phonological systems remain lawful because the cognitive architecture of the human species requires them to do so.