Linguistics 001     Lecture 22     Reading and Writing

What is writing?

Writing is not language, but merely a way of recording language by visible marks.

      -Leonard Bloomfield, Language (1933)

Some version of this is clearly true, as we can see by looking at the history of the human species and of each human individual. Writing is an "optional feature" of human culture, consciously invented and developed in a few relatively recent societies, whereas every historically documented human group has always had a spoken language. Spoken language is learned in the cradle by all normal human children, without any apparent organized instruction, whereas writing is learned later in life, only by some, and through explicit instruction. There are good reasons to think that the human species has evolved to make spoken language more natural and more effective; there is no reason to think that biological evolution has been affected by the existence of writing.

Another way to express Bloomfield's point is to say that writing is parasitic on speech, expressing some but not all of the things that speech expresses. Specifically, writing systems convey the sequence of known morphemes in a real or hypothetical utterance, and indicate (usually somewhat less well) the pronunciation of morphemes not already known to the reader. Aspects of speech that writing leaves out include emphasis, intonation, tone of voice, accent or dialect, and individual characteristics.

Some caveats are in order. In the first place, writing is usually not used for "recording language" in the sense of transcribing speech. Writing may substitute for speech, as in a letter, or may deploy the expressive resources of spoken language in visual structures (such as tables) that can't easily be replicated in spoken form at all.

In the second place, writing systems may include some conventions that are substantially autonomous of speech. For example, Geoff Nunberg has argued that punctuation in English is not only or even primarily a representation of the phrasing and intonation of spoken English, but rather an autonomous system for indicating certain kinds of textual relationships. Words that are pronounced the same way ("homophones") may not be written the same way ("homographs"), and vice versa.

Still, Bloomfield was basically correct: writing is a way of using "visible marks" to point to pieces of real or hypothetical spoken language.

Types of writing

There are no pure systems of writing, just as there are no pure races in anthropology and no pure languages in linguistics.

-I.J. Gelb, A Study of Writing (1963)

We asserted above that writing systems convey the sequence of known morphemes in a real or hypothetical utterance, and indicate (usually somewhat less well) the pronunciation of morphemes not already known to the reader. This is true of all conventional orthographic systems, present and past, but these systems accomplish more or less the same thing in various rather different ways.

In discussions of writing systems, you will sometimes see typological terminology like the following, with particular writing systems given as examples of each type:

Type of writing   Meaning
Pictographic      Elements are pictures, combined in graphically-interpretable patterns (e.g. temporal sequence or spatial relationship)
Ideographic       Elements denote ideas, combined in a logical fashion
Logographic       Elements denote words or morphemes, combined morphosyntactically
Syllabic          Elements denote syllables, combined phonologically
Moraic            Like syllabic elements, but units are a bit smaller (see discussion of Japanese kana below)
Alphabetic        Elements denote phonemes (more or less), combined phonologically
Featural          Elements denote distinctive features of phonemes (such as voicing or place of articulation), combined phonologically

This typology seems very rational, but in fact it is misleading, as rational taxonomies often are. All documented writing systems are a mixture of two or (usually) more of these categories, and all include a significant phonological aspect. This critique has been most fully developed by John DeFrancis in his book Visible Speech, from which most of the examples below have been taken.

Given the definitions of writing we've given so far, pictographic and ideographic systems would not be included, since they are not ways of "recording language," but rather ways of directly picturing things, events and their relationships. Interestingly, as a matter of empirical fact, it seems that pictographic and ideographic systems have never really developed fully as such. This is not to say that people have never conveyed information with pictures, nor that sets of conventional icons standing for language-independent ideas have never been developed and used. In fact, pictographic and ideographic signs played a central role in the (various) inventions of writing.

However, pictographic or ideographic systems as such have never developed into a form fully capable of conveying unlimited messages from one person to another. Instead, they either remain as limited systems operating within a highly restricted application -- say to keep warehouse records -- or else they develop into a genuine writing system, capable of conveying any linguistic message. In the second case, the process of development into a genuine writing system always involves adding some phonological aspects, in ways we'll describe shortly.

Origins of writing

When they appear in the archeological record about 5,500 years ago, the Sumerians had developed a system of icons inscribed on clay tablets for keeping temple records. A typical example includes icons for "two", "sheep", "temple/house", and the gods "An" and "Inanna". The meaning might be "two sheep received from the temple of An and Inanna", or "two sheep delivered to the temple of An and Inanna", or perhaps something else entirely.

The tablet whose picture is shown here illustrates a more sophisticated use of a numbering system, as well as a way of specifying the accounting period, but the basic principles are similar.

These marks constituted a limited notation system, which in the beginning may only have served to remind the writer of what he already knew. However, as long as agreed-on standards were obeyed, another person could also read the record in the same way. In this, these marks were similar to systems for record-keeping based on symbolic tokens of many sorts, developed over and over again in many cultures over the millennia -- marks on stone or bone, clay figurines, even knots in cords. As civilizations become more complex, record-keeping of this kind becomes increasingly important for keeping commercial transactions straight. The ability of trained third parties to read such records in a consistent way becomes increasingly important as systems for mediating or adjudicating disputes non-violently come into use. However, most such systems remained limited in their expressive capacity.

In the case of the Sumerian record-keeping system, two crucial innovations led (over a few hundred years) to a full writing system, capable of expressing anything that could be expressed in the (written) words of the Sumerian language.

The first innovation was the Rebus Principle: if you can't make a picture of something, use a picture of something with the same sound. The first clear example of this is in a tablet from Jemdet Nasr, dated to around 2900 BC, in which a pictograph of a reed (GI in Sumerian) is used to mean "reimburse" (also pronounced GI).
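The rebus principle amounts to a lookup by sound: if a word has no picture of its own, borrow the picture of a homophone. The sketch below illustrates this with a hypothetical toy lexicon; only the GI pair (reed / reimburse) comes from the text, and the names `PRONUNCIATION`, `PICTOGRAPHS` and `write_word` are invented for illustration.

```python
from typing import Optional

# Hypothetical toy lexicon (English gloss -> Sumerian pronunciation).
# Only the GI pair comes from the lecture; the structure is illustrative.
PRONUNCIATION = {
    "reed": "gi",
    "reimburse": "gi",
}

# Words with a conventional pictograph; an abstract verb like "reimburse" has none.
PICTOGRAPHS = {
    "reed": "<picture of a reed>",
}


def write_word(word: str) -> Optional[str]:
    """Return a sign for `word`: its own pictograph if one exists, otherwise
    the pictograph of a word that sounds the same (the rebus principle)."""
    if word in PICTOGRAPHS:
        return PICTOGRAPHS[word]
    sound = PRONUNCIATION.get(word)
    if sound is None:
        return None  # unknown word: no way to write it
    for other, other_sound in PRONUNCIATION.items():
        if other_sound == sound and other in PICTOGRAPHS:
            return PICTOGRAPHS[other]
    return None  # no picturable homophone available


print(write_word("reimburse"))  # <picture of a reed>
```

Note that the reader now faces the reverse problem: the reed picture can stand for either word, which is part of what motivates the next innovation.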

The second innovation was what we might call the Charades Principle: if you combine an ambiguous or vague picture of a word's meaning with a little information about what the word sounds like, you can identify the word more effectively than with either imperfect information about meaning or imperfect information about sound alone. To give an example from Sumerian, a particular symbol having a meaning something like "leg" might be combined with a symbol pronounced "ba" to give the word GUB "to stand"; the same "leg" symbol, combined with a symbol pronounced "na", gave the word GIN "to go"; and combined with a symbol pronounced "ma", it gave the word TUM "to bring." Thus a Sumerian reader was in effect being asked to play a sort of game of charades: what word has something to do with "leg" and ends in the initial sound of "ba"? -- why of course, that's GUB, "to stand", what else! These combinations became conventionalized, resulting in a system that was presumably somewhat easier to learn to read than to learn to write, but was not very efficient in either direction.
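The charades game can be sketched as two filters applied in sequence: the semantic sign narrows the candidates to words "having to do with leg", and the phonetic sign picks out the one whose final sound matches. The lexicon below contains only the three Sumerian words named above; the data layout and function names are hypothetical.

```python
# Hypothetical toy lexicon built from the three Sumerian words in the text:
# gloss -> (semantic sign used for it, pronunciation)
LEXICON = {
    "to stand": ("leg", "gub"),
    "to go": ("leg", "gin"),
    "to bring": ("leg", "tum"),
}


def candidates(semantic: str) -> list:
    """Words consistent with the semantic sign alone."""
    return [w for w, (sem, _) in LEXICON.items() if sem == semantic]


def read_compound(semantic: str, phonetic: str) -> list:
    """Charades: keep words that fit the semantic sign AND whose final
    sound matches the initial sound of the phonetic sign."""
    hint = phonetic[0]
    return [w for w in candidates(semantic) if LEXICON[w][1].endswith(hint)]


print(candidates("leg"))            # ['to stand', 'to go', 'to bring']
print(read_compound("leg", "ba"))   # ['to stand']
print(read_compound("leg", "na"))   # ['to go']
```

Neither clue alone is enough: the "leg" sign leaves three candidates, and a bare final consonant would match many unrelated words, but together they single out one word.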

Still, the result was a complete writing system, in which the Sumerians wrote down not just warehouse records, but poems, diplomatic treaties, letters, contracts and judicial decisions, dictionaries, and epic myths.

We can see a modern version of a similar system in Chinese characters. Most characters can be analyzed as containing two elements, one of which provides semantic information, while the other provides phonological information. The following small table (from DeFrancis) illustrates this with a set of four semantic elements crossed with a set of four phonological (or as DeFrancis calls it "phonetic") elements. The numbering of the semantic elements is taken from a standard set of 214 that have been recognized at least since the Kang Xi dictionary of the 18th century, while the numbering of the phonetic elements is taken from a list of 895 compiled by Soothill.

It is clearly inappropriate to call the Chinese system "ideographic", as is sometimes done. Chinese characters refer to morphemes, not ideas. However, to the extent that the pattern in the table above is taken as typical (and DeFrancis claims that about 75% of all Chinese characters work like these examples), Chinese characters are simultaneously a kind of syllabic writing. DeFrancis suggests the term "morpho-syllabic" to describe it.
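Since DeFrancis's table is not reproduced here, the sketch below uses a different, commonly cited set: the phonetic element 青 (qīng) combined with four semantic elements (氵 and 忄 are the combining forms of "water" and "heart"). These are standard textbook examples in traditional characters, chosen for illustration; they are not the entries of DeFrancis's table. The code shows the morpho-syllabic pattern: the phonetic element predicts the syllable, while the radical signals the meaning area.

```python
import unicodedata

# One phonetic element, 青 (qing), combined with four different semantic
# elements (radicals). Illustrative textbook examples, traditional forms.
PHONETIC = "青"

# radical -> (radical's meaning, resulting character, pinyin, gloss)
COMPOUNDS = {
    "氵": ("water", "清", "qīng", "clear (of water)"),
    "日": ("sun", "晴", "qíng", "fine (of weather)"),
    "言": ("speech", "請", "qǐng", "request, invite"),
    "忄": ("heart", "情", "qíng", "feeling"),
}


def toneless(pinyin: str) -> str:
    """Strip tone diacritics so syllables can be compared segmentally."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for radical, (meaning, char, pinyin, gloss) in COMPOUNDS.items():
    print(f"{radical} '{meaning}' + {PHONETIC} -> {char} {pinyin} '{gloss}'")

# The phonetic element predicts the syllable (qing) but not the tone.
print({toneless(p) for _, _, p, _ in COMPOUNDS.values()})  # {'qing'}
```

As in the Sumerian case, the phonetic hint is imperfect (here it leaves the tone open), which is why the semantic element still earns its place.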

It can be argued that the degree of phonological information found in the Chinese writing system is not radically different from what is found in English. English spelling usually tells us what the morphemes are, but unless we know in advance, it gives us only imperfect information about pronunciation. We can be sure that "tough" will not be pronounced "congressional" or "halter", but only knowledge of the word itself tells us that it rhymes with "rough" and not with "dough" or "through" or "plough".
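The "ough" point can be made concrete by comparing rhyme parts (the last vowel plus anything after it). The broad transcriptions below are hand-written assumptions, roughly General American, not taken from a pronouncing dictionary; the helper names are hypothetical.

```python
# Hand-written broad phonemic transcriptions (an assumption for illustration).
PHONEMES = {
    "tough": ["t", "ʌ", "f"],
    "rough": ["ɹ", "ʌ", "f"],
    "dough": ["d", "oʊ"],
    "through": ["θ", "ɹ", "u"],
    "plough": ["p", "l", "aʊ"],
}
VOWELS = {"ʌ", "oʊ", "u", "aʊ"}


def rhyme_part(phones):
    """Phonemes from the last vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i] in VOWELS:
            return tuple(phones[i:])
    return tuple(phones)


def rhymes(a, b):
    return rhyme_part(PHONEMES[a]) == rhyme_part(PHONEMES[b])


print(rhymes("tough", "rough"))  # True
print(rhymes("tough", "dough"))  # False
# Same spelled ending "ough", four different pronounced endings:
print(len({rhyme_part(p) for p in PHONEMES.values()}))  # 4
```

The spelling identifies each word reliably, but the pronunciation of "ough" has to be stored per word, much as the tone of a Chinese character does.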

Egyptian hieroglyphics also combined pictographic and phonological aspects, often in complicated ways, as the example below suggests. This is the word hememu "humanity". It starts with four symbols denoting the four consonants in the word (the symbol glossed with /u/ is actually /w/). It ends with three semantic determinatives: a seated man, a seated woman, and a set of three lines indicating that multiple entities are referenced.

This approach to writing produced a small number of symbols with simple phonetic values -- Egyptian had 24 simple consonant symbols, shown below -- and led naturally to the development of alphabetic writing systems.

Why pictographic/ideographic writing is not practical

No one has ever developed a full communications system based on pictographic or ideographic principles, although people have often surmised that this would be useful, because it would (or at least could) be universal. The problem is that universality means only that it is equally hard for everyone to develop and learn such a system. If it is feasible to design such a system at all, it is at least very, very difficult. Since everyone already knows at least one ordinary spoken language, practical people will always tend to give up on the ideographic system and start using a written form of their speech, as soon as they can figure out how to do this.

For an amusing myth about this process, check out the story of How the first letter was written, from Rudyard Kipling's Just So Stories.

Why phonological writing is (eventually) practical

It is rather difficult to get enough conscious access to the phonological structure of speech to design an alphabetic writing system, and very few languages have small enough inventories of syllables for a syllabic system to be an easy place to start. More important, the idea of constructing a full writing system (on any basis, phonological or otherwise) is not at all an obvious one.

So writing seems to have started with pictograms used as mnemonic aids in record keeping, or as vehicles of insight in divination. As the inventory of signs increases, the possibility arises to begin using some of the signs as rebuses or as phonological/semantic combinations. This is much more efficient than trying to design a new symbol for every word or morpheme. Once this meaning-plus-sound process begins, it can develop into a full (if complex and inefficient) writing system, able to encode any passage in the language. This development seems to have occurred independently at least three times: in the Middle East; in China; and in Mexico.

Various other developments are then logically possible. The Chinese (and other cultures influenced by them, including Japan) developed a meaning-plus-sound system based on the syllabic unit. The Mayans did the same. A logical next step is to increase efficiency by doing away with some or all of the meaning-related units, in favor of a consistent syllabary of some sort. Such syllabaries were developed throughout the Far East, but in most cases they did not displace the meaning-plus-sound elements. Instead they supplemented them for certain uses (such as the encoding of grammatical particles in Japanese) or for certain populations (such as women in some places and periods in China).

By contrast, the Egyptians (and speakers of the related Semitic languages) developed a meaning-plus-sound system based primarily or solely on consonants. This naturally led to purely consonantal writing systems for some of the Semitic languages (such as Phoenician), which shared with Egyptian the property of changing vowels extensively for morphological purposes. To give an example from Hebrew, the root /ktb/ can have, among many other forms, katav "he wrote", kotev "I write", katoov "written", kitav "letters", katban "scribe". In a language that works this way, it's natural to factor words into consonants and vowels, and to start with a sort of acronym-like use of pictographs to denote their initial consonant, in a meaning-plus-sound system based on consonants only. In the case of most if not all Semitic languages, it has turned out to be usually possible to figure out the vowels from context, even without adding semantic determinatives. This made it possible to abandon the semantic determinatives without giving up general writing.
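Factoring words into consonants and vowels can be sketched as stripping vowel letters from a romanization and grouping words by what remains. In these romanizations the root's /b/ surfaces as v after a vowel (in Hebrew script b and v are the same letter, bet), so the forms below share one consonantal skeleton and a consonant-only spelling leaves the reader to choose among them from context. This is a simplification: it ignores the vowel letters that real Hebrew spelling sometimes adds, and the function name is hypothetical.

```python
from collections import defaultdict

VOWEL_LETTERS = set("aeiou")


def consonant_skeleton(word: str) -> str:
    """Drop vowel letters from a romanized word, keeping consonants in order."""
    return "".join(ch for ch in word if ch not in VOWEL_LETTERS)


# Romanized forms taken from the text (glosses omitted here).
forms = ["katav", "katoov", "kitav"]

by_skeleton = defaultdict(list)
for form in forms:
    by_skeleton[consonant_skeleton(form)].append(form)

print(dict(by_skeleton))  # {'ktv': ['katav', 'katoov', 'kitav']}
```

One written string thus corresponds to several words, and the vowels (and hence the grammatical form) must be supplied by the reader.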

Alphabetic systems seem to be rather unnatural, and have arguably been developed only once, by the Greeks when they adapted the Semitic consonant-only system to their language, which couldn't so easily be written without vowels. It is possible that this invention would not have happened at all without this particular historical sequence.

Part of the reason for the success of meaning-plus-sound systems is that two kinds of evidence are always better than one: two fairly lousy systems can be combined into one decent one. But there is another reason as well. Sound systems are made up of quite limited materials. In many languages, the number of distinct syllables is not terribly great, and the number is much reduced if natural equivalence classes (such as "starts with" or "rhymes with") are used. Once one gets started down the sound-system road, it is tempting to go all the way, since training scribes is far more efficient if they need to learn only a few dozen symbols rather than several thousand. Of course, the scribes' guild may think it's just fine to limit the number of literate people, and to leave large barriers in place, blocking entry to their profession.

Why is reading hard to learn?

For the same reasons that writing was hard to invent, reading is hard to learn. Neither reading nor writing is a biologically natural process. Alphabetic writing systems are in principle the most efficient, since they require learning the smallest number of symbols. No one would design a writing system today on any other basis. However, alphabetic systems seem to impose a special burden on learners, because they require understanding a level of analysis -- phonemic analysis -- that is relatively inaccessible to introspective scrutiny. The orthographic system of English also has many morpheme-related idiosyncrasies, which eventually make it easier to recognize words (just as the Chinese morphosyllabic system does), but which may also obscure the alphabetic principle for early learners.

A number of ongoing longitudinal studies show that about 60% of American children find it difficult to learn to read, and that 20-30% fall seriously behind or fail entirely. The reasons for these problems, and the best ways to deal with them, are a matter of great controversy. A great deal depends on the answers.

The opening salvo in one of this controversy's battles was fired in 1955 by Rudolf Flesch in his book Why Johnny Can't Read.

In an attempt to present an authoritative consensus on this important topic, I'll start with quotes from congressional testimony given in 1998 and 1999 by Dr. G. Reid Lyon, Chief of the Child Development and Behavior Branch of the National Institute of Child Health and Human Development, which in turn is part of the National Institutes of Health (NIH).

First, the diagnosis:

[C]hildren who have difficulties learning to read can be readily observed. The signs of such difficulty are a labored approach to decoding or "sounding" unknown or unfamiliar words and repeated misidentification of known words. Reading is hesitant and characterized by frequent starts and stops and multiple mispronunciations. If asked about the meaning of what has been read, the child frequently has little to say. Not because he or she is not smart enough; in fact, many youngsters who have difficulty learning to read are bright and motivated to learn to read, at least initially. Their poor comprehension occurs because they take far too long to read the words, leaving little energy for remembering and understanding what they have read. Unfortunately, there is no way to bypass this decoding and word recognition stage of reading. Using context to figure out the pronunciation of unknown words cannot appreciably offset a deficiency in these skills. In essence, while one learns to read for the fundamental purpose of deriving meaning from print, the key to comprehension starts with the immediate and accurate reading of words. In fact, difficulties in decoding and word recognition are at the core of most reading difficulties... To be sure, there are some children who can read words accurately and quickly yet do have difficulties comprehending, but they constitute a small portion of those with reading problems.

If the ability to gain meaning from print is dependent upon fast, accurate, and automatic decoding and word recognition, what factors hinder the acquisition of these basic reading skills? As mentioned above, young children who have a limited exposure to both oral language and print before they enter school are at-risk for reading failure. However, many children with robust oral language experience, average to above intelligence and frequent interactions with books since infancy may also show surprising difficulties learning to read. Why?

In contrast to good readers who understand that segmented units of speech can be linked to letters and letter patterns, poor readers have substantial difficulty in developing this "alphabetic principle." The culprit appears to be a deficit in phoneme awareness--the understanding that words are made up of sound segments called phonemes. Difficulties in developing phoneme awareness can have genetic and neurobiological origins or can be attributable to a lack of exposure to language patterns and usage during the preschool years. The end result is the same, however. Children who lack phoneme awareness have difficulties linking speech sounds to letters, leading to limitations in the development of decoding and word recognition skills, resulting in extremely slow reading. As mentioned, this inaccurate and labored access to print renders comprehension very difficult.

What about the cure?

First, experience before school age matters:

It is clear from research on emerging literacy that learning to read is a relatively lengthy process that begins very early in development and clearly before children enter formal schooling. Children who receive stimulating literacy experiences from birth onward appear to have an edge when it comes to vocabulary development, an understanding of the goals of reading, and an awareness of print and literacy concepts. Children who are read to frequently at very young ages become exposed in interesting and exciting ways to the sounds of our language, to the concept of rhyming, and to other word and language play that serves to provide the foundation for the development of phoneme awareness. As children are exposed to literacy activities at young ages, they begin to recognize and discriminate letters. Without a doubt, children who have learned to recognize and print most letters as preschoolers will have less to learn upon school entry. The learning of letter names is also important because the names of many letters contained the sounds they most often represent, thus orienting youngsters early to the alphabetic principle or how letters and sounds connect. Ultimately, children's ability to understand what they are reading is inextricably linked to their background knowledge. Very young children who are provided opportunities to learn, think, and talk about new areas of knowledge will gain much from the reading process. With understanding comes the clear desire to read more and to read frequently, ensuring that reading practice takes place.

Second, early diagnosis of reading problems in school-age children is possible, and it is important because the right kind of early intervention can make a big difference:

In studying approximately 34,501 children over the past 33 years, we have learned the following with respect to the role that phonemic awareness plays in the development of phonics skills and fluent and automatic word reading:

  1. Phonemic awareness skills assessed in kindergarten and first grade serve as potent predictors of difficulties learning to read. We have learned how to measure phonemic awareness skills as early as the first semester in kindergarten with tasks that take only 15 minutes to administer - and over the past decade we have refined these tasks so that we can predict with approximately 80% to 90% accuracy who become good readers and who will have difficulties learning to read.
  2. We have learned that the development of phonemic awareness is a necessary but not sufficient condition for learning to read. A child must integrate phonemic skills into the learning of phonics principles, must practice reading so that word recognition becomes rapid and accurate, and must learn how to actively use comprehension strategies to enhance meaning.
  3. We have begun to understand how genetics are involved in learning to read, and this knowledge may ultimately contribute to our prevention efforts through the assessment of family reading histories.
  4. We are entering very exciting frontiers in understanding how early brain development can provide a window on how reading develops. Likewise, we are conducting studies to help us understand how specific teaching methods change reading behavior and how the brain changes as reading develops.
  5. We have learned that just as many girls as boys have difficulties learning to read. Until five years ago, the conventional wisdom was that many more boys than girls had such difficulties. Now females should have equal access to screening and intervention programs.
  6. We have learned that for 90% to 95% of poor readers, prevention and early intervention programs that combine instruction in phoneme awareness, phonics, fluency development, and reading comprehension strategies, provided by well trained teachers, can increase reading skills to average reading levels. However, we have also learned that if we delay intervention until nine-years-of-age, (the time that most children with reading difficulties receive services), approximately 75% of the children will continue to have difficulties learning to read throughout high school. To be clear, while older children and adults can be taught to read, the time and expense of doing so is enormous.

For a more recent summary of the same issues, see Mark Seidenberg, "The Science of Reading and its Educational Implications", 2013:

Research in cognitive science and neuroscience has made enormous progress toward understanding skilled reading, the acquisition of reading skill, the brain bases of reading, the causes of developmental reading impairments and how such impairments can be treated. My question is: if the science is so good, why do so many people read so poorly? I mainly focus on the United States, which fares poorly on cross-national comparisons of literacy, with about 25-30% of the population exhibiting literacy skills that are low by standard metrics. I consider three possible contributing factors, all of which turn on issues concerning the relationships between written and spoken language. They are: the fact that English has a deep alphabetic orthography; how reading is taught; and the impact of linguistic variability as manifested in the Black-White “achievement gap”. I conclude that there are opportunities to increase literacy levels by making better use of what we have learned about reading and language, but also institutional obstacles and understudied issues for which more evidence is badly needed.

This may sound like a case where common sense is confirmed by the results of scientific study. However, the implications cannot be taken for granted. The "Whole Language" approach, which at times has dominated American educational practice, is still strongly represented among teachers and educational administrators. This approach emphasizes a direct connection between seeing written words and understanding their meaning, treating written words as visual patterns and avoiding any focus on the systematic relationship between letters and sounds. Despite the evident good intentions of its adherents, the Whole Language movement has been a deeply destructive force.

See this 2018 Forbes Magazine article "Why Johnny Still Can't Read -- And What To Do About It", and this review of Mark Seidenberg's 2017 book Language at the Speed of Sight for a recent evaluation of the situation, which seems to be that most American teachers are taught little or nothing about the science of reading and how to teach reading effectively.

The result is that many American children are still not learning to read well. Thus according to "Reading Scores on National Exam Decline in Half the States", NYT 10/30/2019:

Only 35 percent of fourth graders were proficient in reading in 2019, down from 37 percent in 2017; 34 percent of eighth graders were proficient in reading, down from 36 percent.

This debate about how to teach children to read is one of the most important public policy issues in the country today.
