Communication:
a Biological Perspective

Where did human language come from?

This is not an easy question to learn to ask. We think of ourselves as being the pinnacle of development, the most perfect of earth's creatures. As Alexander Pope put it in his Essay on Man,

Far as creation's ample range extends,
The scale of sensual, mental pow'rs ascends:
Mark how it mounts to man's imperial race,
From the green myriads in the peopled grass:
...
Remembrance and reflection how ally'd;
What thin partitions sense from thought divide?
And middle natures, how they long to join,
Yet never pass th' insuperable line!
Without this just gradation, could they be
Subjected, these to those, or all to thee?
...
Vast chain of being! which from God began,
Natures aethereal, human, angel, man,
Beast, bird, fish, insect, what no eye can see,
No glass can reach; from infinite to thee,
From thee to nothing.

We tend to view the quirks and peculiarities of our species as The Right Thing to Do, and assume that if other species don't do the same, it's because they just haven't evolved to our level. Surely our most complex and perfected language must be what all other species aspire to "in just gradation," mounting a scale of communicative complexity from worms to insects to fish to birds to mammals to us.

However, one authority on animal communication tells us:

For most relatively social adult fishes, birds and mammals, the range or repertoire size [of communicative displays] for different species varies from 15 to 35 displays.

Curiously, there appears to be little correlation between repertoire size and location in the "vast chain of being." Cuttlefish, as far as we know, have about as many different communicative displays as primates do. Biologists who study the evolution of behavior speculate that there are selective pressures that prevent the overall number of displays from growing beyond a certain point, even though it is clear that new displays are developed to suit new adaptive circumstances. In the same way, although specialized physical organs develop in response to new evolutionary opportunities, other specializations tend to be lost, so that creatures do not over time accumulate indefinitely many humps, horns, ruffs, claws and so on.

Thus human language, with its hundreds of thousands of words, is not just the logical endpoint of some obvious evolutionary "scale of sensual, mental powers." Rather, it seems to be a behavioral counterpart of the peacock's tail or the elephant's tusk: a specific, enormously hypertrophied development of structures with rather different original functions.

How and why did this happen? If complex systems of communication are so great, why hasn't evolution been developing them in other species for the last few hundred million years -- as eyes, ears, horns, claws etc. have repeatedly been developed?

Is language in our genes?

Does it make sense to ask about the genetic evolution of language?

The essence of life is the transmittal of genetic information. Words like "communication" are sometimes used to talk about the expression of genetic information within the cell, and the transmittal of genetic information to new cells. There are good reasons for these verbal analogies -- there are interesting mathematical affinities between computational linguistics and computational biology.

However, we share this genetic language with every other living thing on earth, while it is only our fellow humans that we can talk with. Although molecular genetics is not the kind of "language" we are investigating, it provides one framework for interpreting our question about the origins of human spoken language. We can ask: what aspects of the human genome make spoken language possible? What selective pressures on our ancestors led these characteristics to develop?

It's conceivable that looking for this genetic basis of human language is not very enlightening. For example, we would not learn much by asking for the genetic basis of certain other uniquely human traits, such as the practice of wearing baseball caps backwards. There are things we could say -- humans have heads, for instance, and a tendency to be dazzled by sunlight when looking for things in the air on bright days, whence hats with brims -- but in fact the main issues are cultural, not biological. The human species has not adapted genetically to wearing caps, whether forwards, backwards or sideways. Instead, the design and use of caps has "evolved" as part of the culture of a particular time and place, among people no different genetically from those with very different tastes in headgear.

Human language and culture are deeply interconnected, to the point that it would be absurd to study the evolution of language without considering its role in broader social and cultural questions. However, the human species has in fact adapted genetically to facilitate the use of spoken language. Thus it is worthwhile to look into what these adaptations are, and also at some theories about what selective advantage they offered to our ancestors.

What we hominids did

We are talking about evolution during the roughly five million years since we separated from the ancestors of today's great apes (Chimpanzee, Gorilla, etc.).

We are going to sidestep several controversies:

The language-related changes took place from the neck up. These changes took place in two areas: the mouth and throat (the "vocal tract"), and the brain.

Vocal tract changes in hominid evolution

One set of changes occurred between neck and nose, and served to adapt our vocal tracts for speaking.

Specifically, we shortened our muzzle and the oral cavity it contains, and stretched out our pharynx (throat, in ordinary language) by lowering the larynx (what is behind your Adam's apple). The comparison below of human and chimpanzee vocal-tract anatomy shows the changes:

The result of these changes is to make it possible for our tongue to move forward and back, up and down, in a way that creates resonant cavities of different sizes in various places in the vocal tract, as this synthesis demonstration shows.

The picture below shows that the skull of Homo erectus, our immediate ancestor who lived between about 1.8 million and 100,000 years ago, appears to be intermediate in these respects between the great apes and our esteemed selves.

These changes are great for making a wide variety of different vowels and consonants. However, they are otherwise a bad idea!

The expansion of the pharynx creates some real problems. For instance, it means that laughing while drinking tends to propel liquids out the nose. Much more seriously, it's relatively easy for us to get a chunk of food lodged in the larynx, with potentially fatal results. To quote from Holloway 1996, The evolution of the human vocal apparatus:
 

This problem is even worse for men than for women, because as a secondary sexual characteristic of male humans, the larynx increases in size and moves lower in the throat at puberty. None of the other great apes show this laryngeal sexual dimorphism, or indeed any other vocal tract dimorphism -- though they have much greater dimorphism in overall size, and also show a dimorphism of the canine teeth that humans entirely lack.

The unique human development of sexual dimorphism in larynx size and position presumably means that vocalization is important to us in ways that it is not to gorillas and chimps.
 

Brain changes

One thing that happened to our brain was that it just got bigger. This chart (from Holloway 1996, Evolution of the Human Brain) shows that the relationship of brain weight to body weight is roughly linear on a log-log scale across a large range of primate sizes. The data point for humans is obviously above the trend line by a significant factor:
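To see what "roughly linear on a log-log scale" amounts to, here is a minimal Python sketch. A straight line in log-log coordinates is the same thing as a power law, brain = scale * body ** exponent, and a species that sits above the line has a brain larger than the trend predicts by some multiplicative factor. The constants, the data points and the function names below are all made up for illustration; they are not Holloway's measurements.

```python
import math

def fit_log_log(body_masses, brain_masses):
    """Least-squares line through (log10 body, log10 brain).

    Returns (slope, intercept) of the fitted line.
    """
    xs = [math.log10(m) for m in body_masses]
    ys = [math.log10(m) for m in brain_masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def excess_factor(body_mass, brain_mass, slope, intercept):
    """How many times larger a brain is than the trend line predicts."""
    predicted = 10 ** (intercept + slope * math.log10(body_mass))
    return brain_mass / predicted

# Synthetic points generated from an exact power law (NOT measured data).
# SCALE and EXPONENT are arbitrary illustrative constants.
SCALE, EXPONENT = 0.1, 0.75
bodies = [1.0, 10.0, 100.0, 1000.0]
brains = [SCALE * b ** EXPONENT for b in bodies]

slope, intercept = fit_log_log(bodies, brains)

# A hypothetical species whose brain is three times what the trend predicts.
outlier = excess_factor(50.0, 3 * SCALE * 50.0 ** EXPONENT, slope, intercept)
print(round(slope, 3), round(outlier, 3))
```

The fitted slope recovers the exponent of the power law, and the excess factor is the vertical distance above the line, read as a ratio rather than a difference. That ratio is the sense in which the human point lies above the trend "by a significant factor."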

However, the hominid brain did not just get uniformly larger. According to Holloway's discussion:

There are four major reorganizational changes that have occurred during hominid brain evolution, viz.: (1) reduction of the relative volume of primary visual striate cortex area, with a concomitant relative increase in the volume of posterior parietal cortex, which in humans contains Wernicke's area; (2) reorganization of the frontal lobe, mainly involving the third inferior frontal convolution, which in humans contains Broca's area; (3) the development of strong cerebral asymmetries of a torsional pattern consistent with human right-handedness (left-occipital and right-frontal in conjunction); and (4) refinements in cortical organization to a modern human pattern, most probably involving tertiary convolutions. (this last 'reorganization' is inferred; in fact, there is no direct palaeoneurological evidence for it.)

Of the four changes cited, the first three straightforwardly involve language in whole or in part. Wernicke's area in modern humans is involved in comprehension of language. Broca's area is involved in motor control of speech. The cerebral asymmetries in the third point involve a localization of language skills in the dominant (generally left) hemisphere of the brain, and of other abilities (visuo-spatial and emotional) in the non-dominant hemisphere.
 
Like the vocal-tract changes, the brain changes have a cost. For one thing, brain tissue is expensive to maintain, about ten times more expensive than other tissue. The human brain, although only about 2% of our body weight, consumes about 20% of our energy.
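The "ten times more expensive" figure follows directly from the two percentages just given, as this small Python sketch shows. The function names are invented for illustration, and the two shares are the round numbers from the text, not precise measurements.

```python
def cost_versus_body_average(mass_share, energy_share):
    """Energy use per unit mass of a tissue, relative to the whole-body average."""
    return energy_share / mass_share

def cost_versus_rest_of_body(mass_share, energy_share):
    """Energy use per unit mass of a tissue, relative to all other tissue."""
    tissue = energy_share / mass_share
    rest = (1 - energy_share) / (1 - mass_share)
    return tissue / rest

BRAIN_MASS_SHARE = 0.02    # about 2% of body weight (figure from the text)
BRAIN_ENERGY_SHARE = 0.20  # about 20% of energy use (figure from the text)

vs_average = cost_versus_body_average(BRAIN_MASS_SHARE, BRAIN_ENERGY_SHARE)
vs_rest = cost_versus_rest_of_body(BRAIN_MASS_SHARE, BRAIN_ENERGY_SHARE)
print(round(vs_average, 2), round(vs_rest, 2))
```

Compared with the body-wide average the ratio is about ten; compared strictly with the non-brain remainder of the body it comes out a little higher, since the remainder gets only 80% of the energy for 98% of the weight.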

For another thing, increased brain size normally translates to increased gestation period, because fetal brain tissue is laid down at a relatively constant rate. This graph shows the relationship for a dozen species from mice to elephants:

Humans are on this graph -- as the isolated point down and to the right from the group of points on the left. If the human data point were brought in line with the trend for the rest of the species, human babies ought to be born about 17 months after conception, rather than 9. However, this would be a bad idea. Anyone who has ever given birth, or witnessed a birth, knows that an 8-month-old baby (17-9=8) would just not make it out, even if the mother could manage the extra period of pregnancy.

Instead, full-term human infants are born "premature" by the standards of the rest of the animal kingdom. In fact, since development is slowed down after birth as well, human infants are not as mature as new-born chimps until they are a year old or more. Taking care of these "premature" infants imposes considerable burdens on human parents, and especially on the mother, during the first year of life.

Why'd we do it?

This is a key question: what was the source of selective pressure?

Before trying to answer it -- and all answers are speculative ones -- let's look at some background.

A "forest of symbols" -- the cybernetic imperative

Signals are everywhere -- for those who can understand them.

As Charles Baudelaire wrote, nature is a "forest of symbols." Light, sound, air and water currents, drifting chemicals, temperature gradients, all carry information about the structure of the world and the activities of its inhabitants.

Critters that are better at reading the world's signs tend to eat better, live longer, and reproduce more effectively, so there is selective pressure to develop sensors of all types. These may be simple sensitive molecules in the membrane of a single cell, or more complex subcellular assemblies, or elaborate structures of many specialized cells, like eyes and ears.

A pure sensor, connected to nothing, is worthless. Its owner needs to evaluate the information it provides, and to act appropriately. In fact, sensory evaluation is also necessary for effective action. To move towards a goal or around an obstacle, to manipulate an external object, and indeed to do almost anything, an organism needs feedback about the consequences of its actions. "Open loop" action, where no information comes back, only works where the environment is so well known that nothing unexpected can happen. The real world is a complicated and ambiguous place, full of unexpected obstacles, dangers and opportunities, and in this world, perception without action and action without perception are equally useless.
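The difference between "open loop" action and action with feedback can be made concrete with a toy simulation in Python. This is only a sketch under simple assumptions: the world is one-dimensional, the unexpected disturbance is a constant drift added at every step, and the feedback rule corrects a fixed fraction (the gain) of the remaining error. None of these names or numbers come from the literature discussed here.

```python
def open_loop(start, goal, steps, drift):
    """Follow a plan computed in advance; never check the actual position."""
    planned_step = (goal - start) / steps
    position = start
    for _ in range(steps):
        position += planned_step + drift  # drift is the unplanned disturbance
    return position

def closed_loop(start, goal, steps, drift, gain=0.5):
    """Sense the remaining error at every step and correct a fraction of it."""
    position = start
    for _ in range(steps):
        error = goal - position            # perception: how far off are we?
        position += gain * error + drift   # action, plus the same disturbance
    return position

GOAL = 10.0
for drift in (0.0, 0.3):  # a fully known world, then a world with a surprise
    end_open = open_loop(0.0, GOAL, 20, drift)
    end_closed = closed_loop(0.0, GOAL, 20, drift)
    print(drift, round(end_open, 3), round(end_closed, 3))
```

With no drift, both strategies arrive essentially at the goal, which is the case where the environment is so well known that nothing unexpected can happen. With drift, the open-loop plan overshoots by about 6 units, because the error accumulates unnoticed, while the feedback version ends within about 0.6 units. Simple proportional feedback of this kind does not remove the error entirely, but it keeps it bounded.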

The basic mathematics of this integration of perception and action was worked out during WW II, motivated by the need for radar-guided anti-aircraft guns, auto-landing devices for airplanes, homing torpedoes, and the like. Fundamental work in this area was done by Norbert Wiener at MIT and by Andrey Nikolaevich Kolmogorov at MSU (that's Moscow State, not Michigan State). These two were among the greatest mathematicians of the twentieth century. After the war, Wiener went on to develop the underlying metaphor of "control and communication" -- that is, the integration of perception, information and action -- into the field of cybernetics.

Private signals

At the cellular level, the signals that organisms interpret are mainly chemical ones (though there are receptors for light, motion and electromagnetic fields as well). These signals may come from the organism itself, for purposes of internal development and control, or they may come from the outside world.

Larger organisms need cybernetic systems at multicellular scales. Chemical diffusion is not always fast enough for such purposes, so specialized cell types developed to transmit electrical signals, leading to the development of nervous systems.

Organisms benefit from communicating with nearby relatives and other members of the same species. For organisms that reproduce sexually, locating prospective mates is critical. Warning of danger and drawing attention to available food are a way to help nearby relatives, offering selective advantage for social organization.

Chemical signals will work for communication between individuals. For example, tobacco plants infected with certain viruses give off methyl salicylate vapor, which travels through the air to healthy leaves on the same or neighboring plants, causing increased expression of a gene connected to viral resistance. In many animals, sexual receptiveness is signaled chemically, and special organs may develop to deal with such chemical signals. For instance, the Pittsburgh zoo's page on the African elephant tells us that

Making noise: the evolution of vocalization

However, chemical signals have definite limitations. They travel fairly slowly, and don't travel upwind at all.

Neural signals are not an option for communication between individuals -- you have to be "plugged in."

This leaves a few other options:

Nearly all organisms have acoustic sensors, developed to help figure out "what's going on out there." There are lots of ways to make sounds -- tapping or scraping limbs, whistling or grunting with the respiratory apparatus.

The somatic portfolio: how much to invest in what?

Although cybernetic systems are useful, they come at a cost, to organisms as well as to defense contractors (or rather taxpayers). Not every military aircraft carries the electronic-warfare sensors and countermeasure devices of highly specialized planes like the AWACS shown in the picture.

Its manufacturer boasts that "its radar" (the big saucer-shaped device on its back) "has a 360-degree view of an area, and at operating altitudes it can detect targets more than 320 kilometers (200 miles) away. AWACS mission equipment can separate, manage and display these targets individually on situational displays." These are enviable capabilities compared to those of a typical fighter plane. However, to put the same systems on individual fighter aircraft would cost a lot, in money, weight, and aerodynamic compromises. Designers have decided that the cost is not worth the benefit: a fleet of fighters carrying such radars would be defeated by a larger number of cheaper, faster, more maneuverable opponents.

The evolution of biological organisms is subject to similar trade-offs. All sorts of incredible sensory systems, integrated in stunning ways with action capabilities, are possible. The use of sonar by bats in catching insects and by barn owls in catching mice, the odor sensitivity of dogs or pigs, the magnetic navigation of some birds, are all available in principle. However, specialized sense organs and specialized neural circuits are "expensive", in the sense that it takes energy to build them, and they may compromise other functions.

Evolution is constantly carrying out a sort of experimental cost-benefit optimization. Our hominid ancestors, when they split off from the lineage of chimps and gorillas some 5 million years ago, might have gone on to develop built-in sonar or an improved sense of smell. They might also have stayed about the same, as indeed Homo erectus, the species that is our immediate ancestor, did for almost two million years. Instead, they learned to talk. Why?

OK, so why'd we do it?

What are these physical changes -- in jaw, throat and brain -- good for?

They're definitely good for spoken language.

The redesigned vocal tract is good for making lots of different vocal sounds. The reorganized and expanded Broca's area deals with control of sound structures -- aspects of what we will call phonology and phonetics when we get to them. The reorganized and expanded Wernicke's area, along with the larger cortex in general, allows us to have lots and lots of words, each one connecting a meaning with a pronunciation.

Somewhere along the line, we learned to think about what others believe -- what philosophers call the "other minds" problem -- and this made us better at communicating regardless of the medium.

But why? Why did our ancestors make such a big investment in talking?

One theory is that they invested in language so as to be able to think better. This hypothesis views rational thought as being at least in large part made up of inner speech. A recent book by Derek Bickerton argues this point of view.

Another theory is that they invested in language so as to be better able to coordinate activities like hunting.

A third theory is that they invested in language in order to make it possible to teach tool-making more effectively.

Each of these theories has some positive aspects -- the cited advantages do to some extent exist. However, one may doubt how strong such effects could have been. For instance, in documented modern hunter-gatherer cultures, language does not play a very large role either in coordinating hunts or in teaching tool-making. Many kinds of human thought do not seem to involve language at all.

A fourth class of theories says that the crucial selective advantages of language were social. That is, something about the development of language made the creation and maintenance of larger social groups possible, at a time when larger social groups were essential to survival. In this connection, for instance, a shibboleth can be viewed as a contribution to group identity formation.

Gossip as (more efficient) grooming

A recent version of this idea, due to Robin Dunbar, is worth considering in a bit of detail.

In outline, Dunbar's argument is as follows:

Among primates, "encephalization" (brain size normalized for body size) varies in proportion to social group size. Apparently, the larger the group a primate lives in, the more brain it needs to keep track of social relationships within the group. This is plausible, given the intricate micro-politics of primate society, as documented by ethologists. If we take the step from correlation to causation, and assume that larger brains evolved in primates in order to permit larger social groups (e.g. for better intra-species competition or better defense against predators), we have what has been called the "Machiavellian Intelligence Hypothesis."

If we look at human brain size from the perspective of this hypothesis, and extrapolate the relationship between brain size and social group size found in other primates, we predict a "natural" group size for humans of about 150.

In primate societies, grooming (picking nits out of fur) is a major factor in establishing and maintaining social bonds. There are interesting hypotheses about why grooming fulfills this function, but for now, we can just note that the bigger the primate group, the more time on average each member spends in grooming others. If we look at human social relations in this perspective, then with a group size of 150, we should have to spend 40% of the day in grooming. This is far too high to be practical -- the highest actual proportion observed among primates is 20% (Gelada baboons).

Dunbar suggests that our ancestors, facing hard times on the African plains, very badly needed to live in larger groups. "Gossiping" (in whatever form it first arose) made it possible to form and maintain social bonds more efficiently than grooming, both because more than two can do it at once, and also because you can actually do some useful work (like gathering or processing food) at the same time. The development of sense and reference -- and especially of proper names for group members -- enabled political maneuvering at a higher level in larger groups.
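The efficiency argument in the last step can be put as back-of-the-envelope arithmetic in Python. The sketch assumes, purely for illustration, that bonding time falls in inverse proportion to the number of partners reached at once; that is a simplification introduced here, not Dunbar's own model, and the function name and audience sizes are hypothetical. The 40% and 20% figures are the ones cited above.

```python
def bonding_time_share(one_on_one_share, partners_at_once):
    """Share of the day needed if each bonding session reaches several partners.

    Assumes time scales inversely with the number of partners per session.
    """
    if partners_at_once < 1:
        raise ValueError("need at least one partner per session")
    return one_on_one_share / partners_at_once

PREDICTED_GROOMING_SHARE = 0.40  # from the text: predicted for a group of about 150
OBSERVED_PRIMATE_MAX = 0.20      # from the text: highest observed (Gelada baboons)

for partners in (1, 2, 3):  # hypothetical audience sizes
    share = bonding_time_share(PREDICTED_GROOMING_SHARE, partners)
    print(partners, round(share, 3), share <= OBSERVED_PRIMATE_MAX)
```

Under this assumption, one-on-one grooming stays at the impractical 40%, while reaching two partners at once already brings the requirement down to the 20% ceiling observed in other primates. This is before counting the further advantage that talking leaves the hands free for useful work.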

Here are two of Dunbar's graphs, first showing predicted group size for various hominids:

The second is the same graph, with the Y-axis labelled as "grooming time" -- it's the same graph because both are derived from models of the relationship of group size and grooming time to encephalization.

What were the steps in the process?

On any account of the selective pressures leading to human genetic specialization for spoken language, we still owe an explanation of how the behaviors got started and then developed to their present state.

The gestural origin theory: sign language came first, with speech at first just a secondary accompaniment.

The song theory: song-like vocal displays came first, perhaps with a function in sexual selection. Like music, they involved complex patterns but had no meaning. Certain "motifs" or bits of vocal pattern came to have referential value.

All such theories are completely speculative, and so is Dunbar's theory of gossip as a more efficient substitute for grooming. However, there is fairly general agreement that establishment and maintenance of social cohesion was probably a key source of selective pressure for the evolution of humankind's spectacularly hypertrophied "language organ(s)."

 