## 2 Syntactic constituenthood

Old version (Fall 2006)

At first glance, a sentence simply consists of a string of words arranged in a single dimension - that of linear order. However, in Chapter 1, we presented some initial evidence for a second syntactic dimension that is less obvious (though no less real!) than linear order - the dimension of constituent structure. Whether a particular string of words is a constituent isn't always self-evident, and so a number of diagnostic tests for constituenthood have been developed. In this chapter, we review these tests, along with some of the difficulties in applying them. We also discuss in more detail how constituent structure is represented in tree diagrams of the sort introduced in Chapter 1.

#### Substitution

The most basic constituenthood test is the substitution test. The reasoning behind the test is simple. A constituent is any syntactic unit, regardless of length or syntactic category. A single word is the smallest possible constituent belonging to a particular syntactic category. So if a single word can substitute for a string of several words, then that's evidence that the single word and the string are both constituents of the same category.

We mentioned in Chapter 1 that pronouns can substitute for noun phrases. Some further examples are given in (1).

 (1) a. The little boy fed the cat. ---> He fed her.
     b. Black cats detest green peas. ---> They detest them.

As we already said in Chapter 1, it's important to understand that a particular string of words can be a noun phrase in one syntactic context, but not in another. For instance, the substitution test tells us that the underlined strings are noun phrases in (1), but not in (2).

 (2) a. The little boy from next door fed the cat without a tail. ---> * He from next door fed her without a tail.
     b. These black cats detest those green peas. ---> * These they detest those them.

Rather, in these sentences, it is the longer underlined strings in (3) that are noun phrases.

 (3) a. The little boy from next door fed the cat without a tail. ---> He fed her.
     b. These black cats detest those green peas. ---> They detest them.
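The mechanics of the substitutions in (1)-(3) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `substitute` is our own, sentences are treated as plain strings of words, and the code merely produces the candidate sentence. Whether the result is grammatical is a judgment that a speaker has to supply; nothing in the code computes it.

```python
def substitute(sentence: str, target: str, pro_form: str) -> str:
    """Replace the first occurrence of the word string `target` with `pro_form`.

    Hypothetical helper for illustration; matching ignores capitalization.
    """
    words = sentence.rstrip(".").split()
    target_words = [w.lower() for w in target.split()]
    n = len(target_words)
    for i in range(len(words) - n + 1):
        if [w.lower() for w in words[i:i + n]] == target_words:
            result = words[:i] + [pro_form] + words[i + n:]
            text = " ".join(result) + "."
            return text[0].upper() + text[1:]
    raise ValueError(f"{target!r} does not occur in {sentence!r}")


sentence = "The little boy from next door fed the cat without a tail."

# As in (2a): substituting for the shorter strings yields a result speakers reject.
step = substitute(sentence, "the little boy", "he")
print(substitute(step, "the cat", "her"))

# As in (3a): substituting for the longer strings yields an acceptable result.
step = substitute(sentence, "the little boy from next door", "he")
print(substitute(step, "the cat without a tail", "her"))
```

The point of the sketch is that the operation itself is blind to structure: it applies equally to any string of words, and it is the contrast in acceptability between the two outputs that serves as evidence for constituenthood.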

Pronouns are not the only placeholder elements, or pro-forms. For instance, adverbs such as here or there can substitute for constituents that refer to locations or directions. As in the case of noun phrases, whether a particular string is a constituent depends on its syntactic context.

 (4) a. Put it on the table. ---> Put it there.
     b. Put it over on the table. ---> Put it over there.
     c. Put it over on the table. ---> Put it there.

 (5) a. Put it on the table that's by the door. ---> * Put it there that's by the door.
     b. Put it over on the table that's by the door. ---> * Put it over there that's by the door.
     c. Put it over on the table that's by the door. ---> * Put it there that's by the door.

The word so can substitute for adjective phrases (here, the most natural-sounding results are obtained in contexts of comparison). As usual, the same string sometimes is a constituent and sometimes isn't.

 (6) a. I am very happy, ... ... and Linda is so, too.
     b. I am very fond of Lukas, ... ... and Linda is so, too.
     c. I am very fond of my nephew, ... * ... and Linda is so of her niece.

Finally, pronouns and sometimes the word so can substitute for subordinate clauses introduced by that, as illustrated in (7).

 (7) a. I { know, suspect } that they're invited. ---> I { know, suspect } it.
     b. I { imagine, think } that they're invited. ---> I { imagine, think } so.

#### Movement

Substitution by pro-forms is not the only diagnostic for whether a string is a constituent. If it is possible to move a particular string from its ordinary position to another position - typically, the beginning of the sentence - that, too, is evidence that the string is a constituent. In order to make the result of movement completely acceptable, it's sometimes necessary to use a special intonation or to invoke a special discourse context, especially in the case of noun phrases. In what follows, "___" indicates the ordinary position that a constituent has moved from, and appropriate discourse material is enclosed in parentheses.

 (8) a. I fed the cats. ---> The cats, I fed ___. (The dogs, I didn't.)
     b. I fed the cats with long, fluffy tails. ---> The cats with long, fluffy tails, I fed ___. (The other cats, I didn't.)

Movement of constituents other than noun phrases is illustrated in (9).

 (9) a. Prepositional phrase: The cat strolled across the porch with a confident air. ---> With a confident air, the cat strolled across the porch ___.
     b. Adjective phrase: Ali Baba returned from his travels wiser than before. ---> Wiser than before, Ali Baba returned from his travels ___.
     c. Adverb phrase: They arrived at the concert hall more quickly than they had expected. ---> More quickly than they had expected, they arrived at the concert hall ___.
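The movement operation in (8) and (9) can be sketched in the same spirit. The function name `front` and the `GAP` marker are our own, and we assume for simplicity that the input is written without final punctuation and with mid-sentence capitalization. As before, the code only builds the candidate sentence; judging its acceptability is up to the speaker.

```python
GAP = "___"  # marks the ordinary position that the moved string started from


def front(sentence: str, target: str) -> str:
    """Move the word string `target` to the front of `sentence`, leaving a gap.

    Hypothetical helper for illustration. Both arguments are written without
    final punctuation and with mid-sentence capitalization ("the cat").
    """
    words, moved = sentence.split(), target.split()
    n = len(moved)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == moved:
            remainder = words[:i] + [GAP] + words[i + n:]
            text = " ".join(moved) + ", " + " ".join(remainder) + "."
            return text[0].upper() + text[1:]
    raise ValueError(f"{target!r} does not occur in {sentence!r}")


print(front("I fed the cats", "the cats"))
print(front("the cat strolled across the porch with a confident air",
            "with a confident air"))
```

Applied to a string that is not a phrasal constituent, the same function returns a result just as readily; the test consists in comparing how speakers judge the different outputs.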

#### Questions and sentence fragments

Another way to tell whether a string is a constituent is to see whether it can function as a sentence fragment in response to a question. The question itself also functions as a diagnostic test, since we can think of question formation as involving the substitution of a question word for a string and the subsequent movement of the question word. (11) illustrates this pair of tests for a variety of constituent types.

 (11) a. Noun phrase: What do you like? The cats. Cats with long, fluffy tails. The cats with long, fluffy tails.
      b. Prepositional phrase: How did the cat stroll across the porch? With a confident air.
      c. Where did Ali Baba go? On a long journey. To New York.
      d. Adjective phrase: How did Ali Baba return? Wiser than before. Fairly jet-lagged.
      e. Adverb phrase: How did they do? Not badly. Surprisingly well. Much better than they had expected.

Once again, attempting to question nonconstituents is ungrammatical.

Notice, incidentally, that so substitution for adjective phrases and subordinate clauses has a variant that is reminiscent of questions. In addition to just substituting for the string of interest, as illustrated earlier, so can move to the beginning of the sentence, which then undergoes subject-aux inversion - the same process that turns declarative sentences into yes-no questions. This variant of so substitution is illustrated in (13) and (14).

 (13) a. I am very happy, ... ... and so is Linda.
      b. I am very fond of Lukas, ... ... and so is Linda.
      c. I am very fond of my nephew, ... * ... and so is Linda of her niece.

 (14) I { imagine, think } that they're invited, ... ... and so do they.

#### It cleft focus

The final constituent test that we'll consider is based on a special sentence type known as it clefts. It clefts are derived from ordinary declarative sentences as follows. We can often divide an ordinary sentence into two parts: a part that contains background information that is presupposed and a part that is particularly informative, the focus. In an it cleft, the background information and the focus are indicated unambiguously by the way that they fit into a syntactic frame consisting of it, a form of the copula to be, and the element that. In the examples in (15), the frame is in black, the background information is in blue, and the focus is in red. Notice that one and the same sentence can be divided up into background and focus in more than one way, giving rise to more than one it cleft.

 (15) a. Ordinary cats detest the smell of citrus fruits. ---> It is ordinary cats that detest the smell of citrus fruits.
      b. Ordinary cats detest the smell of citrus fruits. ---> It is the smell of citrus fruits that ordinary cats detest.
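The way the frame in (15) combines with focus and background can be sketched as follows. The function name `it_cleft` and its `copula` parameter are our own, and we again assume input without final punctuation and with mid-sentence capitalization. The sketch builds the frame mechanically and makes no claim about which outputs are acceptable.

```python
def it_cleft(sentence: str, focus: str, copula: str = "is") -> str:
    """Build 'It <copula> <focus> that <background>.' from a declarative sentence.

    Hypothetical helper for illustration. The background is whatever remains
    of the sentence once the focus string has been taken out.
    """
    # Padding with spaces makes the match respect word boundaries.
    before, found, after = f" {sentence} ".partition(f" {focus} ")
    if not found:
        raise ValueError(f"{focus!r} does not occur in {sentence!r}")
    background = " ".join((before + " " + after).split())
    return f"It {copula} {focus} that {background}."


sentence = "ordinary cats detest the smell of citrus fruits"
print(it_cleft(sentence, "ordinary cats"))
print(it_cleft(sentence, "the smell of citrus fruits"))
```

Choosing a different focus string for one and the same sentence gives a different cleft, which is the sense in which a sentence can be divided into background and focus in more than one way.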

If a string can appear as the focus of an it cleft, then it is a constituent. Some examples for various constituent types are given in (16). It is important to realize that it clefts don't always sound entirely natural out of the blue. Nevertheless, it clefts where the focus is a constituent, as in (16), contrast sharply with the word salad that results from attempting to focus a string that isn't a constituent, as in (17).

#### Mismatches between syntax and prosody

We mentioned earlier that it is not always self-evident whether a particular string of words is a constituent. For instance, in reading a sentence like (18) out loud, we can observe an intonation break between cat and that (indicated by the slash).

 (18) This is the cat / that chased the rat.

Because the intonation break is clearly audible, it is very tempting to equate the sentence's abstract syntactic structure with its relatively concrete prosodic structure. Specifically, because the cat does not belong to the same prosodic unit as the relative clause that chased the rat, it is tempting to treat the cat as a syntactic constituent.

As it turns out, however, there are two pieces of evidence against doing so. First, substituting a pronoun for the string the cat is ungrammatical (in the context of (18), though not in principle).

 (19) a. This is the cat that chased the rat. ---> * This is it that chased the rat.
      b. This is the cat. ---> ok This is it.

Second, the string cat that chased the rat is shown to be a constituent by the grammaticality of substituting the pro-form one. (One substitution is discussed in more detail in Chapter 5.)

 (20) This is the cat that chased the rat. ---> ok This is the one.

The facts in (19a) and (20) converge to tell us that the word cat first combines with the relative clause, and that the definite article the then combines with the resulting constituent, rather than with cat on its own. It is worth noting that the syntactic structure just described is congruent with the way that the entire noun phrase the cat that chased the rat is interpreted compositionally from smaller expressions. In a simple semantics, the term cat refers to the set of all cats. Combining cat with the relative clause yields cat that chased the rat, which refers to a subset of all cats - namely, those with the property of having chased the rat. Finally, combining cat that chased the rat with the definite article the yields an expression that refers to some unique individual within the rat-chasing subset of cats (exactly which individual this is depends on the discourse context).
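The compositional interpretation just described can be mimicked with Python sets. Everything in the sketch is an assumption made for illustration: the individuals and facts are invented, the function names `modify` and `the` are our own, and the discourse context is crudely modeled as a set of salient individuals.

```python
# Toy discourse model. All individuals and facts are invented for illustration.
cats = {"Tabby", "Felix", "Mog"}           # what "cat" refers to: the set of all cats
chased_the_rat = {"Felix", "Mog", "Rex"}   # everything that chased the rat (Rex is a dog)
salient = {"Felix", "Rex"}                 # individuals prominent in the discourse


def modify(noun_set, property_set):
    """Noun + relative clause: the members of the noun set that have the property."""
    return noun_set & property_set


def the(noun_set, salient_individuals):
    """Definite article: the one contextually salient member of the set."""
    candidates = noun_set & salient_individuals
    if len(candidates) != 1:
        raise ValueError("no unique referent in this context")
    (referent,) = candidates
    return referent


# Step 1: "cat" combines with the relative clause, giving a subset of all cats.
cat_that_chased_the_rat = modify(cats, chased_the_rat)
print(sorted(cat_that_chased_the_rat))

# Step 2: "the" combines with the result and picks out one individual.
print(the(cat_that_chased_the_rat, salient))
```

The order of the two steps mirrors the constituent structure: the article applies to the set denoted by cat that chased the rat, not to the set of all cats.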

As a first approximation, syntactic structure represents the way that the meaning of an expression is composed. This is already evident from the correspondence between noun phrases and individuals, between adjective phrases and properties, between prepositional phrases and locations, directions, etc., between verb phrases and events, states, etc., and so on. However, having said this, it is important to realize that there can be considerable mismatches between syntactic structure and structures at other levels of linguistic representation (prosody, morphology, semantics, and others).

#### Phrasal versus lexical constituents

A second complication in connection with applying syntactic constituenthood tests is that single words don't necessarily function on a par with multiword constituents, even though, being indivisible units, they are constituents by definition. In (21), for instance, cats doesn't pass the constituenthood tests reviewed in the last section, whereas in (22), it does.

 (21) a. The cats are hungry. ---> * The they are hungry.
      b. Tabby cats are quite common. ---> * Tabby they are quite common.
      c. Cats without tails are relatively rare. ---> * They without tails are relatively rare.
      d. Those cats that have no tails are Manx cats. ---> * Those they that have no tails are Manx cats.

 (22) Cats are not social animals. ---> They are not social animals.

The reason for the grammaticality contrast in (21) and (22) is that there is a systematic difference between the syntactic contexts in these examples. In (21), cats is accompanied either by a determiner or a modifier of some sort, indicated by italics. In such contexts, cats combines with these other words to form a noun phrase, but it isn't a noun phrase in its own right. In (22), on the other hand, cats is a bare (= unmodified) noun and functions as a simple noun and as a noun phrase at the same time. In other words, there are two levels of constituenthood: the lexical level, where single words are constituents by definition, and the phrasal level, where single words don't necessarily behave on a par with multiword constituents.

The constituenthood tests reviewed in the last section turn out to be diagnostic for constituenthood at the phrasal level only. (23) illustrates the ungrammatical sentences that result from attempting to move, question, and focus lexical rather than phrasal constituents. Italics indicate any words that belong to the same phrasal constituent as the underlined item.

 (23) a. Attempt to move: I fed the cats. ---> * Cats, I fed the ___.
      b. The cat strolled across the porch with a confident air. ---> * With, the cat strolled across the porch ___ a confident air.
      c. Ali Baba returned from his travels wiser than before. ---> * Wiser, Ali Baba returned from his travels ___ than before.
      d. They arrived at the concert hall more quickly than they had expected. ---> * Quickly, they arrived at the concert hall more ___ than they had expected.

 (24) a. Attempt to question: * What did you feed the ___? ---> * Cats.
      b. * How did the cat stroll across the porch ___ a confident air? ---> * With.
      c. * How did Ali Baba return from his travels ___ than before? ---> * Wiser.
      d. * How did they arrive at the concert hall more ___ than they had expected? ---> * Quickly.

 (25) a. Attempt to focus: Ordinary cats detest the smell of citrus fruits. ---> * It is smell that ordinary cats detest the ___ of citrus fruits.
      b. The cat strolled across the porch with a confident air. ---> * It was with that the cat strolled across the porch ___ a confident air.
      c. Ali Baba returned from his travels wiser than before. ---> * It was wiser that Ali Baba returned from his travels ___ than before.
      d. They arrived at the concert hall more quickly than they had expected. ---> * It was quickly that they arrived at the concert hall more ___ than they had expected.

(26), on the other hand, illustrates the grammatical results of moving, questioning, and focusing phrasal constituents that happen to consist of a single word. (Notice the absence of italicized material in this case.)

#### Verb phrases

There is one category of constituent that we haven't discussed so far - verb phrases. Testing for the constituenthood of verb phrases differs from the case of other syntactic categories in two respects.

First, the pro-forms for verb phrases aren't simple vocabulary items, but are complex: do so for substitution and do what for questions. (Notice that it's only what, rather than the entire pro-form do what, that moves to the beginning of a question; see Note 2.)

 (29) a. Substitution: She will write a book. ---> ok She will do so.
      b. The two boys could order tuna salad sandwiches. ---> ok The two boys could do so.

 (30) a. Question/sentence fragment: What will she do? ---> ok Write a book.
      b. What could the two boys do? ---> ok Order tuna salad sandwiches.

Second and more importantly, verbs and the verb phrases that contain them come in two varieties, finite and nonfinite (see Verb forms and finiteness in English for discussion). Now, two of the constituenthood tests - substitution and the question/sentence fragment test - yield grammatical results regardless of a verb phrase's finiteness, as shown in (31) and (32).

 (31) a. Substitution, nonfinite verb phrase: She will write a book. ---> ok She will do so.
      b. finite verb phrase: She wrote a book. ---> ok She did so.

 (32) a. Question/sentence fragment, nonfinite verb phrase: What will she do? ---> ok Write a book.
      b. finite verb phrase: What did she do? ---> ok Wrote a book.

The results from the other two tests are more complex. Movement of nonfinite verb phrases is grammatical (see Note 3), but movement of finite ones is not.

 (33) a. Movement, nonfinite verb phrase: (She said that) she will write a book, ---> ok (and) write a book, she will ___.
      b. though she may write a book ---> ok write a book though she may ___

 (34) a. finite verb phrase: (She said that) she wrote a book, ---> * (and) wrote a book, she ___.
      b. though she wrote a book ---> * wrote a book though she ___

In it clefts, nonfinite verb phrases are marginally acceptable in focus, whereas finite verb phrases are again clearly ruled out.

 (35) a. It cleft focus, nonfinite verb phrase: She will write a book. ---> ? It is write a book that she will ___.
      b. finite verb phrase: She wrote a book. ---> * It is wrote a book that she ___.

To summarize: we have good evidence that nonfinite verb phrases are constituents. In the case of finite verb phrases, we have evidence for constituenthood from two of the four constituenthood tests. Given this slightly complex state of affairs, we will proceed as follows. We will make the simplifying assumption that the ungrammaticality of moving or focusing finite verb phrases has nothing to do with their constituenthood, but that it is due to some other reason, yet to be determined. Having made this assumption, we are then free to treat finite verb phrases as constituents on a par with their nonfinite counterparts even though the syntactic behavior of the two types of verb phrases is not identical in all respects.

Chances are that you are a bit leery of the simplifying assumption just described. If so, think about it as comparable to taking out a loan. True, taking out a loan is risky, and taking out loans in an uncontrolled or irresponsible way can lead to financial disaster. Nevertheless, the credit market is a necessary and productive part of any modern economy. In a similar way, making simplifying assumptions in science can help us to make progress where otherwise we would be stumped by the complexity of the phenomena that we are investigating. Of course, we have to be careful about what simplifying assumptions we make. Otherwise, we end up fooling ourselves into believing that we are making progress, when in fact we are working on such a distorted model of reality that our work is worthless.

Apart from this wrinkle concerning finiteness, verb phrases behave just as we have come to expect from other constituent types. The tests yield grammatical results only for complete verb phrases, not for substrings of them.

 (36) a. Substitution: She will write a book. ---> * She will do so a book.
      b. Movement: (She said that) she will write a book, ... ---> * ... and write, she will ___ a book.
      c. though she may write a book ---> * write though she may ___ a book
      d. Question/sentence fragment: * What will she do a book? ---> * Write.
      e. It cleft focus: She will write a book. ---> * It is write that she will ___ a book.

And once again, particular strings can be phrasal constituents in certain syntactic contexts, but not in others. For instance, although write isn't a verb phrase when it combines with a direct object, it is a verb phrase on its own, as is evident from comparing the examples in (37) to their counterparts in (36).

 (37) a. Substitution: She will write. ---> ok She will do so.
      b. Movement: (She said that) she will write, ... ---> ok ... and write, she will ___.
      c. though she may write ---> ok write though she may ___
      d. Question/sentence fragment: What will she do? ---> ok Write.
      e. It cleft focus: She will write. ---> ? It is write that she will ___.

#### Representing constituenthood

In Chapter 1, we introduced tree diagrams as a convenient way of representing constituent structure. For a mathematician working in the field of graph theory, the formal properties of tree diagrams are interesting in their own right, but for a syntactician, the interest of trees lies in the fact that they are representations, or models, of constituent structure. In other words, the graphic structure of a tree on the page is intended as a statement about the way that a speaker groups together syntactic elements in his or her mind. In any good model, we want the properties of the model to correspond straightforwardly to the properties of the domain of inquiry. Such a close correspondence allows us to state observations and generalizations about the domain of inquiry without undue complication. Moreover, if we're lucky, we might even be able to use our understanding of the model's formal properties as a sort of conceptual lever to generate hypotheses and to discover generalizations concerning the domain of inquiry that would otherwise go unnoticed.

In light of these considerations, let's consider the sentence in (38), focusing particularly on the constituenthood of the underlined string.

 (38) The secretary drafted the letter.

According to the two tests that apply to finite verb phrases, the string drafted the letter is a constituent.

 (39) a. Substitution: The secretary drafted the letter. ---> The secretary did so.
      b. Question/sentence fragment: What did the secretary do? ---> Drafted the letter.

Having established this fact, let's now consider two alternative representations of the sentence. We've already encountered (40a) in Chapter 1. (40b) is an alternative, 'flatter' tree.

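 (40) a. [Tree diagram: the structure from Chapter 1, with a VerbPhr node dominating drafted the letter]
      b. [Tree diagram: a flatter structure with no such node]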

At first glance, the flatter tree might seem preferable on the grounds that it is simpler in the sense of containing fewer nodes. But let's focus on the question of which tree is a better representation of the sentence. Another way of putting this question is to ask whether either of the trees in (40) has some graphic property that corresponds to the results of the constituenthood tests in (39). In (40a), the answer is 'yes,' since there is a single node (the one labeled VerbPhr) that exhaustively dominates the string drafted the letter (see the section on exhaustive dominance in Node relations for a definition). The tree in (40b), on the other hand, lacks such a node and has no other graphic property that corresponds to the string's constituenthood. Clearly, then, (40a) is a better representation of the sentence, because it follows the natural convention in (41).

 (41) Syntactic constituents are represented graphically as nodes in a tree.
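The convention in (41) lends itself to a small Python sketch in which a tree is a nested tuple and a string of words counts as a constituent just in case some node exhaustively dominates it. The tuple encoding, the function names, and every node label other than VerbPhr are our own assumptions, so the two trees below only approximate the diagrams in (40).

```python
def leaves(tree):
    """Return the words dominated by a node, in linear order."""
    if isinstance(tree, str):          # a bare string is a word
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def constituents(tree):
    """Yield (label, word string) for every labeled node in the tree."""
    if isinstance(tree, str):
        return
    label, *children = tree
    yield label, " ".join(leaves(tree))
    for child in children:
        yield from constituents(child)


def is_constituent(tree, string):
    """True if some node exhaustively dominates exactly this string of words."""
    return any(words == string for _label, words in constituents(tree))


# Approximation of (40a): a VerbPhr node groups the verb with its object.
tree_a = ("Sentence",
          ("NounPhr", ("Det", "the"), ("Noun", "secretary")),
          ("VerbPhr", ("Verb", "drafted"),
                      ("NounPhr", ("Det", "the"), ("Noun", "letter"))))

# Approximation of (40b): the flatter alternative without that node.
tree_b = ("Sentence",
          ("NounPhr", ("Det", "the"), ("Noun", "secretary")),
          ("Verb", "drafted"),
          ("NounPhr", ("Det", "the"), ("Noun", "letter")))

print(is_constituent(tree_a, "drafted the letter"))
print(is_constituent(tree_b, "drafted the letter"))
```

Under these assumptions, only the first tree has a node whose words are exactly drafted the letter, so only the first tree has a formal property that lines up with the test results in (39).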

We will conclude this discussion of the model character of syntactic representations by emphasizing that models are just that - models, and not the actual domain of inquiry itself. The purpose of any model is to help us understand some part of reality that is too complex to understand in all of its detail, at least all at once. This means that models are partial in two respects. First, models often leave out many properties of a phenomenon that aren't relevant from a particular point of view. This fact is often stated in the form of the maxim "Don't mistake the map for the territory." For instance, a mountaineer's map might show topographical information in great detail, but completely ignore political boundaries, whereas a diplomat's map might do the reverse. Analogously, in linguistics, syntactic models leave out many important properties of language, such as real-world plausibility, pragmatic felicity, the location of intonation breaks, and so on. These are the focus of other subdisciplines of linguistics.

A second way that models are partial is that they are subject to revision as our understanding of a particular domain improves and deepens. This is simply another way of saying that scientific progress is possible. Although we will not recapitulate all of the revisions that have been made in syntactic theory, we will encounter some of them in the further course of the book.

#### Notes

1. As we mentioned in Chapter 1, the grammaticality of a sentence depends on its interpretation. Specifically, (i) (= (10a)) is ungrammatical under the ordinary interpretation where the prepositional phrase with long, fluffy tails modifies cats.

 (i) The cats, I fed with long, fluffy tails.

But (i) also has an outlandish interpretation that can be paraphrased as I fed long, fluffy tails to the cats. Under this interpretation, (i) is grammatical. In other words, in the pre-movement version of (i) given in (ii), the string the cats is a constituent in the outlandish interpretation, though not in the ordinary one.

 (ii) I fed the cats with long, fluffy tails.

Conversely, the string the cats with long, fluffy tails is a constituent in the ordinary interpretation of (ii), but not in the outlandish one. This is evident from the fact that (iii) has only the ordinary interpretation.

 (iii) The cats with long, fluffy tails, I fed. (The other cats, I didn't.)

2. For completeness, we should mention that do so substitution and the question test for verb phrases are subject to a semantic restriction. Specifically, do so and do what cannot substitute for verb phrases with so-called stative verbs like know or want.

 (i) a. They know her parents; they want the cookies. ---> * They do so.
     b. What do they do? ---> * Know her parents; want the cookies.

As their name implies, stative verbs refer to states (rather than to activities or accomplishments), and a reasonably reliable diagnostic for them is their inability to appear in the progressive construction.

 (ii) a. Stative verb: * They are knowing her parents; they are wanting the cookies.
      b. Nonstative verb: ok They are meeting her parents; they are eating the cookies.

Since do is the prototypical activity verb, it is not surprising that expressions containing it, like do so and do what, cause a semantic mismatch when they substitute for verb phrases containing stative verbs.

3. Movement of nonfinite verb phrases in out-of-the-blue contexts, as in (i), is not very felicitous.

 (i) Write a book, she will ___.

But it is clearly grammatical given appropriate discourse contexts or in certain syntactic constructions, as the examples in the text show.

#### Exercise 2.1

 (1) a. I put the car in the garage.
     b. I put the car in the garage.
     c. I put the car in the garage.

 (2) a. I know the guy with the fedora.
     b. I know the guy with the fedora.

 (3) a. They threw in the towel.
     b. They threw the towel in the closet.

#### Exercise 2.2

How well does each of the trees in (1) and (2) represent the syntactic structure of the sentence it is intended to represent? Your discussion should be concise, but detailed enough to answer the following questions:

• Are any strings represented as constituents that shouldn't be?
• Are any strings not represented as constituents that should be?
• Are any of the trees misleading in other respects?

State the linguistic evidence on which your conclusions are based. (If you have done Exercise 2.1, you can simply refer to the evidence there rather than restating it.)

 (1) a., b., c. [Tree diagrams not reproduced]
 (2) a., b., c. [Tree diagrams not reproduced]

#### Exercise 2.3

In addition to it clefts, English has wh- clefts, so called because they are introduced by question words, almost all of which begin with wh- in English (the exception is how, which counts as an honorary wh- word). Wh- clefts begin with an indirect question, and end with the focus of the cleft, which, just as with it clefts, is a (phrasal) constituent. The two parts of a wh- cleft are connected by a form of the copula. In the examples in (1), the indirect question is in blue, the copula is in black, and the focus is in red.

 (1) a. Noun phrase: What she ate was an apple.
     b. Prepositional phrase: Where we'll meet is in Houston Hall.
     c. Adjective phrase: What they are is surprisingly arrogant.

Is it possible to use the wh- cleft construction as evidence that verb phrases are constituents? Do finite and nonfinite verb phrases differ with respect to wh- clefting? Explain, giving examples.

#### Exercise 2.4

Is the do of do so substitution the main verb or the homonymous auxiliary? Answer with reference to the three properties in (41) in Modals and auxiliary verbs in English.

#### Problem 2.1

The substitution test introduced in Chapter 1 and discussed in further detail in this chapter and the substitution operation introduced in Chapter 1 are not identical, but they are related. In a brief paragraph, explain how.

#### Problem 2.2

Is (1) lexically or structurally ambiguous? Explain, giving the results of any constituenthood tests that you use and discussing their limitations, if any.

 (1) They decided on the boat.