How Linguists Think About Sentence Structure

contents of this chapter:

sentence examples
some intuitions about word-groups
rules for replacement
towards a formal grammar
one rule describes many sentences
dominance and precedence
optional additions
complementizer phrases
detour: recursion in natural language syntax
back to Malory
when is a sentence not a sentence?
rules for the rest of the sentence
the examples as they appear in the corpus
postscript: writing a formal grammar of the English language

sentence examples

I have drawn the examples used in this chapter from the corpus version of Malory's Morte d'Arthur. I have modernized the spelling to make the examples easier to understand and added my own notes in square brackets to help explain the language. At the end of the chapter all the examples will be shown in their corpus versions.

some intuitions about word-groups

Here is a sentence from Malory that is understandable to an English speaker today:

Then the queen made great joy when she knew who was the father of her child.

(The queen is Igrayne; the child is Arthur; she has just discovered that Uther is his father.) Intuitively, any English speaker can divide the words in this sentence into distinct groups. For instance, it's clear that "great" belongs with "joy" (in a way that it does not belong, for instance, with "queen"). Similarly, "who was the father of her child" is a distinct group of words, since it answers the question, "what did the queen know?" Also, there are two clauses that could act as sentences on their own, namely "the queen made great joy" and "she knew who was the father of her child." Linguists call these natural word-groups constituents.

rules for replacement

If we allow this as a sentence,

The queen made great joy.

why does the following sentence seem wrong to us?

Of her child made great joy.

This has to do with the rules for replacement of constituents. We have replaced "the queen", which is a noun phrase, with "of her child", which is a prepositional phrase. The sentence will only sound right to us if we replace the noun phrase, "the queen", with another noun phrase. For instance, the following sentence sounds right, because "the father of her child" is a noun phrase (which contains a prepositional phrase):

The father of her child made great joy.

As another example,

They set a pavilion over the stone and the sword.

is a legitimate sentence, but

They set a pavilion over the stone and strongly made.

sounds wrong to us. This is because in English the conjunction "and" must join two constituents of the same type; we accept "the stone and the sword" because "the stone" and "the sword" are both noun phrases, but we don't accept "the stone and strongly made" because "the stone" is a noun phrase and "strongly made" is an adjective phrase.

This suggests one of the basic premises of syntax, that some patterns of constituents are allowed in a language and some are not.

towards a formal grammar

This notion that some patterns of constituents are allowed and some are not can be expressed in a formal grammar.

In the system of labelling used for the Middle English corpus, the sentence

The queen made great joy.

has the structure NP-SBJ — VBD — NP-OB1, where NP-SBJ means noun-phrase subject ("the queen"), VBD means verb in the past tense ("made"), and NP-OB1 means noun-phrase direct object ("great joy"). Informally, the subject is whatever does the action of the verb; the object is whatever the action is done to. The long hyphens ("—") represent precedence: that is, the subject noun phrase precedes the verb, which precedes the object. So one rule of our formal grammar must look like this:

rule 1.) IP —> NP-SBJ — VBD — NP-OB1

"IP" stands for "inflectional phrase", which means, very roughly, "sentence." This rule says, "a sentence can be rewritten as a noun-phrase subject preceding a verb in the past tense preceding a noun-phrase object."

one rule describes many sentences

Once we have a rule (rule 1, above), we can build other sentences from it. For instance, for a noun phrase subject we can use the pronoun "he", for a verb we can use "kissed", and for the verb's object we can use "the lady Igrayne", to compose the sentence:

He kissed the lady Igrayne.
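
To make the idea concrete, here is a minimal sketch in Python that writes rule 1 down as data and checks whether a sentence, already divided by hand into labelled word-groups, fits it. It is only an illustration, not part of the corpus tools; the names RULE_1 and matches_rule are invented for this example.

```python
# Rule 1 from the text: a parent label and the ordered labels it rewrites to.
RULE_1 = ("IP", ["NP-SBJ", "VBD", "NP-OB1"])


def matches_rule(rule, labelled_groups):
    """Return True if the group labels, read left to right, equal the rule's right-hand side."""
    _parent, right_hand_side = rule
    return [label for label, _words in labelled_groups] == right_hand_side


# Word-groups labelled by hand (an assumption: nothing here finds the groups for us).
queen = [("NP-SBJ", "the queen"), ("VBD", "made"), ("NP-OB1", "great joy")]
kissed = [("NP-SBJ", "he"), ("VBD", "kissed"), ("NP-OB1", "the lady Igrayne")]
wrong = [("PP", "of her child"), ("VBD", "made"), ("NP-OB1", "great joy")]

print(matches_rule(RULE_1, queen))   # True
print(matches_rule(RULE_1, kissed))  # True
print(matches_rule(RULE_1, wrong))   # False
```

The third sentence fails for the reason given earlier: a prepositional phrase stands where the rule requires a noun-phrase subject.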

Notice that the two noun-phrase objects in these examples have different structures: "great joy" has the structure ADJ — N (adjective "great" precedes noun "joy"), but "the lady Igrayne" has the structure D — N — NPR (determiner "the" precedes noun "lady", which precedes proper noun "Igrayne"). A proper noun is a noun referring to exactly one object: there may be many ladies, but only one Igrayne. Proper nouns are usually capitalized in Modern English. This suggests these rules of our formal grammar:

rule 2.) NP-OB1 —> ADJ — N
rule 3.) NP-OB1 —> D — N — NPR
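
Rules 1 to 3 are already enough to generate a small family of sentences. The sketch below, in Python, expands the label IP in every way the rules allow. The word lists in LEXICON are a tiny hand-picked assumption, and for brevity NP-SBJ is treated as a ready-made phrase, since we have not yet written rules for it.

```python
from itertools import product

# Rules 1-3 from the text: each label maps to its possible right-hand sides.
RULES = {
    "IP": [["NP-SBJ", "VBD", "NP-OB1"]],          # rule 1
    "NP-OB1": [["ADJ", "N"], ["D", "N", "NPR"]],  # rules 2 and 3
}

# Hypothetical mini-lexicon, chosen only for illustration.
LEXICON = {
    "NP-SBJ": ["the queen", "he"],
    "VBD": ["made", "kissed"],
    "ADJ": ["great"],
    "D": ["the"],
    "N": ["joy", "lady"],
    "NPR": ["Igrayne"],
}


def expand(label):
    """Return every word string that the label can be rewritten as."""
    if label in LEXICON:
        return list(LEXICON[label])
    results = []
    for right_hand_side in RULES[label]:
        # Combine every expansion of each child, keeping the children in order.
        for parts in product(*(expand(child) for child in right_hand_side)):
            results.append(" ".join(parts))
    return results


sentences = expand("IP")
print(len(sentences))  # 16
print("he kissed the lady Igrayne" in sentences)  # True
```

The output also includes strings such as "he kissed great joy", which fit the rules but mean little: the grammar describes structure, not sense.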

dominance and precedence

We say that the constituent on the left-hand side of the rule "dominates" the constituents on the right-hand side of the rule. So "NP-OB1" dominates "ADJ — N" in the sentence "The queen made great joy", and "NP-OB1" dominates "D — N — NPR" in the sentence "He kissed the lady Igrayne." For that matter, "N" dominates "joy" in "great joy", and "N" dominates "lady" in "the lady Igrayne."

Thus a sentence can be entirely described in terms of precedence and dominance; precedence is expressed by the rules of the formal grammar, and dominance is expressed by the trail of rewrites made in building the sentence.
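
Both relations can be read off a tree. The following Python sketch stores "The queen made great joy" as nested tuples and asks two questions of it: does a given label dominate a given word, and does one word precede another? The tuple layout and the function names are assumptions made for this example. Dominance is taken here to cover everything beneath a node, which goes one step further than the single-rule relation described above.

```python
# A node is (label, child, child, ...); a leaf is (label, word).
TREE = (
    "IP",
    ("NP-SBJ", ("D", "the"), ("N", "queen")),
    ("VBD", "made"),
    ("NP-OB1", ("ADJ", "great"), ("N", "joy")),
)


def words(node):
    """Return the words beneath a node, left to right."""
    _label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    found = []
    for child in children:
        found.extend(words(child))
    return found


def find(node, label):
    """Yield every node in the tree that carries the given label."""
    if node[0] == label:
        yield node
    for child in node[1:]:
        if isinstance(child, tuple):
            yield from find(child, label)


def dominates(tree, label, word):
    """True if some node with this label has the word beneath it."""
    return any(word in words(node) for node in find(tree, label))


def precedes(tree, first, second):
    """True if the first word comes before the second in the sentence."""
    order = words(tree)
    return order.index(first) < order.index(second)


print(dominates(TREE, "NP-OB1", "joy"))  # True
print(dominates(TREE, "NP-SBJ", "joy"))  # False
print(precedes(TREE, "queen", "made"))   # True
```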

optional additions

Of course, our original sentence was longer than "the queen made great joy." There was an entire prepositional phrase, labelled "PP", "when she knew who was the father of her child", tacked onto the end, as well as an adverbial phrase describing time, labelled "ADVP-TMP", "Then", tacked onto the beginning. So we could rewrite rule 1 like this, using parentheses to show that the prepositional and adverbial phrases are optional (because we would accept "The queen made great joy" as a sentence on its own):

rule 1.1) IP —> (ADVP-TMP) — NP-SBJ — VBD — NP-OB1 — (PP)
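
A rule with optional parts is shorthand for several plain rules. As a rough sketch in Python, the function below spells out every label sequence that rule 1.1 allows, with the verb written VBD as in rule 1. Marking optional items with a True/False flag is simply a convention chosen for this example.

```python
from itertools import product

# Rule 1.1 as (label, is_optional) pairs, in order of precedence.
RULE_1_1 = [
    ("ADVP-TMP", True),
    ("NP-SBJ", False),
    ("VBD", False),
    ("NP-OB1", False),
    ("PP", True),
]


def expansions(rule):
    """List every label sequence the rule allows, keeping or dropping each optional item."""
    # An optional item offers two choices (present or absent); a required item offers one.
    choices = [([label], []) if optional else ([label],) for label, optional in rule]
    return [
        [label for part in combination for label in part]
        for combination in product(*choices)
    ]


for sequence in expansions(RULE_1_1):
    print(" ".join(sequence))
```

Two optional items give four sequences, the shortest of which is the right-hand side of rule 1.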

complementizer phrases

Remember that the prepositional phrase contained a sentence of its own, "she knew who was the father of her child", following the preposition "when". The clause that links the embedded sentence to the main sentence is called a "complementizer phrase". In the schema used for the Middle English corpus, the complementizer phrase in this example is labelled "CP-ADV", because it describes the time at which the main clause happens, which makes it adverbial. The "C" is the complementizer; in this example it is not represented by any word, so "0" is entered.

rule 3.) PP —> P — CP-ADV
rule 4.) CP-ADV —> C — IP

For contrast, here's an example where a complementizer phrase is introduced by a non-0 complementizer:

I am sick for anger and for love of fair Igrayne, that I may not be whole [well].

Here, the embedded sentence "I may not be whole" is introduced by the complementizer "that".

detour: recursion in natural language syntax

This embedding of one sentence in another sentence illustrates a basic feature of natural-language syntax, that of recursion. Since it is possible to embed one sentence in another sentence, there is no limit to possible sentence length and thus an infinite number of sentences can be generated. A classic example of recursion is the children's rhyme "The House that Jack Built", where each sentence embeds the one before it:

This is the house that Jack built.
This is the malt that lay in the house that Jack built.
This is the rat that ate the malt that lay in the house that Jack built.

In theory this game could go on forever but in practice it mercifully ends:

This is the farmer sowing the corn, that kept the cock that crowed in the morn, that waked the priest all shaven and shorn, that married the man all tattered and torn, that kissed the maiden all forlorn, that milked the cow with the crumpled horn, that tossed the dog, that worried the cat, that killed the rat, that ate the malt that lay in the house that Jack built.
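
The rhyme's pattern is easy to imitate with a function that calls itself, which is what recursion means to a programmer. This is only a sketch in Python: the list LINKS holds a few clauses taken from the rhyme, and the function names are invented for the example.

```python
# Each entry wraps one more relative clause around whatever follows it.
LINKS = ["the malt that lay in", "the rat that ate", "the cat that killed"]


def noun_phrase(depth):
    """Build a noun phrase with `depth` relative clauses wrapped around the house."""
    if depth == 0:
        return "the house that Jack built"
    # The recursive step: a noun phrase contains a smaller noun phrase.
    return f"{LINKS[depth - 1]} {noun_phrase(depth - 1)}"


def sentence(depth):
    """Return the line of the rhyme with the given depth of embedding."""
    return f"This is {noun_phrase(depth)}."


for depth in range(3):
    print(sentence(depth))
```

Here it is the length of the list, not the rule itself, that stops the game: a longer list would give a longer sentence.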

back to Malory

The embedded sentence "she knew who was the father of her child", of course, has a structure of its own. In the Middle English corpus it is represented like this:

rule 5.) IP —> NP-SBJ — VBD — CP-QUE

where "she" is the noun-phrase subject, "knew" is the verb in the past tense, and "CP-QUE" is an indirect question. That is, it answers the question, "what did the queen know?" The indirect question "who was the father of her child", in turn, has structure:

rule 6.) CP-QUE —> WNP — C — IP

where WNP means wh-noun-phrase, a noun phrase beginning with "who", "which", or "what", in this case the pronoun "who", and C is the 0 complementizer which links the embedded sentence "was the father of her child", labelled IP.

when is a sentence not a sentence?

It is not obvious why "was the father of her child" is labelled an embedded sentence. Most English speakers would not accept it as a sentence on its own because it appears to have no subject. In this type of analysis, the subject is a "trace", which can be thought of as a placeholder for a constituent that has been moved away. The constituent that has been moved in this case is the "who". It is a formal requirement of English grammar that wh-words move to the WNP position.

rules for the rest of the sentence

This is the rule describing the IP "was the father of her child":

rule 7.) IP —> NP-SBJ — BED — NP-OB1

where NP-SBJ represents the noun-phrase subject trace, BED means a past-tense version of the verb "to be", in this case "was", and NP-OB1 is again a noun-phrase object.

This is the rule describing the NP-OB1 "the father of her child":

rule 8.) NP-OB1 —> D — N — PP

where D represents the determiner "the", N represents the noun "father", and PP represents the prepositional phrase "of her child".

This is the rule describing the prepositional phrase "of her child":

rule 9.) PP —> P — NP

where P represents the preposition "of" and NP represents the noun phrase "her child".

This is the rule describing the noun phrase "her child":

rule 10.) NP —> PRO$ — N

where PRO$ represents the possessive pronoun "her", and "N" represents the noun "child".
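
The rules in this section can be applied one after another to show how the IP "was the father of her child" is built up. The Python sketch below rewrites the leftmost label that has a rule until none is left, recording each step. Giving each label exactly one rule is a simplification that works only for this one clause, and the names RULES and derive are invented for the example.

```python
# The rules for the embedded clause, one right-hand side per label.
RULES = {
    "IP": ["NP-SBJ", "BED", "NP-OB1"],  # "was the father of her child"
    "NP-OB1": ["D", "N", "PP"],         # "the father of her child"
    "PP": ["P", "NP"],                  # "of her child"
    "NP": ["PRO$", "N"],                # "her child"
}


def derive(start):
    """Rewrite the leftmost label that has a rule until none remains; return every step."""
    current = [start]
    steps = [current]
    while any(label in RULES for label in current):
        position = next(i for i, label in enumerate(current) if label in RULES)
        current = current[:position] + RULES[current[position]] + current[position + 1:]
        steps.append(current)
    return steps


for step in derive("IP"):
    print(" ".join(step))
```

The last line printed is NP-SBJ BED D N P PRO$ N, which lines up with the trace followed by "was the father of her child". NP-SBJ is left unexpanded because the subject here is a trace, not a word.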

the examples as they appear in the corpus

We have now worked our way through the structure of the sentence,

Then the queen made great joy when she knew who was the father of her child.

Below is the example sentence as it appears, parsed and labelled, in the Middle English corpus.

Note: All the labels used in the corpus are listed in the appendix, with a brief description of each label. IPs that mark the main clause of a sentence are labelled "IP-MAT" ("MAT" for "matrix"), and other IPs are labelled "IP-SUB" ("SUB" for "subordinate.")

(
(IP-MAT
        (ADVP-TMP (ADV Thenne))
        (NP-SBJ (D the)
                (N quene))
        (VBD made)
        (NP-OB1 (ADJ grete)
                (N joye))
        (PP (P whan)
            (CP-ADV (C 0)
                    (IP-SUB
                            (NP-SBJ (PRO she))
                            (VBD knewe)
                            (CP-QUE
                                    (WNP-1 (WPRO who))
                                    (C 0)
                                    (IP-SUB (NP-SBJ *T*-1)
                                            (BED was)
                                            (NP-OB1 (D the)
                                                    (N fader)
                                                    (PP (P of)
                                                        (NP (PRO$ her)
                                                            (N Child)))))))))
        (E_S .))
(ID CMMALORY,5.135))
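
Because the corpus notation is just nested brackets, a short program can read it and recover the rewrite rules that a parse uses. The following Python sketch is an illustration under stated assumptions, not the corpus's own software: SAMPLE is a shortened fragment of the parse above, without the outer wrapper and the ID line, and the functions parse and rules are invented for the example.

```python
import re


def parse(text):
    """Turn bracketed notation into nested lists of the form [label, child, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def node():
        nonlocal position
        position += 1  # step over the opening bracket
        items = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                items.append(node())
            else:
                items.append(tokens[position])
                position += 1
        position += 1  # step over the closing bracket
        return items

    return node()


def rules(tree):
    """Yield (parent label, [child labels]) for every node that has labelled children."""
    children = [child for child in tree[1:] if isinstance(child, list)]
    if children:
        yield tree[0], [child[0] for child in children]
        for child in children:
            yield from rules(child)


SAMPLE = "(IP-MAT (NP-SBJ (D the) (N quene)) (VBD made) (NP-OB1 (ADJ grete) (N joye)))"

for parent, right_hand_side in rules(parse(SAMPLE)):
    print(parent, "->", " ".join(right_hand_side))
```

The first rule printed is the corpus counterpart of rule 1, with IP-MAT in place of IP.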

And here are the other examples:

And so he kissed the lady Igrayne.

(
(IP-MAT (CONJ and)
        (ADVP (ADV so))
        (NP-SBJ (PRO he))
        (VBD kist)
        (NP-OB1 (D the)
                (N lady)
                (NP-PRN (NPR Igrayne))))
(ID CMMALORY,4.95))

I am sick for anger and for love of fair Igrayne, that I may not be whole.

(
(IP-MAT-SPE (' ')
            (NP-SBJ (PRO I))
            (BEP am)
            (ADJP (ADJ seke))
            (PP
                (PP (P for)
                    (NP (N angre)))
                (CONJP (CONJ and)
                       (PP (P for)
                           (NP (N love)
                               (PP (P of)
                                   (NP (ADJ fayre)
                                       (NPR Igrayne)))))))
            (, ,)
            (CP-ADV (C that)
                    (IP-SUB
                            (NP-SBJ (PRO I))
                            (MD may)
                            (NEG not)
                            (BE be)
                            (ADJP (ADJ hool))))
            (E_S .)

And so they set a pavilion over the stone and the sword.

(
(IP-MAT (CONJ and)
        (ADVP (ADV so))
        (NP-SBJ (PRO they))
        (VBD sette)
        (NP-OB1 (D a)
                (N pavelione))
        (PP (P over)
            (NP
                (NP (D the)
                    (N stone))
                (CONJP (CONJ and)
                       (NP (D the)
                           (N swerd)))))
        (E_S ,))
(ID CMMALORY,10.288))

postscript: writing a formal grammar of the English language

In theory, it is possible to write a formal grammar which will describe every sentence in the English language (the rules above would constitute a very small subset of such a grammar). The most exhaustive grammar so far is The Major Syntactic Structures of English, written in 1973 by Robert P. Stockwell, Paul Schachter, and Barbara Hall Partee. The book is 847 pages long. Changes in theory since it was written have rendered it obsolete.