The distribution of wanna-contraction

From Lakoff (1970):

Larry Horn has pointed out (personal communication) the following minimal pair:

(20) a. Teddy is the man I want to succeed.
b. Teddy is the man I wanna succeed.

Here 20a is ambiguous, and can be understood as either of the following:

(21) a. I want Teddy to succeed.
b. I want to succeed Teddy.

But 20b can only be understood in the sense of 21b, since want to cannot contract to wanna if there is an intervening NP between want and to at an earlier point in the derivation, as there is in 21a.

Many subsequent researchers have offered many alternative interpretations of the same facts, in many different formalisms. In terms of the formalism Lakoff assumed, "The above facts show an interaction between the rules of the syntax and those of the phonology", and require "a constraint operating at two separate points in the grammar, and it involves the notion 'corresponding node'". In later treatments, a variety of different grammatical levels play a variety of roles, but the question of how to communicate relevant information across levels remains a matter of interest. See Goodall (2017) for a summary.

Goodall writes (Section 2.7):

[W]e expect a fully adequate analysis of wanna-contraction to show that the various restrictions that we have seen follow naturally and without stipulation from other, independently needed properties of the grammar. In this regard, one can wonder whether the difficulty that the field has had historically in finding such an analysis has resulted from trying to frame the generalizations in specifically syntactic terms.

Some recent analyses, such as Ackema and Neeleman (2003) and Anderson (2008), have tried to shift the focus of the analysis of wanna-contraction from syntax to prosody. [...]

The core of these prosodic analyses rests on the very natural idea that wanna is possible only when want and to are adjacent within the same prosodic phrase. [...]

For the most part, then, the prosodic analysis has the same empirical coverage as the best of the syntactic analyses. As far as is known, the environments in which want and to are in the same prosodic phrase are the same as when want takes a control clause complement headed by to, so the predictions made by prosodic and syntactic analyses appear to be completely overlapping (though this, of course, does not rule out the possibility of empirical differences being discovered in the future). Conceptually, however, the prosodic analyses do seem to have an advantage. The basic generalization of when wanna-contraction is possible (i.e. when want and to are adjacent in the same prosodic phrase) is expressed very simply and naturally in prosodic terms, whereas capturing this same environment in syntactic terms does not lead to a formulation that is as clearly simple and natural.

But in this literature, the facts of prosodic phrasing are simply asserted or assumed. Thus Anderson:

"Apparently, deletion sites (including the gaps left by displacement operations) induce an immediately following PPhrase boundary."

And Ackema and Neeleman:

"There are other phenomena that can be understood more easily if traces trigger φ-closure. Although we cannot discuss this in detail, we think that wanna contraction in English is a case in point. This process can be analyzed as largely parallel to cliticization in Dutch: if to finds itself in the same prosodic domain as a verb, the two elements are realized as a single prosodic word".

So the main goal of these notes is to explore the prosodic phrasing of wanna-contraction -- at some later point I'll take a look at other relevant kinds of contraction.

But as we start looking at examples from the real world, we'll run into a descriptive problem. As Goodall notes, the basic wanna-contraction facts are not so totally clear cut:

To my knowledge, no one has questioned the general existence of this contrast, but there have been occasional suggestions that there are “liberal dialect” speakers who do not have it (e.g. Postal and Pullum 1982; Carden 1983; Pullum 1987). Unfortunately, there appear not to have been any systematic studies of such individuals, but recent experimental work suggests that they do in fact exist. Zukowski and Larsen (2011) report the results of a production study with 14 adults in which the participants produced wanna in illicit environments like (1b) at a rate of 10.6 percent, compared to a rate above 50 percent in environments like (1a). This result clearly confirms the basic contrast between the two environments that has been discussed so often, but it also seems odd, in that if (1b) were truly ungrammatical, one might expect a contraction rate closer to 0 percent.

I searched a large corpus of 105,817 transcribed NPR podcasts, and examined 32 examples, chosen at random, of the form

who (DO) PRO want to VERB
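
A search of this kind can be approximated with a regular expression. The following is only a minimal sketch: the pattern itself, the word lists standing in for DO and PRO, and the sample lines are assumptions made for illustration, not the pattern actually used for the counts reported here.

```python
import re

# Hypothetical word lists: the notes do not spell out which forms of DO
# and which pronouns (PRO) were searched for.
DO = r"(?:do|does|did)"
PRO = r"(?:I|you|we|they|he|she)"

# "who (DO) PRO want to VERB", with DO optional. Accepting the spelling
# "wanna" as an alternative to "want to" is also an assumption.
PATTERN = re.compile(
    rf"\bwho\s+(?:{DO}\s+)?{PRO}\s+(?P<form>want\s+to|wanna)\s+(?P<verb>\w+)",
    re.IGNORECASE,
)

def find_candidates(lines):
    """Yield (form, verb, line) for each line that matches the pattern."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            form = "wanna" if m.group("form").lower() == "wanna" else "want to"
            yield form, m.group("verb").lower(), line

sample = [
    "who do we wanna hire to fill this role",
    "If it gets made into a movie, who do you want to play you?",
    "who- who you wanna prevail in your country?",
    "Nick was top of our list.",
]
hits = list(find_candidates(sample))
for form, verb, line in hits:
    print(f"{form:8s} {verb:8s} {line}")
```

A pattern like this only finds candidate strings. Whether a given token is contracted, and whether the subject of VERB is who or the subject of want, still has to be judged by listening and reading, since a transcript may normalize the spelling.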

In 14 cases where the subject of VERB is the same as the subject of want, contraction occurred 12 times (86%). For example (from All Things Considered, 8/9/2019):

(1)

Karen Wonders: And when we were looking at
who do we wanna hire to fill this role
of running this new office,
Nick was top of our list.

The existence of some uncontracted cases here is expected, since the contraction is optional. Here's an example from Talk of the Nation, 7/3/2012:

(2)

John Donvan: Who- who did you want to address,
and what did you want them to understand?

And in 18 cases where the subject of VERB is who (e.g. "who do you want to lead America"), there was no contraction 16 times (89%). Thus this example from News and Notes, 3/18/2008:

(3)

Farai Chideya: If it gets made into a movie, who do you want to play you?

But there were two cases like this one, from Weekend Edition, 8/31/2013:

(4)

Scott Simon: I- I know you have to be careful answering this question, but do you have any strong feelings about what-
who- who you wanna prevail
in your country?

And here's another example (from a different search) of [want trace to VERB] with apparent wanna-contraction -- Talk of the Nation, 2/23/2009:

(5)

Mr. Blow: but I think that that's, you know, also problematic. I think it turns what would be a helpful conversation into a confrontational sort of us-versus-them situation, and that never produces the outcome that you wanna come from a conversation like this.

I think I could also say those two examples with wanna, to the extent that prosodic intuition is to be trusted.

So Larry Horn's generalization about wanna-contraction is mostly true -- but there are exceptions.
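
The counts reported above can be laid out as a two-by-two table. The sketch below recomputes the two rates and, as an added illustration that is not part of the analysis in these notes, runs a one-sided Fisher exact test on the table. The variable and function names are mine.

```python
from math import comb

# Counts reported above for the 32 sampled "who (DO) PRO want to VERB" examples.
same_subject = {"wanna": 12, "want to": 2}   # subject of VERB = subject of want
who_subject = {"wanna": 2, "want to": 16}    # subject of VERB = who (trace)

def rate(counts, key):
    """Proportion of tokens in `counts` that have the form `key`."""
    return counts[key] / sum(counts.values())

contracted_same = rate(same_subject, "wanna")     # 12/14
uncontracted_who = rate(who_subject, "want to")   # 16/18

def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) under the hypergeometric null, margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(n, col1)

p = fisher_one_sided(12, 2, 2, 16)
print(f"contracted, same subject:  {contracted_same:.1%}")
print(f"uncontracted, who subject: {uncontracted_who:.1%}")
print(f"one-sided Fisher exact p = {p:.2g}")
```

With only 32 examples the test is no more than a sanity check that the asymmetry is unlikely to be sampling noise. It says nothing about why the two exceptional tokens occur.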

Do such exceptions matter? Not if they're mistakes, but I really don't think that they are.

The impact on theories depends on what the various interacting aspects of the theories are, both on the P-side and on the S-side. Across theories, the options obviously include a "dialect difference", and/or stochastic rules that influence outcomes without categorically determining them.

Without excluding either of these possibilities, I have a different suggestion:

Larry Horn's observation about wanna-contraction is not categorically true in phonetic terms.
But it remains morpho-syntactically true, and not just stochastically.

This is because there are two different sources for the class of pronunciations conventionally spelled "wanna":

A set of regular, optional, gradient phonetic lenition phenomena.
A lexicalized phonological representation of the fully-lenited form.

And the apparent exceptions (wanna-contraction across a trace) are phonetic lenition rather than morpho-syntactic substitution. For a plausible theoretical background, see Liberman (2017). If this is true, then the wanna-contraction problem is back where it started.
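
One way to make the two-source suggestion concrete is a toy calculation over the counts given earlier. It assumes, purely for illustration, that phonetic lenition applies at the same rate with or without a trace, that it is independent of lexical substitution, and that lexical substitution is unavailable across a trace. None of these assumptions is argued for in these notes, and with 32 examples the resulting numbers are only suggestive.

```python
# Toy two-source model (every assumption here is illustrative):
#   - across a trace, "wanna" can arise only from gradient phonetic lenition;
#   - without a trace, it can arise from lenition OR lexical substitution;
#   - the two sources are independent, and the lenition rate is the same
#     in both environments.

wanna_trace, n_trace = 2, 18          # counts reported earlier in these notes
wanna_no_trace, n_no_trace = 12, 14

# With a trace, only lenition is available.
p_lenition = wanna_trace / n_trace

# Without a trace: P(wanna) = 1 - (1 - p_lexical) * (1 - p_lenition),
# solved here for p_lexical.
p_wanna_no_trace = wanna_no_trace / n_no_trace
p_lexical = 1 - (1 - p_wanna_no_trace) / (1 - p_lenition)

print(f"implied lenition rate:             {p_lenition:.3f}")
print(f"implied lexical substitution rate: {p_lexical:.3f}")
```

On this reading the lexicalized form would account for most of the contraction in the no-trace cases, while the few cross-trace tokens would reflect the background lenition rate. Different independence assumptions would give different numbers.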

In any case, the putative role of prosodic phrasing in wanna-contraction remains on the table, at least descriptively. So I'll put further discussion of lenition-versus-lexicalization into a separate section at the end of these notes, and focus first on the key prosodic question: Is it true, in sequences of the form want [trace] to and in other analogous cases, that "deletion sites (including the gaps left by displacement operations) induce an immediately following PPhrase boundary" (Anderson), or "traces trigger φ-closure" (Ackema & Neeleman)?

Prosodic issues in wanna-contraction

Some proposals argue that in structures with a trace between want and to, where wanna-contraction doesn't apply (or applies less often), there's a prosodic phrase boundary between want and to -- which is what prevents (or inhibits) contraction. In favor of this argument, the influence of traces on prosodic phrasing is asserted to be general, not specific to contraction.

In the wanna-contraction cases, this claim is so often so obviously wrong that I wonder whether I've misunderstood something. (I don't think the claim fares any better in other cases, but for now we should focus on the wanna facts, limiting our attention to cases where contraction doesn't occur across the trace.)

Let's start with example (3) above. The phrase "who do you want to play you?" is spoken at a constant rapid pace, with no pause or timing indication of a juncture between want and to play you, no suggestion of a junctural tone either, no resetting of pitch range, nothing.

Here's the next random [want trace to VERB] example from the list I checked (News and Notes 7/29/2008):

(6)

Farai Chideya: He's saying there, who do you want to lead America?

Again, none of the normal correlates of prosodic phrase boundaries can be found between want and to lead.

Another example, from All Things Considered, 2/19/2006:

(7)

Gary Covino: when we asked Filipinos,
who do you support, who do you want to win,
they would say Cory Aquino

Again, the juncture between want and to win has none of the phonetic correlates of a prosodic phrase boundary: no lengthening or silent pause, no boundary tones, no pitch range resetting.

Another example (from a different egrep pattern, not included in the 16-out-of-18 stats above) -- Day to Day 10/4/2007, with two [want trace to VERB] phrases in a row, and no evidence of a trace-associated prosodic boundary in either of them:

(8)

John Dickerson: Well that's exactly right.
You know we had the- the Republicans had this problem
is the people they want to stay are leaving,
and the people they want to leave are staying.

Here's an example from Talk of the Nation, 10/16/2012, again with no hint of trace-associated prosodic phrase boundary:

(9)

Peter Robinson: So you can- you can kind of put a focus on the-
the soundbite that you want to come out of that day

And a final example from Day to Day, 7/17/2006:

(10)

Ron Elving: ((He)) says once you've got everybody talking about a third world war,
you can ask the question,
who do you want to win such a war?
Us or them?

Prosodic phrasing in English is pretty variable, and so I'm sure it's possible to have a phrase boundary aligned with the trace in some sentences containing the sequence [want trace to VERB]. But this was not true in any of the random examples that I looked at for this note (and I've spared you most of the clips that I examined...)

I'm convinced by Steve Anderson's argument that cliticization phenomena are central to contraction. But something else needs to substitute for the role that "prosodic phrasing" plays in his proposal. If sometimes-intervening junctures play a role, it seems more likely that they're operating at the level of syllable- and foot-formation, rather than at the level(s) of "intonational phrasing" that is normally marked by (pseudo-)pauses, boundary tones, pitch range resetting, etc.

Lenition of infinitival to

The lenition of infinitival to seems to be a complex phenomenon -- complex phonetically, complex prosodically, and complex morpho-syntactically.

It's especially complex in the case of want to, since three underlying consonants /nt#t/ reduce, across a word boundary, to a single short nasal.

So the first point to make is that (in American English) intervocalic /nt/ often becomes a nasal tap. Thus this example, from Talk of the Nation, 12/7/2012:

(11)

Robert Provine: Laughter is the sound of labored breathing of rough-and-tumble play,
where the panting sound of our exertions became ha ha.
So the ancestral chimp laughter is a panting sound.
Being a chimp in good standing, I'll give you a sample.

(Amusingly, in this case NPR's transcription service rendered the first panting as "panning"...)

The next point is that word-final 'nt' can also lose its final voiceless stop when a sonorant follows. Here's an example with two of them in a row, from All Things Considered, 2/11/2016:

(12)

Miller (as Weasel): ((nah)) I don't wanna.

Note that in this performance (from the movie Deadpool), the initial stop in don't has become a coronal approximant; a nasalized vowel is the only residue of the /nt/ of don't; and the final vowel of "wanna" is actually mid-high and rounded (by harmony with the earlier rounded sounds?). So the result is something like

ɮõwɐ̃nõ

(...though as usual, the IPA is not much help here...)

Similar things happen frequently in less slurred speech, in sequences like don't ask or want a -- I'll spare you the real-world examples.

And we can also look at infinitival to in other cases. Some of these (like expect to) can have the same syntactic ambiguity as want to, between a simple infinitival complement and a whole non-finite sentence with a trace for subject. Others can be unambiguously traceless, like have to (which also arguably has a lexicalized lenited form, conventionally spelled "hafta"), or plan to. In other cases, the to may come from a purpose clause, which might be adjacent to the verb only if a trace intervenes, as in hire to.

Phonetically, the initial /t/ may become voiced (if it follows a voiced sound); if unvoiced, it may have a strong burst and aspiration, or a very weak and short burst and aspiration, or none at all; the vowel may remain [u], may be reduced to schwa, or may entirely disappear. All of these things can happen in cases where there's (apparently) no issue of lexicalized contraction.

Let's look at example (1) again, this time at the sequence [...hire trace to...]:

(13)

Karen Wonders: who do we wanna hire to fill this role

Notice three things.

First, "hire to" involves a reduced, if not contracted, form of to -- the /t/ becomes a voiced tap, and the vowel is reduced to schwa. In IPA, something like [hɐ^ɪ˞ɾə].
Second, this reduction is not prevented by the intervening trace. Maybe the to in hire to fill cliticizes to the following word fill? But then why does the initial /t/ become a voiced flap, which normally happens only in intervocalic ambisyllabic positions?
And third, the trace following hire does not generate any obvious intonational phrase boundary (though maybe hire is a bit lengthened).

For comparison, I selected 12 random examples of hire to from the NPR corpus (excluding any with a silent pause or breath between hire and to). In 6 of them, the /t/ of to was rendered as a short voiced stop as in example (13) above -- in the other 6, the /t/ of to was performed as a voiceless aspirated stop, though the duration of the stop gap and aspiration varied widely.

(Tentative) Conclusions

There are good reasons to think that wanna exists in American English as a lexicalization of (a series of) phonetic lenition phenomena, though the relevant morpho-syntax remains fuzzy.

Horn's generalization (that wanna-contraction is blocked across traces) is at least statistically true.

The phonetic effects involved in wanna-contraction are general, and therefore can presumably happen where there is no morpho-lexical substitution. This might restore Horn's generalization to non-stochastic status.

But in either case, the theoretical explanation of Horn's generalization remains elusive. As Goodall observes, description in syntactic terms is possible, but not as "natural" or inevitable as one would like.

The "trace creates a prosodic phrase boundary" idea is pretty clearly false, at least if "prosodic phrase" has anything like its ordinary meaning.

However, Anderson's proposal about the role of cliticization remains promising, and might be rescued if the juncture involved is something more on the level of "phonological word", or in any case something playing a role in syllable- and foot-formation. But the necessary descriptive and theoretical work remains to be done.

So overall, an interesting puzzle that is not yet solved.

The role of frequency and predictability

In such lenition and lexicalization phenomena, there are clear effects of formality, frequency/recency/redundancy, precision of articulation, and dialect/variety/micro-syntactic variation.

There's some evidence for the role of frequency and predictability -- e.g. this figure from Kilbourn-Ceron et al. (2020), based on the treatment of word-final /t/ (at the end of the "target word") followed by a vowel-initial word (the "trigger word"):

And this from Yuan and Liberman (2015), dealing with the reduction in Mandarin (often but not always changing the apparent phonetic category) of morpheme-initial plosives:

"For every consonant type we use the duration corresponding to its ~20% cumulative percentage as the threshold of reduction: 30 ms for unaspirated stops, 40 ms for unaspirated affricates, 50 ms for aspirated stops, and 60 ms for aspirated affricates."
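
The thresholds in that passage amount to a small lookup table. As a minimal sketch (the function name, the category labels, the choice to treat a duration exactly at the threshold as unreduced, and the sample measurements are all my assumptions, not details from the paper):

```python
# Duration thresholds (ms) quoted above from Yuan and Liberman (2015).
REDUCTION_THRESHOLD_MS = {
    "unaspirated stop": 30,
    "unaspirated affricate": 40,
    "aspirated stop": 50,
    "aspirated affricate": 60,
}

def is_reduced(consonant_type: str, duration_ms: float) -> bool:
    """True if the token is shorter than the threshold for its type.

    Assumption: a token exactly at the threshold counts as unreduced.
    """
    return duration_ms < REDUCTION_THRESHOLD_MS[consonant_type]

# Hypothetical measurements, invented for illustration only.
tokens = [
    ("aspirated stop", 42.0),
    ("aspirated stop", 71.0),
    ("unaspirated stop", 30.0),
]
results = [is_reduced(kind, ms) for kind, ms in tokens]
print(results)
```

Since each threshold sits at roughly the 20% point of its type's cumulative duration distribution, a rule like this labels about the shortest fifth of tokens of each type as reduced.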

References

Ackema, Pete, and Ad Neeleman. "Context-sensitive spell-out." Natural Language & Linguistic Theory 21, no. 4 (2003): 681-735.

Anderson, Stephen R. "English reduced auxiliaries really are simple clitics." Lingue e linguaggio 7, no. 2 (2008): 169-186.

Goodall, Grant. "Contraction." The Wiley Blackwell Companion to Syntax, Second Edition (2017): 1-20.

Kilbourn-Ceron, Oriana, Meghan Clayards, and Michael Wagner. "Predictability modulates pronunciation variants through speech planning effects: A case study on coronal stop realizations." Laboratory Phonology: Journal of the Association for Laboratory Phonology 11, no. 1 (2020).

Lakoff, George. "Global rules." Language (1970): 627-639.

Liberman, Mark. "Towards progress in theories of language sound structure." Shaping Phonology (2017).

Yuan, Jiahong, and Mark Liberman. "Investigating consonant reduction in Mandarin Chinese with improved forced alignment." Interspeech 2015.