Tense tents

Zwicky, Arnold. "Note on a phonological hierarchy in English." Linguistic change and generative theory 275-301 (1972).

A major point of contact between theoretical work in generative grammar and more traditional activities in historical linguistics is the search for conditions on the form and content of grammars. Such conditions function indirectly as predictions of the possibility of certain kinds of linguistic change; as a result, known changes can be used as a source of fruitful hypotheses about conditions in grammatical theory, and such changes can be inspected as sources of evidence for, or counterevidence to, particular systems of hypotheses. Much is concealed in my facile use of the phrases "conditions on the form and content of grammar" and "known changes," the latter in particular, for the pursuit of specific hypotheses normally entails a careful examination of accepted presentations of linguistic changes. But I shall not explore these issues here. Rather, I shall provide a few preliminary examples and then move to a consideration of some aspects of English phonology which supply evidence about the content of grammatical theory and thus, derivatively, about linguistic change.


Ohala, John. "Phonetic explanations for nasal sound patterns." Nasalfest: Papers from a symposium on nasals and nasalization, ed. by Charles A. Ferguson, Larry M. Hyman, and John J. Ohala, 289-316. (1975).

Universal sound patterns must arise due to the universal constraints or tendencies of the human physiological mechanisms involved in speech production and perception. The way the physical constraints of the speech mechanism leave their imprint on speech, particularly via sound change, can best be understood by likening speech communication to a transmission line with relay stations or “repeaters”, as in Figure 1 (page 290). A transmitter sends out a signal, u, to which noise, v, is added, yielding the distorted signal, w = u + v, which is picked up by the receiver, part of the repeater unit. It is this distorted signal, w, which is retransmitted as the signal, x, sent to the next repeater.

In the case of human speech, important sources of “noise” are the constraints of the transmitting and receiving systems, that is, limitations of the vocal tract and of the auditory mechanisms. This is represented schematically in Figure 2 (page 290). The speaker, although intending to produce a certain pronunciation, may, due to vocal tract constraints, actually produce something slightly different. For example, the sequence [m] + [θ] is frequently rendered as [mpθ], e.g., warmth is pronounced [wɔrmpθ], i.e., with an epenthetic stop. This is due to the fact that the soft palate and vocal cords prematurely adopt the position required for the following [θ] even while the labial closure of the [m] is held; in other words, due to anticipatory assimilation, the [m] becomes partially denasalized. (See further discussion below.) Since the listener does not have independent access to the mind of the speaker, he may take [wɔrmpθ] to be the intended pronunciation and so, when he in turn speaks, may intentionally render the word with the epenthetic [p].

Auditory constraints affect pronunciation somewhat differently. Words containing speech signals which are auditorily ambiguous, i.e., those which, as far as the listener is able to tell, may have been produced by any one of two or more distinct articulations, may be articulatorily reinterpreted by the listener when he repeats the given word.
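
The repeater model in the abstract above (w = u + v, with w retransmitted as x) can be sketched in a few lines. This is only an illustration, not anything from Ohala's paper: the function name `relay_chain`, the use of a single number to stand in for a pronunciation, and the Gaussian noise are all assumptions.

```python
import random


def relay_chain(signal, n_repeaters, noise_sd, seed=0):
    """Pass a scalar 'signal' through a chain of repeaters.

    Each repeater receives w = u + v (sent signal plus noise) and
    retransmits that distorted value as x. Gaussian noise is an
    assumption made for this sketch only.
    """
    rng = random.Random(seed)
    history = [signal]
    u = signal
    for _ in range(n_repeaters):
        v = rng.gauss(0.0, noise_sd)  # noise added during transmission
        w = u + v                     # distorted signal at the receiver
        x = w                         # the repeater retransmits what it heard
        history.append(x)
        u = x                         # x is the input to the next repeater
    return history


if __name__ == "__main__":
    # With noise, each generation inherits the distortions of the previous one.
    print(relay_chain(1.0, n_repeaters=5, noise_sd=0.1))
```

Because each repeater passes on w rather than the original u, distortions accumulate along the chain instead of being corrected, which is the sense in which the listener who hears [wɔrmpθ] may go on to produce it intentionally.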


Temperley, Mary S. "The articulatory target for final–s clusters." TESOL Quarterly 17, no. 3 (1983): 421-436.

Unlike many features of English pronunciation, the pronunciation of the set of final consonant clusters exemplified by the words tents: tense; bands: bans; acts: axe; and guests: guess (nasal, stop, or sibilant, with or without an additional stop, followed by /–s/ or /–z/) is not uniformly treated by dictionaries, phoneticians, and writers of ESL texts. These authorities recognize homophonous pronunciations to varying extents and give a variety of recommendations about pronunciation to ESL teachers and learners. But none of these treatments is adequate. Failure to recognize and teach homophonous pronunciations for these pairs of clusters is based on a longstanding but misguided allegiance to spelling pronunciations.

There are, in fact, good reasons for treating the pairs of clusters as homophonous in ESL teaching. I discuss four grounds: linguistic observation, linguistic patterning, linguistic history, and linguistic and pedagogical simplicity. Finally, I claim that the patterns represented by homophonous pronunciations of the pairs should not only be recognized but should also be taught explicitly because doing so helps students to produce final suffixes and thereby to use more grammatical English.


Fourakis, Marios, and Robert Port. "Stop epenthesis in English." Journal of Phonetics 14, no. 2 (1986): 197-221.

Some phonologists have claimed that the insertion of a stop between a sonorant and a fricative consonant in syllable-final sonorant-fricative clusters follows from universal constraints on the human speech perception and production mechanism. Others have claimed that the intrusive stops are products of language or dialect specific phonological rules that are stated in the grammar. In this experiment we examined the production of sonorant-fricative and sonorant-stop-fricative clusters by two groups of English speakers. One spoke a South African dialect and the other an American mid-western dialect. The words tested ended in clusters of [n] or [l] plus [s] or [ts] and their voiced counterparts. Spectrographic analysis revealed that the South African speakers maintained a clear contrast between sonorant-fricative and sonorant-stop-fricative clusters. The American speakers always inserted stops after the sonorant if the fricative was voiceless, but when the fricative was voiced, they more often omitted the stop in underlying clusters containing a stop (/ldz/ or /ndz/) but sometimes inserted a stop in clusters such as /nz/ or /lz/.

Measurements of the durations of the vowels, sonorants, stops and the final fricatives were made from the spectrograms. The inserted stop in the American productions was significantly shorter than the underlying one and its presence also affected the duration of the preceding nasal. This incomplete neutralization is similar to other cases reported in the literature. It is proposed here that cases of incomplete neutralization result from the application of language-specific rules that we call phase rules which govern articulatory timing. These rules sometimes appear to yield results very similar to those of segmental phonological rules.


Lee, Sook-hyang. "The duration and perception of English epenthetic and underlying stops." OSU Working Papers in Linguistics (1994).

In American English, an intrusive stop occurs before the fricative in words such as tense and false, making them very much like words with underlying stops, such as tents and faults. Ohala (1975) treats the inserted stop as an artifact of universal physiological or aerodynamic constraints. But this approach can't account for the fact that South African English speakers don't insert the stop between sonorant and fricative clusters (Fourakis and Port, 1986). Another approach posits a language- or dialect-specific phonological rule which inserts a phonological segment (Zwicky, 1972). Fourakis and Port (1986) argue against this approach on the ground that in some pairs the intrusive stop is significantly shorter than the underlying one (although the difference is always very small). This paper presents perception data and duration measurements supporting something like Zwicky's approach. Phrases with intrusive and underlying stops (intense and in tents, respectively) in citation forms produced by three speakers of Mid-Western dialects were presented over earphones in random order for subjects to identify. Identification was very poor, just at chance level. Also, duration measurements of the silence gap between the /n/ and /s/ in these words show no significant difference, contrary to Fourakis and Port's findings. Moreover, token judgments in the perception experiment show very poor correlation with the durations except for one speaker, implying that whatever duration differences there are might not be a crucial cue that listeners exploit for labeling the words with epenthetic and underlying stops.


Yoo, Isaiah WonHo, and Barbara Blankenship. "Duration of epenthetic [t] in polysyllabic American English words." Journal of the International Phonetic Association 33, no. 2 (2003): 153-164.

This paper reports how stress and the position of the /ns/ cluster in polysyllabic words affect [t] epenthesis in four different environments: (1) word-medial after a stressed vowel (e.g. 'cen.sus vs. 'dents us); (2) word-final after a stressed vowel (e.g. in.'tense vs. in'tents); (3) word-medial after a stressless vowel (e.g. con.'sent vs. blunt 'say.ing); and (4) word-final after a stressless vowel (e.g. 'sci.ence vs. 'con.tents). Analysis of stop closure durations in experimental sentences read by seven American English speakers reveals that position, not stress, is the most important factor in [t] epenthesis: final position (e.g. science and intense) favors epenthesis. Stress is found to have an effect on stop closure durations in the way it interacted with word-position – i.e. for the final /ns/ cluster, stress immediately before it disfavors epenthesis (e.g. intense). The underlying /t/ is shown to be not significantly longer than the epenthetic [t]. Measurements from polysyllabic words in the TIMIT corpus corroborate the experimental results that word-final position favors epenthesis and that stress does not correlate with epenthesis. In the TIMIT data, however, underlying /t/ was significantly longer than epenthetic [t].


Kilpatrick, Cynthia, Ryan Shosted, and Amalia Arvaniti. "On the perception of incomplete neutralization." In Proceedings of the 16th International Congress of Phonetic Sciences (Saarbrücken), pp. 653-56. 2007.

The perception of American English epenthetic and underlying stops (as in prin[t]ce~prints) was examined in a forced-choice identification experiment that controlled for word frequency and familiarity, closure duration and presence of burst. The results showed that listeners are largely unable to distinguish minimal pairs on the basis of differences in closure duration and the presence or absence of burst; word frequency and familiarity had little effect on the results. Generally, listeners had more difficulty with stimuli with strong [t]s (long closure, burst) than with stimuli with weak [t]s, which they tended to categorize as “nce” words. Overall the results suggest that [ns]~[nts] is close to complete neutralization in favor of [nts].


Shosted, Ryan K. "An articulatory–aerodynamic approach to stop excrescence." Journal of Phonetics 39, no. 4 (2011): 660-667.

The distinction between underlying and excrescent stops in pairs like ‘mints’ and ‘mince’ was convincingly demonstrated by Fourakis and Port (1986). Several subsequent studies have been unable to replicate the result for speakers of American English, or have done so only partially. These studies have largely dealt with the acoustic signal. This study presents an approach to stop excrescence that refers to both the aerodynamics and articulation of the phenomenon. The results confirm and expand on the original findings. Using nasal flow as an indirect measure of velopharyngeal aperture and electropalatography (EPG) to estimate the moment of oral release, the presence of occlusion, as well as the duration of nasal and oral occlusion were measured. Overall contact across the palate was also measured. Disyllabic and monosyllabic tokens with /ns/ and /nts/ in final position were pronounced by four male speakers of American English. Disyllabic tokens could be either stressed or unstressed on the final syllable. In Experiment I, speakers produced tokens in a standard carrier phrase; in Experiment II, they produced one of the items in contrastive focus to its ‘homophonous’ counterpart, e.g., ‘I said mince not mints’. Underlying stops were significantly longer than excrescent stops, including in the contrastive-focus condition. A trading relation between nasal and oral stop duration was demonstrated when the stop was excrescent, but not when it was underlying. This suggests that the nasal–oral occlusion in epenthetic stops is divided proportionally between the underlying nasal and excrescent oral stop, but that the durations of the nasal and underlying oral stops are independent.
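
One way to picture the "trading relation" described above is as a negative correlation between nasal and oral stop durations when both share one fixed occlusion interval. The sketch below is illustrative only: the helper `pearson` and all the duration values are invented for the example and are not measurements from Shosted (2011), whose actual statistical method is not given in the abstract.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic durations in ms (NOT real data).
stop_ms = [10, 20, 30, 40]

# Excrescent case: nasal and stop split a fixed 100 ms occlusion,
# so a longer stop means a shorter nasal.
nasal_excrescent = [100 - s for s in stop_ms]

# Underlying case: nasal duration varies independently of the stop.
nasal_underlying = [70, 72, 72, 70]

r_excrescent = pearson(stop_ms, nasal_excrescent)
r_underlying = pearson(stop_ms, nasal_underlying)
```

In this toy setup the excrescent tokens give a strongly negative coefficient and the underlying tokens give none, which is the qualitative pattern the abstract reports.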


Livescu, Karen, Preethi Jyothi, and Eric Fosler-Lussier. "Articulatory feature-based pronunciation modeling." Computer Speech & Language 36 (2016): 212-232.


Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech, and it has been very difficult to mitigate in traditional phone-based approaches to speech recognition. An alternative approach, which has been studied by ourselves and others, is one based on sub-phonetic features rather than phones. In such an approach, a word's pronunciation is represented as multiple streams of phonological features rather than a single stream of phones. Features may correspond to the positions of the speech articulators, such as the lips and tongue, or may be more abstract categories such as manner and place.

This article reviews our work on a particular type of articulatory feature-based pronunciation model. The model allows for asynchrony between features, as well as per-feature substitutions, making it more natural to account for many pronunciation changes that are difficult to handle with phone-based models. Such models can be efficiently represented as dynamic Bayesian networks. The feature-based models improve significantly over phone-based counterparts in terms of frame perplexity and lexical access accuracy. The remainder of the article discusses related work and future directions.
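
The idea of asynchrony between feature streams can be illustrated with the /ns/ cluster that the other entries in this list discuss. The sketch below is a deterministic toy, not the dynamic Bayesian network model of Livescu et al.: the feature names, their values, the `PHONES` table, and the functions `realize` and `collapse` are all assumptions made for illustration.

```python
# Hypothetical per-feature targets for the two segments of an /ns/ cluster.
TARGETS = {
    "n": {"velum": "open", "tongue_tip": "closure", "voicing": "voiced"},
    "s": {"velum": "closed", "tongue_tip": "critical", "voicing": "voiceless"},
}
FEATURES = ("velum", "tongue_tip", "voicing")

# Surface labels for the feature bundles that can arise in this example.
PHONES = {
    ("open", "closure", "voiced"): "n",
    ("closed", "closure", "voiced"): "d",
    ("closed", "closure", "voiceless"): "t",
    ("closed", "critical", "voiceless"): "s",
}


def realize(n_frames, switch_frame):
    """Label each frame, letting every feature stream switch from its
    /n/ target to its /s/ target at its own frame index."""
    frames = []
    for i in range(n_frames):
        bundle = tuple(
            TARGETS["s" if i >= switch_frame[f] else "n"][f] for f in FEATURES
        )
        frames.append(PHONES.get(bundle, "?"))
    return frames


def collapse(frames):
    """Merge runs of identical frame labels into a phone sequence."""
    out = []
    for p in frames:
        if not out or out[-1] != p:
            out.append(p)
    return out


# All streams switch together: plain [n s].
synchronous = collapse(realize(8, {"velum": 4, "tongue_tip": 4, "voicing": 4}))

# Velum and voicing switch two frames before the tongue tip releases:
# the interval with a raised velum, no voicing, and tongue-tip closure is [t].
asynchronous = collapse(realize(8, {"velum": 3, "tongue_tip": 5, "voicing": 3}))
```

Here `synchronous` comes out as n, s and `asynchronous` as n, t, s: the intrusive stop falls out of timing differences between streams without any phone being inserted, which is the kind of pronunciation change the abstract says is awkward for phone-based models.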


Feldscher, Cara, and Karthik Durvasula. "Excrescent stops in American English." Proceedings of the Linguistic Society of America 2 (2017): 20-1.

This study investigates whether excrescence is phonological epenthesis or articulatory overlap by investigating whether a prosodic domain boundary intervening between the nasal and the fricative affects insertion. Articulatory overlap effects are expected to remain constant across the board, whereas phonological epenthesis is expected to be targeted for domain-final lengthening. The duration results are that neither excrescence nor underlying segments are affected by domain-final lengthening, but excrescent closure is significantly shorter than underlying closure. The short segments support an articulatory overlap hypothesis, but the lack of final lengthening opens up new questions for further research.