Hong Zhang will be defending his dissertation, entitled "The Distribution of Disfluencies in Spontaneous Speech: Empirical Observations and Theoretical Implications", on Thursday, October 8th at 1:30pm EDT. The defense will be held on Zoom and is open to members of the Penn Community.

The abstract of the dissertation is below.


Supervisor: Mark Liberman

Committee: Kathryn Schuler, Gareth Roberts

Date & Time: Thursday, October 8th, 1:30pm EDT


The Distribution of Disfluencies in Spontaneous Speech: Empirical Observations and Theoretical Implications


This dissertation provides an empirical description of the forms and distribution of speech disfluencies in spontaneous speech. Although this area has received much attention in the past decades, large-scale analyses of speech corpora covering multiple communication settings, languages, and speakers' cognitive states are still lacking. An understanding of the regularities of different kinds of disfluencies, based on large speech samples across multiple domains, is essential for both theoretical and applied purposes. As an attempt to fill this gap, this dissertation takes the approach of quantitative analysis of large corpora of spontaneous speech. The selected corpora reflect a diverse range of tasks and languages. The dissertation re-examines speech disfluency phenomena, including silent pauses, filled pauses ("um" and "uh"), and repetitions, and provides an empirical basis for future work in both theoretical and applied settings. Results from the study of silent and filled pauses indicate that a potential sociolinguistic variation can in fact be explained from the perspective of the speech planning process. The descriptive analysis of repetitions has identified a new form of repetitive phenomenon in fluent spontaneous speech, repetitive interpolation, that could potentially serve as a valuable source of information for the modeling of speech production. Both the acoustic and textual properties of repetitive interpolation have been documented through rigorous quantitative analysis. The defining features of this phenomenon can be further used in designing speech-based applications such as speaker state detection. Although the goal of this descriptive analysis is not to formulate and test specific hypotheses about speech production, potential directions for future research in speech production models are proposed and evaluated.
The quantitative methods employed throughout this dissertation can also be further developed into interpretable features in machine learning systems that require automatic processing of spontaneous speech.