LING521 - Spring 2022

I've added an introductory section giving a bit of background for this year's focus: data structures, algorithms, and interfaces for searching and analyzing phonetic datasets. The rest of this page is still the Spring 2021 list of readings -- but in fact we'll choose topics and readings based on the background and interests of the course participants. So the following is certain to change before the course starts, and will change further as it develops...

Data structures, algorithms, and interfaces

(Note that there are dozens of other creatures in this bestiary, many of them extinct or rare...)

"HTK Transcriptions and Label Files".

Amy Isard, David McKelvie, Andreas Mengel, and Morten Baun Møller. "The MATE Workbench Annotation Tool, a Technical Description." LREC 2000.

The Nite XML Toolkit Homepages.

Steven Bird, David Day, John Garofolo, John Henderson, Christophe Laprun, and Mark Liberman. "ATLAS: A flexible and extensible architecture for linguistic annotation", LREC 2000.

Steven Bird and Mark Liberman, "A formal framework for linguistic annotation", Speech Communication 2001. (See also Penn CIS Tech Report version from 1999.)

Kazuaki Maeda, Haejoong Lee, Julie Medero, Stephanie Strassel. "A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit" LREC 2006.

"ELAN Annotation Format (EAF)" -- also ELAN import and export options, and the formats listed there.

Praat TextGrid file formats

Brian MacWhinney and Johannes Wagner, "Transcribing, searching and data sharing: The CLAN software and the TalkBank data repository", 2010.

The EMU Speech Database Management System (EMU-SDMS)

Music21: A toolkit for computer-aided musicology. (Music21 Documentation)


Some Automatic Analysis Techniques

Fox, Michelle Annette Minnick. Usage-based effects in Latin American Spanish syllable-final /s/ lenition. University of Pennsylvania, 2006.

Yuan, Jiahong, and Mark Liberman. "Investigating /l/ variation in English through forced alignment." In Tenth Annual Conference of the International Speech Communication Association. 2009.

Yuan, Jiahong, and Mark Liberman. "Automatic Measurement and Comparison of Vowel Nasalization across Languages." In ICPhS, vol. 17, pp. 2244-2247. 2011.

Yuan, Jiahong, and Mark Liberman. "Automatic detection of “g-dropping” in American English using forced alignment." In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 490-493. IEEE, 2011.

Yuan, Jiahong, and Mark Liberman. "/l/ variation in American English: A corpus approach." Journal of Speech Sciences 1, no. 2 (2011): 35-46.

Ryant, Neville, Jiahong Yuan, and Mark Liberman. "Automating phonetic measurement: The case of voice onset time." In Proceedings of Meetings on Acoustics ICA2013, vol. 19, no. 1, p. 060277. Acoustical Society of America, 2013.

Yuan, Jiahong, and Mark Liberman. "Investigating consonant reduction in Mandarin Chinese with improved forced alignment." In Sixteenth Annual Conference of the International Speech Communication Association. 2015.

Ryant, Neville, and Mark Liberman. "Large-scale analysis of Spanish /s/-lenition using audiobooks." In Proceedings of Meetings on Acoustics 22ICA, vol. 28, no. 1, p. 060005. Acoustical Society of America, 2016.

Yuan, Jiahong, Wei Lai, Chris Cieri, and Mark Liberman. "Using forced alignment for phonetics research." Chinese Language Resources and Processing: Text, Speech and Language Technology. Springer (2018).

Ma, Danni, Neville Ryant, and Mark Liberman. "Probing acoustic representations for phonetic properties." In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 311-315. IEEE, 2021.


1. Lexicon/Phonology/Phonetics

The first section of the course will deal with the problem of "allophonic variation" in segmental phonology and phonetics.

We will undertake a couple of joint exercises in classification and measurement, both to explore concepts and issues, and to learn techniques. A brief overview of the theoretical issues can be found in

Mark Liberman, "Towards Progress in Theories of Language Sound Structure", in Shaping Phonology, 2018

Our first class exercise will look at realizations of the final consonant cluster in the word "don't" in versions of the calibration sentence "Don't ask me to carry an oily rag like that" in the TIMIT corpus, representing one sub-case of the phenomenon misleadingly called "t/d deletion".
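As a starting point for this kind of exercise, here is a minimal sketch of pulling out the phone labels that fall inside each token of a target word. It assumes the plain-text layout of TIMIT's .wrd and .phn label files (one "start-sample end-sample label" triple per line, at a 16 kHz sampling rate). The label text in the example is invented for illustration and is not copied from the corpus, and the names Interval, parse_timit_labels and phones_in_word are hypothetical helpers, not part of any existing toolkit.

```python
from typing import List, NamedTuple

SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


class Interval(NamedTuple):
    start: int  # start time, in samples
    end: int    # end time, in samples
    label: str  # word or phone label


def parse_timit_labels(text: str) -> List[Interval]:
    """Parse the contents of a .wrd or .phn file: 'start end label' per line."""
    intervals = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        intervals.append(Interval(int(parts[0]), int(parts[1]), parts[2]))
    return intervals


def phones_in_word(words: List[Interval], phones: List[Interval],
                   target: str) -> List[List[Interval]]:
    """For each token of `target`, return the phones lying inside its span."""
    tokens = []
    for w in words:
        if w.label == target:
            tokens.append([p for p in phones
                           if p.start >= w.start and p.end <= w.end])
    return tokens


# Invented label text, for illustration only (not real TIMIT data).
WRD_TEXT = "2000 5200 don't\n5200 9000 ask\n"
PHN_TEXT = ("0 2000 h#\n2000 2400 dcl\n2400 2800 d\n2800 4300 ow\n"
            "4300 5200 n\n5200 7000 ae\n7000 8200 s\n8200 9000 kcl\n")

words = parse_timit_labels(WRD_TEXT)
phones = parse_timit_labels(PHN_TEXT)
tokens = phones_in_word(words, phones, "don't")

for token in tokens:
    labels = [p.label for p in token]
    duration = (token[-1].end - token[0].start) / SAMPLE_RATE
    print(labels, f"{duration:.3f} s")
```

Run over all speakers' versions of the sentence, the same loop would give a table of transcribed realizations of the word, which can then be compared against what we hear and measure.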

We'll also look at allophonic variation in syllable-final /s/ in Spanish.

Some relevant background reading:

Sheila Blumstein and Kenneth N. Stevens, "Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants." The Journal of the Acoustical Society of America 66, no. 4 (1979): 1001-1017.

David Pisoni, "Variability of vowel formant frequencies and the quantal theory of speech: A first report." Phonetica 37, no. 5-6 (1980): 285-305

Mark Liberman and Janet Pierrehumbert. "Intonational invariance under changes in pitch range and length". In Language Sound Structure, ed. by Mark Aronoff and Richard Oehrle (1984): 157-233.

Kenneth Stevens, "On the quantal nature of speech." Journal of Phonetics 17, no. 1 (1989): 3-45.

Joseph Perkell, Melanie Matthies, Harlan Lane, Frank Guenther, Reiner Wilhelms-Tricarico, Jane Wozniak, and Peter Guiod. "Speech motor control: Acoustic goals, saturation effects, auditory feedback and internal models." Speech Communication 22, no. 2-3 (1997): 227-250.

Kenneth Stevens and Samuel Jay Keyser. "Quantal theory, enhancement and overlap." Journal of Phonetics 38, no. 1 (2010): 10-19.

Neville Ryant and Mark Liberman, "Large-scale analysis of Spanish /s/-lenition using audiobooks", ICA 2016

Yuan, Jiahong, and Mark Liberman. "Automatic Measurement and Comparison of Vowel Nasalization across Languages." In ICPhS, vol. 17, pp. 2244-2247. 2011.

Yuan, Jiahong, and Mark Liberman. "Automatic detection of “g-dropping” in American English using forced alignment." In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 490-493. IEEE, 2011.

Yuan, Jiahong, and Mark Liberman. "/l/ variation in American English: A corpus approach." Journal of Speech Sciences 1, no. 2 (2011): 35-46.

Ryant, Neville, Jiahong Yuan, and Mark Liberman. "Automating phonetic measurement: The case of voice onset time." In Proceedings of Meetings on Acoustics ICA2013, vol. 19, no. 1, p. 060277. Acoustical Society of America, 2013.



2. Some Approaches to F0 modeling

Overall F0 trends at syllable and phrase scale
"Tunes, political and geographical"
"Macronic and Trumpish prosody" -- and more of the same...

Grabe, Esther, Greg Kochanski, and John Coleman. "Quantitative modelling of intonational variation." (2004).

Grabe, E., G. Kochanski, and J. Coleman. "Empirical Validation of Hand-labelled Nuclear Accent Patterns." (2006).

F0 as a Time Series
(See also FPCA in R and "Introduction to subspace methods"...)

Why tone is not pitch, and pitch is not F0...




3. Some Relevant Datasets

TIMIT - LDC93S1W - Garofolo, John S., L. F. Lamel, W. M. Fisher, Jonathan G. Fiscus, D. S. Pallett, and Nancy L. Dahlgren. "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM." (1993).

Yuan, Jiahong, Hongwei Ding, Sishi Liao, Yuqing Zhan, and Mark Liberman. "Chinese TIMIT: A TIMIT-like corpus of standard Chinese." In 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), pp. 1-5. IEEE, 2017.




4. Dynamic time warping

Kruskal, J.B. "An overview of sequence comparison: Time warps, string edits, and macromolecules." SIAM Review 25, no. 2 (1983): 201-237.

Kruskal, J.B. & Liberman, M. "The symmetric time-warping problem: From continuous to discrete", in Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley 1983.

Rajagopalan, Shreevatsa. "A user-friendly tool for metrical analysis of Sanskrit verse." Computational Sanskrit & Digital Humanities: 113.

Dan Ellis, Lecture on "Beat Tracking"

Ellis, Daniel. “Beat tracking by dynamic programming.” Journal of New Music Research 36.1 (2007): 51-60.

Bartelds, Martijn, Caitlin Richter, Mark Liberman, and Martijn Wieling. "A new acoustic-based pronunciation distance measure." Frontiers in Artificial Intelligence 3 (2020).
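For orientation before the readings, here is a minimal sketch of the textbook dynamic-programming recurrence for dynamic time warping between two one-dimensional sequences. It is not the method of any particular paper listed above: the local cost (absolute difference), the symmetric step pattern (match, insertion, deletion, all with equal weight) and the function name dtw_distance are assumptions chosen for simplicity.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cost of the cheapest monotonic alignment of `a` with `b`.

    cost[i][j] holds the best cumulative cost of aligning the first i
    elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] joins the same b[j-1] as a[i-2]
                cost[i][j - 1],      # b[j-1] joins the same a[i-1] as b[j-2]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Two toy sequences: the second repeats one value, so warping absorbs it.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
different = dtw_distance([0.0, 0.0], [1.0])
print(same_shape, different)
```

For real speech data the scalars would typically be replaced by feature vectors (with a vector distance as the local cost), and the table can be kept with backpointers if the alignment path itself is needed rather than just its cost.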




5. Portuguese Vowel Lenition


David Silva. "The variable elision of unstressed vowels in European Portuguese: a case study." 1994.


David Silva. "The variable deletion of unstressed vowels in Faialense Portuguese." Language Variation and Change 1997.

David Silva. "Vowel lenition in Sao Miguel Portuguese." Hispania 1998.

Paola Escudero, Paul Boersma, Andréia Schurt Rauber, and Ricardo AH Bion. "A cross-dialect acoustic description of vowels: Brazilian and European Portuguese." JASA 2009.


Leda Bisol and João Veloso. "Phonological Processes Affecting Vowels: Neutralization, Harmony, and Nasalization." The Handbook of Portuguese Linguistics (2016): 69-85.


Juliana Simoes Fonte. "The unstressed vocalism in the history of Portuguese." Alfa: Revista de Lingüística 2017.


6. More on Allophonic Variation

Tense Tents