LING5220/LING2220 2023

Phonetics II: Data Science

We’ve entered the age of Big Data in the study of spoken language, as in other areas of science and engineering. We have access to very large digital datasets in hundreds of languages, and it’s easy to collect more. And modern hardware and software make it increasingly easy to get empirical answers to theoretical, practical, and social questions.

Phonetics research engages the relationship between signals (speech sounds and articulatory gestures) and symbols (phonemes, words, sentences, discourse structures), as well as the effects of dimensions such as style, attitude, identity, communication intention, and physiological state. In this course, you’ll learn to conduct such research using modern resources and methods, applying a Big Data perspective both to classic problems and to new ones. Topics include scripting and statistical techniques, automatic phonetic analysis, and integration of methods from speech technology.

Research of this type applies in any area where the facts of speech production and perception are relevant, at levels of linguistic analysis from phonetics to pragmatics, and for basic or applied questions from many different fields -- theoretical linguistics, psychology, sociolinguistics, language teaching and learning, clinical diagnosis, speech technology, and even poetics and musicology.

The course will be organized around three parallel tracks: Readings, to introduce problems and types of solutions; Exercises, to teach relevant techniques; and (individual or group) Projects. The details of the readings and exercises will depend to some extent on the backgrounds and goals of the participants.

The projects may be individual or group, and may be something a participant is already working on, or something new -- varied lists of possible projects will be made available. Graduate students enrolled in 5220 will produce a description of their project suitable for submission to a conference; undergrads enrolled in 2220 may do so if they wish.

There will be two meetings per week. Each meeting might be a discussion of readings, an interactive session to work on an assigned exercise, or a lab session to help plan and carry out an individual or group project.

The course materials will all be online – no textbook or other purchases are required. Assignments and project reports will be submitted via the course Canvas site.

We will focus mainly on production rather than perception, and on acoustics rather than articulation, but we will also sketch the application of similar methods to articulatory data, and we will explore analogous approaches to perceptual studies.

The details will be dynamically adapted to the background and goals of the participants in the course.

The course will meet MW 1:45-3:15 pm, in 3401 Walnut St. Rm 326C.



[Some instructions about access to a phonetics lab server (harris.sas.upenn.edu), and pointers about unix usage, can be found here.]