LING521 - Assignment 1

We're going to start with something substantive, and then work backwards through the underlying issues and the necessary infrastructure.

(The TIMIT corpus, which these examples come from, was created more than 30 years ago. Details, more than you probably want, can be found here. It has been cited more than 24,000 times, and is still widely used in speech research.)

Do the following in preparation for our meeting on Wednesday, 1/27/2021:

  1. Create a folder on your laptop for this project, and download into this folder the archive SA2.tgz from Penn+Box.
  2. Unpack the files this archive contains. On a Mac, you should be able to do this by double-clicking on it in a Finder window, or by typing the following in a Terminal window:
    cd <YourFolderWhateverItIs>
    tar xvfz SA2.tgz
    On Windows, double-clicking on the file icon should also work, or you can unpack it using WinZip or a similar program.
  3. The result should include two subdirectories, wavs and textgrids, each containing 630 files with names like:
    wavs/FADG0_SA2.wav
    wavs/FAEM0_SA2.wav
    wavs/FAJW0_SA2.wav
    wavs/FAKS0_SA2.wav
    ...
    textgrids/FADG0_SA2.TextGrid
    textgrids/FAEM0_SA2.TextGrid
    textgrids/FAJW0_SA2.TextGrid
    textgrids/FAKS0_SA2.TextGrid
    ...
  4. Now download into the same directory one of these four Praat script files, together with its corresponding text file with the extension ".praatnotes":
    ScriptSA2a.praat  SA2a.praatnotes
    ScriptSA2b.praat  SA2b.praatnotes
    ScriptSA2c.praat  SA2c.praatnotes
    ScriptSA2d.praat  SA2d.praatnotes
    I'll email the class a note assigning each of you to one of these pairs of files.
  5. Now start up Praat and read in the script file assigned to you (via Praat>>Open Praat Script). It should lead you through 1/4 of the 630 instances of the SA2 sentence in TIMIT, one at a time. In parallel, open the corresponding .praatnotes file in your favorite text editor. It will contain lines like these:
    1 FAKS0_SA2 0.739188 1.10669 0.1675
    2 FDAC1_SA2 0.05 0.53025 0.28025
    3 FELC0_SA2 0.16 0.6105 0.2505
    4 FJEM0_SA2 0.116375 0.585062 0.268687
    5 MDAB0_SA2 0.05625 0.4 0.14375
    ...
    Each line gives an item number, a file name, and the start time, end time, and duration of the stretch being shown. You can add to each line notes about what you see and hear in the specified part of the specified file. (You don't need to fill anything out before our discussion on Wednesday.)
  6. In class on Wednesday, we'll discuss what kind of classification/annotation/measurement we should do in order to characterize the 630 different pronunciations of this word in versions of the SA2 sentence in the TIMIT dataset. Come prepared with some ideas.
  7. Then, to finish the assignment, decide on a classification and/or measurement system that makes sense to you, and apply it to your subset of the examples, as notes in your .praatnotes file. Submit that file via Canvas.
    Think about the questions raised in this brief discussion of two other TIMIT utterances.
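Not required for the assignment, but if you're comfortable in a Terminal window, a small shell sketch like the one below can confirm that the unpacking in step 3 worked and give a quick overview of your .praatnotes file. The function names count_files and notes_summary are hypothetical helpers, not part of the course materials; the sketch assumes bash plus the standard find, wc, tr, and awk tools (all present on a Mac), and assumes each .praatnotes line begins with the five fields shown in step 5. One reason to look: in the sample lines above, the listed duration is roughly 0.2 s smaller than end time minus start time (for example 1.10669 − 0.739188 = 0.367502, versus 0.1675), so it may be worth checking in Praat exactly which stretch each number refers to.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the course materials. Run them from the
# project folder that holds wavs/, textgrids/, and your .praatnotes file.

# Print how many files in directory $1 have the extension $2.
count_files() {
  find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Print one summary line per entry of a .praatnotes file ($1).
# Assumes each line begins "number name start end duration"; any notes
# you added after those five fields are read into $rest and ignored here.
# Optional $2: project folder to look in (default: current directory).
notes_summary() {
  local dir="${2:-.}" num name start end dur rest wav tg
  # "|| [ -n "$num" ]" keeps a final line that lacks a trailing newline.
  while read -r num name start end dur rest || [ -n "$num" ]; do
    [ -z "$name" ] && continue            # skip blank lines
    # Check that the entry's audio and annotation files exist (layout from step 3).
    wav=no; [ -f "$dir/wavs/$name.wav" ] && wav=yes
    tg=no;  [ -f "$dir/textgrids/$name.TextGrid" ] && tg=yes
    # bash has no floating-point arithmetic, so awk computes end minus start.
    awk -v n="$num" -v f="$name" -v s="$start" -v e="$end" -v d="$dur" \
        -v w="$wav" -v t="$tg" \
        'BEGIN { printf "%s %s end-start=%.6f listed=%s wav=%s tg=%s\n", n, f, e - s, d, w, t }'
  done < "$1"
}
```

For example, after sourcing these functions, count_files wavs wav and count_files textgrids TextGrid should each print 630 if the archive unpacked completely, and notes_summary SA2a.praatnotes (or whichever file you were assigned) prints one line per item, with wav=no or tg=no marking any entry whose files are not where step 3 puts them.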