LING521 - Assignments

LING521 - Assignment 1

We're going to start with something substantive, and then work backwards through the underlying issues and the necessary infrastructure.

(The TIMIT corpus, which these examples come from, was created more than 30 years ago. Details (more than you probably want) can be found here. It has been cited more than 24,000 times, and is still widely used in speech research.)

Do the following in preparation for our meeting Wednesday 1/27/2021:

Create a folder on your laptop for this project, and download into this folder the archive SA2.tgz from Penn+Box.
Unpack the files this archive contains. On a Mac, you should be able to do this by double-clicking on it in a Finder window, or by running these commands in a terminal window:
```
cd <YourFolderWhateverItIs>
tar xvfz SA2.tgz
```
On Windows, double-clicking on the file icon should also work, or you can unpack it using WinZip or similar programs.
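
If neither route works on your machine, the archive can also be unpacked with Python's standard tarfile module. This is just a minimal sketch, not part of the assignment: the function name unpack_archive is my own, and it assumes the archive is an ordinary gzip-compressed tar file that you trust.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive into dest_dir.

    Returns the list of member names found in the archive.
    (Hypothetical helper; only use it on archives from a trusted source.)
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names
```

Calling unpack_archive("SA2.tgz") from inside your project folder should have the same effect as the tar command above.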

The result should include two subdirectories, wavs and textgrids, each containing 630 files with names like

wavs/FADG0_SA2.wav
wavs/FAEM0_SA2.wav
wavs/FAJW0_SA2.wav
wavs/FAKS0_SA2.wav
...

textgrids/FADG0_SA2.TextGrid
textgrids/FAEM0_SA2.TextGrid
textgrids/FAJW0_SA2.TextGrid
textgrids/FAKS0_SA2.TextGrid
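
One way to confirm that the unpacking worked is to count the files and check that every .wav file has a .TextGrid partner. The following is a rough sketch under the assumption that the folder layout is exactly as listed above; the function names are my own.

```python
from pathlib import Path


def unmatched_stems(wav_names, textgrid_names):
    """Return the sorted base names that appear in only one of the two lists."""
    wav = {Path(n).stem for n in wav_names}
    tg = {Path(n).stem for n in textgrid_names}
    return sorted(wav ^ tg)


def check_unpacked(root=".", expected=630):
    """Report file counts and any unpaired names under root/wavs and root/textgrids."""
    root = Path(root)
    wavs = [p.name for p in (root / "wavs").glob("*.wav")]
    grids = [p.name for p in (root / "textgrids").glob("*.TextGrid")]
    print(f"wavs: {len(wavs)} (expected {expected})")
    print(f"textgrids: {len(grids)} (expected {expected})")
    for stem in unmatched_stems(wavs, grids):
        print("no partner for", stem)
    return len(wavs), len(grids)
```

Running check_unpacked() from inside your project folder should report 630 for both counts and list no unpaired names.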

Now download into the same directory one of these four Praat script files and its corresponding text file with the extension ".praatnotes":
ScriptSA2a.praat SA2a.praatnotes
ScriptSA2b.praat SA2b.praatnotes
ScriptSA2c.praat SA2c.praatnotes
ScriptSA2d.praat SA2d.praatnotes
I'll send the class a note via email with assignments of individuals to files.
Now start up Praat and read in the script file assigned to you (via Praat >> Open Praat Script). It should lead you through 1/4 of the 630 instances of the SA2 sentence in TIMIT, one at a time. In parallel, open the corresponding .praatnotes file in your favorite text editor. It will contain lines like these:
```
1 FAKS0_SA2 0.739188 1.10669 0.1675
2 FDAC1_SA2 0.05 0.53025 0.28025
3 FELC0_SA2 0.16 0.6105 0.2505
4 FJEM0_SA2 0.116375 0.585062 0.268687
5 MDAB0_SA2 0.05625 0.4 0.14375
...
```
Each line gives an item number and a filename, followed by the start time, end time, and duration of the stretch being shown. You can add notes to each line about what you see and hear in the specified part of the specified file. (You don't need to fill anything out before our discussion on Wednesday.)
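If you later want to work with these lines in a program, they are easy to split into fields. The sketch below assumes that the fields are separated by whitespace and that any notes you add go at the end of the line; the field names (index, name, start, end, duration, notes) are my own labels, not something the file defines. It also prints end minus start minus the listed duration for the five sample lines.

```python
from typing import NamedTuple


class NoteLine(NamedTuple):
    index: int
    name: str
    start: float
    end: float
    duration: float
    notes: str  # free text after the fifth field (assumed convention)


def parse_note_line(line):
    """Split one .praatnotes line into its five fields plus any trailing notes."""
    parts = line.split(None, 5)
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    index, name, start, end, duration = parts[:5]
    notes = parts[5].strip() if len(parts) > 5 else ""
    return NoteLine(int(index), name, float(start), float(end), float(duration), notes)


SAMPLE = """\
1 FAKS0_SA2 0.739188 1.10669 0.1675
2 FDAC1_SA2 0.05 0.53025 0.28025
3 FELC0_SA2 0.16 0.6105 0.2505
4 FJEM0_SA2 0.116375 0.585062 0.268687
5 MDAB0_SA2 0.05625 0.4 0.14375
"""

for line in SAMPLE.splitlines():
    rec = parse_note_line(line)
    # How much longer is the (start, end) window than the listed duration?
    extra = rec.end - rec.start - rec.duration
    print(rec.name, round(extra, 3))
```

In these five lines the difference comes out to about 0.2 each time. One possible reading is that the window shown includes a little context on either side of the stretch whose duration is listed, but that is an inference from the sample lines, so check it against what you see in Praat.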
In class on Wednesday, we'll discuss what kind of classification/annotation/measurement we should do in order to characterize the 630 different pronunciations of this word in versions of the SA2 sentence in the TIMIT dataset. Come prepared with some ideas.
Then to finish the assignment, decide on a classification and/or measurement system that makes sense to you, and apply it to your subset of the examples, as notations in the .praatnotes file. Submit that file via Canvas.
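Before submitting, it may be useful to tally how often you used each category. The sketch below assumes one particular convention, which is only a suggestion: the first word of your note on each line is the category label. The labels LABEL_A and LABEL_B and the example lines are placeholders, not real annotations.

```python
from collections import Counter


def tally_labels(lines):
    """Count the first word of the note on each annotated line.

    Assumes five whitespace-separated fields followed by free-text notes.
    Lines with fewer than five fields (such as "...") are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 5)
        if len(parts) == 6:  # a note follows the five original fields
            counts[parts[5].split()[0]] += 1
        elif len(parts) == 5:  # no note added yet
            counts["(unlabeled)"] += 1
    return counts


# Placeholder lines: the labels are invented purely for illustration.
example = [
    "1 FAKS0_SA2 0.739188 1.10669 0.1675 LABEL_A clear release",
    "2 FDAC1_SA2 0.05 0.53025 0.28025 LABEL_B",
    "3 FELC0_SA2 0.16 0.6105 0.2505 LABEL_A",
    "4 FJEM0_SA2 0.116375 0.585062 0.268687",
]
for label, n in sorted(tally_labels(example).items()):
    print(label, n)
```

To run it on your own file, pass the lines of the file instead of the example list, for instance tally_labels(open("SA2a.praatnotes")).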
Think about the questions raised in this brief discussion of two other TIMIT utterances.