Linguistics 520 -- Fall 2008 -- Lab 6
Effects of phrase position on duration
1. Record a digit-string set. This is a set of 100 grouped strings of the digits from 0 to 9, designed so that each digit occurs 10 times in each position, and each pair of digits occurs once spanning each pair of positions.
The sets linked below are 7 long, 3+4, in the style of U.S. telephone numbers.
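If you make your own set, it is worth checking that it really has the balance properties described above. The following Python sketch is one way to do that; it is not the method used by digitstring.R or digitstring.sh. The function name `check_design` and its parameters are invented for this illustration. It reads "each pair of positions" as every pair of positions by default; pass `adjacent_only=True` if only neighboring positions are meant.

```python
from collections import Counter
from itertools import combinations, product


def check_design(strings, alphabet="0123456789", adjacent_only=False):
    """Return a list of violations; an empty list means the set is balanced.

    Hypothetical helper: characters outside `alphabet` (such as the
    hyphen in "457-2690") are ignored before checking.
    """
    cleaned = ["".join(ch for ch in s if ch in alphabet) for s in strings]
    if not cleaned:
        return ["no strings given"]
    n = len(alphabet)
    length = len(cleaned[0])
    problems = []
    if len(cleaned) != n * n:
        problems.append(f"expected {n * n} strings, got {len(cleaned)}")
    if any(len(s) != length for s in cleaned):
        problems.append("strings differ in length")
        return problems

    # Each symbol should occur n times in each position.
    for pos in range(length):
        counts = Counter(s[pos] for s in cleaned)
        for sym in alphabet:
            if counts[sym] != n:
                problems.append(
                    f"position {pos + 1}: {sym!r} occurs {counts[sym]} times"
                )

    # Each ordered pair should occur once across each pair of positions.
    if adjacent_only:
        position_pairs = [(i, i + 1) for i in range(length - 1)]
    else:
        position_pairs = list(combinations(range(length), 2))
    for i, j in position_pairs:
        counts = Counter((s[i], s[j]) for s in cleaned)
        for pair in product(alphabet, repeat=2):
            if counts[pair] != 1:
                problems.append(
                    f"positions {i + 1},{j + 1}: pair {pair} "
                    f"occurs {counts[pair]} times"
                )
    return problems


# Toy set over the symbols 0-2 (9 strings of length 3), small enough to
# verify by hand: the third symbol is the sum of the first two, mod 3.
toy = [f"{a}{b}{(a + b) % 3}" for a in range(3) for b in range(3)]
print(check_design(toy, alphabet="012"))
```

For a real set you would pass the 100 strings with the default alphabet and look at whatever violations come back.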
You can record such a set in (nearly) any language. For English, the rules should be:
- Name each digit separately -- thus 3456 should be "three four five six", not "thirty four fifty six" or whatever.
- Read fluently, with a natural grouping but without actual silent pauses within the string.
- If you also read the line number at the start of each string, it will help you keep track of where you are: thus
37: 457-2690
would be
"Number thirty seven -- four five seven, two six nine zero".
- Where there are alternative digit names, everyone recording in a given language should use the same name. Thus in English, everyone should pronounce digit 0 as "zero" rather than "oh".
For reading in another language, please adapt these rules as appropriate.
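To make the English reading rules concrete, here is a small Python sketch that turns a line such as "37: 457-2690" into the words you would say. The function names are invented for this illustration, and the sketch assumes the "N: DDD-DDDD" layout shown above with line numbers from 1 to 100.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def line_number_words(n):
    """Spell out a line number between 1 and 100 in English."""
    if not 1 <= n <= 100:
        raise ValueError("line numbers run from 1 to 100")
    if n == 100:
        return "one hundred"
    if n < 10:
        return DIGIT_NAMES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS[tens]
    return f"{TENS[tens]} {DIGIT_NAMES[ones]}"


def spoken_prompt(line):
    """Turn '37: 457-2690' into the words to be read aloud.

    Each digit is named separately, and 0 is always 'zero'.
    """
    number, digits = line.split(":")
    groups = digits.strip().split("-")
    spoken_groups = [
        " ".join(DIGIT_NAMES[int(d)] for d in group) for group in groups
    ]
    return (f"Number {line_number_words(int(number))} -- "
            + ", ".join(spoken_groups))


print(spoken_prompt("37: 457-2690"))
# prints: Number thirty seven -- four five seven, two six nine zero
```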
Pick one of the ten sets here (1, 2, 3, 4, 5, 6, 7, 8, 9, 10); or make your own using digitstring.R and digitstring.sh.
The recording should be monophonic (single channel) and at a moderate sampling rate (typically 11025 Hz). If you end up recording e.g. 44.1 kHz stereo, convert the results before going forward.
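If you do need to convert, any audio tool with a proper resampler will do the job. Purely as an illustration of what the conversion involves, here is a rough Python sketch using only the standard library. It assumes 16-bit PCM input, and it downsamples by averaging blocks of four samples, which is a crude stand-in for a real anti-aliasing filter. The function names and the file names in the usage note are made up.

```python
import array
import sys
import wave


def stereo_to_mono(interleaved):
    """Average each left/right pair of an interleaved sample sequence."""
    return [
        (interleaved[i] + interleaved[i + 1]) // 2
        for i in range(0, len(interleaved) - 1, 2)
    ]


def decimate(samples, factor=4):
    """Average each block of `factor` samples; leftovers are dropped.

    This is a boxcar low-pass plus downsampling, not a high-quality
    resampler.
    """
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) // factor
        for i in range(0, usable, factor)
    ]


def convert_wav(src_path, dst_path, factor=4):
    """Write a mono copy of src_path at 1/factor of its sampling rate."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if rate % factor != 0:
        raise ValueError("sampling rate is not a multiple of the factor")
    if channels == 2:
        mono = stereo_to_mono(samples)
    elif channels == 1:
        mono = list(samples)
    else:
        raise ValueError("expected a mono or stereo file")
    out = array.array("h", decimate(mono, factor))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate // factor)
        dst.writeframes(out.tobytes())
```

With a factor of 4, a call like `convert_wav("mylist_44k_stereo.wav", "mylist_11k_mono.wav")` would take a 44100 Hz stereo file to 11025 Hz mono.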
2. Use Praat to create a TextGrid interval tier, in which each digit string is segmented into its constituent digits.
- Label each digit with the corresponding (single-character) number.
- If there are silent pauses within digit strings, label them with a hyphen '-'.
- Before each string, label a "junk" interval (e.g. the period of time between the end of one string and the start of the next, possibly including something like "number thirty seven") as "#1", "#2", ... "#100" to mark the line number in the original file.
- If there are mistakes or extraneous sounds or whatever, just leave them out, or label them with anything but a digit or a label of the form "#N".
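The point of these labeling conventions is that a program can later work out which line and which position each digit belongs to. Here is a Python sketch of that logic, assuming the intervals have already been read in as (start, end, label) triples. The function names are invented, and the times in the example are made-up values, not measurements.

```python
import re

LINE_MARK = re.compile(r"#(\d+)")


def classify(label):
    """Sort a label into 'digit', 'pause', 'line', or 'other'."""
    label = label.strip()
    if len(label) == 1 and label in "0123456789":
        return "digit"
    if label == "-":
        return "pause"
    if LINE_MARK.fullmatch(label):
        return "line"
    return "other"


def digit_rows(intervals):
    """Return (line, position, digit, duration) for every digit interval.

    A "#N" label starts line N and resets the position counter.
    Pauses and any other labels are skipped without using up a position.
    """
    rows = []
    line = None
    position = 0
    for start, end, label in intervals:
        kind = classify(label)
        if kind == "line":
            line = int(LINE_MARK.fullmatch(label.strip()).group(1))
            position = 0
        elif kind == "digit" and line is not None:
            position += 1
            rows.append((line, position, label.strip(), round(end - start, 6)))
    return rows


# Invented example: the start of line 37, with a pause and a stray noise.
example = [
    (0.0, 1.2, "#37"),
    (1.2, 1.45, "4"),
    (1.45, 1.7, "5"),
    (1.7, 2.0, "7"),
    (2.0, 2.1, "-"),
    (2.1, 2.3, "2"),
    (2.3, 2.9, "cough"),
]
for row in digit_rows(example):
    print(row)
```

Digits that turn up before any "#N" label are dropped here, since there is no line to assign them to; you may prefer to flag them instead.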
3. How to transfer your measurements to R (or another statistics program) for analysis.
Once you've recorded your material, found the segmentation points, and saved the .TextGrid file containing them, you could do the rest of the process with pencil and paper -- but this would be quite tedious.
The good news is that the rest of the labor can be entirely automated.
The bad news is that this requires some programming. So you'll either have to learn some (new) programming techniques, or get Josh or Mark to help you with this piece of the lab.
Some of this needs to be in Praat's built-in scripting language, to put durations (and measurements of pitch, formants and other parameters) into an output file, in a form that is convenient for whatever program you use for the statistical analysis, graphing and so on. And some of the programming may need to take place in between Praat and the statistics program, so as to make the transfer easy on both ends.
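As one illustration of the "in between" step, the Python sketch below pulls the intervals straight out of a TextGrid and writes them as a tab-separated table of the kind that R's `read.delim` reads. It is not one of the scripts for this lab. It assumes the TextGrid was saved in Praat's long text format, in a plain ASCII or UTF-8 encoding, with a single interval tier. The function names and the sample text are invented.

```python
import csv
import io
import re

# One interval in Praat's long text format: xmin, xmax, then text.
INTERVAL = re.compile(
    r'xmin\s*=\s*([-+.\deE]+)\s+'
    r'xmax\s*=\s*([-+.\deE]+)\s+'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)


def parse_intervals(textgrid_text):
    """Return (start, end, label) for every interval in the text."""
    intervals = []
    for m in INTERVAL.finditer(textgrid_text):
        label = m.group(3).replace('""', '"')  # Praat doubles inner quotes
        intervals.append((float(m.group(1)), float(m.group(2)), label))
    return intervals


def write_table(intervals, out):
    """Write a tab-separated table with a header row to a file object."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["label", "start", "end", "duration"])
    for start, end, label in intervals:
        writer.writerow(
            [label, f"{start:.4f}", f"{end:.4f}", f"{end - start:.4f}"]
        )


# Invented sample in the long text format (times are not real data).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.7
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "digits"
        xmin = 0
        xmax = 1.7
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = "#37"
        intervals [2]:
            xmin = 1.2
            xmax = 1.45
            text = "4"
        intervals [3]:
            xmin = 1.45
            xmax = 1.7
            text = "5"
'''

buffer = io.StringIO()
write_table(parse_intervals(SAMPLE), buffer)
print(buffer.getvalue())
```

The file-level and tier-level xmin/xmax lines are skipped because they are not followed by a text line. With more than one interval tier, this sketch would lump all the tiers together, so you would need to split the file by tier first.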
There are many ways to do this. Below you'll find links to one example of how it can be done for my own recordings of two number-string lists from this experiment.
You can get everything that you need in one (18 MB) zip file: LAB6.zip
Or you can fetch the contents individually as needed. (If you're only doing the duration part, you won't need the .wav files, but only the .TextGrid files, and the scripts dodurations, GetDurations, and Digitstring1.R.)
The recordings:
MYL_List8.wav MYL_NList4.wav MYL_NList4a.wav
The TextGrids:
MYL_List8.TextGrid MYL_NList4.TextGrid MYL_NList4a.TextGrid
The shell script that runs the whole process -- to extract the parameters, to transform them for ease of access in the statistics system R, to read them into R, and to create some plots. This script should run on a Unix (Linux or Mac OS X) system like those in the lab. The comments in the script should help you to pull pieces out and adapt them to your needs.
An alternative script that only does the (first) duration part of the process:
Three needed Praat scripts (only the first one is needed for durations alone):
GetDurations NumPitchScript NumFormScript
Three needed R scripts (only the first one is needed for durations alone):
Digitstring1.R Digitstring2.R Digitstring3.R
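Whatever tools you use, the basic question of the lab comes down to comparing durations across positions in the string. The Python sketch below shows the simplest version of that summary, the mean duration at each position. It says nothing about what the R scripts listed above actually compute; the function name is invented, and the numbers in the example are toy values chosen for easy arithmetic, not measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_position(rows):
    """Map each position to the mean of the durations observed there.

    `rows` is any iterable of (position, duration_in_seconds) pairs.
    """
    by_position = defaultdict(list)
    for position, duration in rows:
        by_position[position].append(duration)
    return {pos: mean(values) for pos, values in sorted(by_position.items())}


# Toy values only: two tokens at position 1 and two at position 7.
toy_rows = [(1, 0.25), (1, 0.5), (7, 0.5), (7, 0.75)]
print(mean_duration_by_position(toy_rows))
# prints: {1: 0.375, 7: 0.625}
```

For a fuller analysis you would also want to group by digit, since different digit names have different inherent durations.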