LING 520
Introduction to Phonetics
Fall 2006
Introduction
- The purpose of this lab is to learn how to use Praat and how to do phonetic research with it efficiently.
You will be guided through Praat's main features and will build the skills needed to handle large data sets.
- The lab counts for 5% of your final score.
- The lab is due on Sep. 26 (Tuesday) at 3:30pm.
Part I: Using Praat
There are many online Praat tutorials (try searching Google for "Praat tutorial"), and many of them are very good. We recommend the one
written by Jean-Philippe Goldman (available at
http://www.unige.ch/lettres/linge/ppp/praat_tutorial.pdf). You may start with any tutorial you find useful, as long as
you learn how to do the following:
- Record and load a sound;
- Open an edit window, select a part of the sound in the edit window;
- Create a TextGrid with proper tier(s), and do segmentation and labeling.
Here are some suggestions on using a head-mounted microphone to make decent-quality recordings directly into your laptop:
- Choose a good microphone. A very important specification of a microphone is its frequency response. An ideal
microphone (which does not exist yet) would have a flat response at all frequencies. Generally speaking, a frequency
response that is flat (within 3 dB) from 100 to 10000 Hz is adequate for speech. You can also choose one of the newer
microphones with a single USB plug-and-play connection (they are slightly more expensive but offer higher quality),
for example, the Samson C01U and the Logitech Premium USB Headset 30.
- Set proper recording levels (on Windows, go to Control Panel > Sound and Audio Devices > Audio > Sound recording >
Volume …). It’s important to set the recording level high to improve the SNR (signal-to-noise ratio),
but you don’t want to set the level so high that the microphone or sound card is overloaded. SNR is the ratio of the
signal level to the noise level. In general, an SNR greater than 50 dB is adequate for phonetic
studies. You can estimate the SNR by following these steps:
- 1. Load your recording (or make a new recording) into Praat;
- 2. Edit the sound object to open the edit window, and make sure that View > Sound autoscaling is checked (see
illustration);
- 3. Go to the bottom-left corner of the edit window and click on all. Two numbers are shown on the
left side of the sound wave window. The top one is positive and the bottom one is negative. These are the
maximum positive and negative air pressures of the sound (only if Sound autoscaling is on). Record the greater absolute
value of these two numbers (see illustration). This is the maximum air pressure of the signal, P_signal, measured in Pa
(this is not exactly right; please refer to the
Praat documentation for more information);
- 4. Select a silent portion (no speech signal in it) of the recording, click on sel to show the selected
portion only. Again, record the greater absolute value of the two numbers shown on the left side of the sound
wave window. This is the maximum air pressure of the noise, P_noise;
- 5. Calculate the SNR (in dB) of your recording: SNR = 20 log10(P_signal / P_noise), where log10 is the base-10 logarithm;
- 6. Is your SNR greater than 50 dB? If not, try to set your recording level higher (be careful not to overload),
or move to a quieter place, and record again.
- For more information about microphones and recording, please refer to these excellent online documents: Microphone, Recording.
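The calculation in step 5 above can be sketched in Python. This is only an illustration: the function name snr_db and the two amplitude readings are hypothetical, not values taken from try1.wav or any real recording.

```python
import math

def snr_db(p_signal: float, p_noise: float) -> float:
    """Return the SNR in dB from two peak amplitudes given in the same units."""
    if p_signal <= 0 or p_noise <= 0:
        raise ValueError("both amplitudes must be positive")
    return 20.0 * math.log10(p_signal / p_noise)

# Hypothetical readings from the left side of the edit window.
p_signal = 0.5     # greater absolute value with the whole file shown ("all")
p_noise = 0.0005   # greater absolute value with a silent portion shown ("sel")

snr = snr_db(p_signal, p_noise)
print(round(snr, 1))                                  # SNR in dB
print("adequate" if snr > 50 else "record again")     # the 50 dB guideline
```

Because the SNR depends only on the ratio of the two readings, it does not matter that the numbers Praat displays are not exact pressures in Pa.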
Practice:
1. Download Praat from http://www.fon.hum.uva.nl/praat/ and install it on your
local computer.
2. Download the file try1.wav, open Praat, and load try1.wav into Praat by clicking on Read > Read from
file ... and selecting the file "try1.wav" you just downloaded. A highlighted sound object called "try1" will appear
in the object window (looks like this).
3. Click on Edit, and an edit window will pop up. If no Praat settings have been changed, the upper part
of the window will display the waveform and the pulses of the sound, and the lower part will display the spectrogram,
formants, pitch and intensity (looks like this).
4. Click Pulses on the top bar of the edit window, then uncheck Show pulses. The pulses shown in the upper
window will disappear. Repeat the same procedure for Spectrum (uncheck Show spectrogram), Formant, Pitch,
and Intensity. Now your edit window should look like this.
5. Go back to the object window, click on Annotate, then choose To TextGrid…, and a small window will pop up.
Replace “Mary John bell” on the top line with “word”, delete “bell” from the bottom line, and then click OK.
This creates a TextGrid that has only one interval tier.
6. In the object window, a TextGrid object, also called "try1", will show up and be highlighted. Now press and hold the
‘Ctrl’ key and click the sound object. You will see that both the sound and the TextGrid objects are now
selected/highlighted. Click Edit on the right side of the object window, and you will see a new edit window
like this.
7. Now click at the beginning of the first word, and you will see a vertical cursor in the TextGrid window.
There is a small circle at the top of the cursor. By clicking on the circle, you add one boundary of an interval to the
TextGrid. Now move the mouse to the end of the first word, click, and then click the small circle; the second boundary of the
interval is then added. Click anywhere between the two boundaries, and the entire interval turns yellow.
Type the word into the interval. If you want to move a boundary, just click on it and drag it; to remove a boundary,
click on it, then select Boundary > remove from the top bar, or press Alt+Backspace. Congratulations!
Now you know how to segment and label using Praat. Be sure to use the “all”, “in”, “out”, and “sel” buttons at the bottom-left
corner of the edit window to show the entire file, zoom in, zoom out, and show the selected portion. You will need to listen
to different portions of the sound when you do segmentation. You can do so by clicking on the horizontal bars shown at the
bottom of the edit window.
8. Segment and label all the words in try1.wav. After you are done, you should get a window like this.
9. Estimate the SNR of try1.wav. It should be about 60dB.
Part II: Segmentation and word duration
The ability to handle large data sets is crucial to most research in phonetics. You should keep the following
‘rules’ in mind when collecting and processing data:
- Name and organize files systematically.
How to name files depends on the purpose of your study, but here are some general guidelines: 1. Do not use spaces;
2. All file names should have the same structure. For example, in f_s1_001.wav and m_s8_112.wav, each
stem contains three parts separated by underscores: the first part indicates the sex of the speaker, the
second part identifies the speaker, and the third part is the file number; 3. Try to make file names the same length,
including each individual part of the file name (sometimes this is impossible, for example, if you don’t know at the outset how many
sentences you will collect from a speaker); 4. Make file names informative.
- Don’t use pen and paper if you don’t have to.
It might be fun to write down what you measured and then type your data into a computer while enjoying some music. But
from now on, you should keep such old-fashioned research pleasures only in your memory.
- If you find yourself mechanically repeating something, you should stop and think about scripting.
For this course, we don’t assume you have any programming background, and scripting is not required. But you should
start to learn how to script if you want to do research in the future.
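As one illustration of why a consistent naming scheme pays off, the sketch below checks file names against the three-part structure used in the example above (f_s1_001.wav, m_s8_112.wav). The exact pattern is an assumption made for this example: one letter for sex, "s" plus digits for the speaker, and a three-digit file number. The names NAME_PATTERN and parse_name are hypothetical, and a real study would define its own pattern.

```python
import re

# Assumed scheme for illustration: <sex>_<speaker>_<file number>.wav
NAME_PATTERN = re.compile(
    r"^(?P<sex>[fm])_(?P<speaker>s\d+)_(?P<number>\d{3})\.wav$"
)

def parse_name(filename):
    """Return the parts of a well-formed file name, or None if it does not fit."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.groupdict()

# The first two names follow the scheme; the third contains a space and does not.
for name in ["f_s1_001.wav", "m_s8_112.wav", "my file.wav"]:
    print(name, "->", parse_name(name))
```

When every name follows one pattern, the information in the name (sex, speaker, file number) can be recovered automatically instead of being copied by hand.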
Problem 1 (1 point):
Answer the following question: Is it a good idea to name files using more than one dot, for example, “S1.002.wav”,
in Praat? Why? (Hint: make four sound files (they can be four copies of the same file): S1.002.wav, S1.003.wav,
S1_002.wav, and S1_003.wav, and load them into Praat. What happens?)
Problem 2 (4 points):
1. Obtain the wav and TextGrid files from ling520@harris.ling.upenn.edu, and save them to your local directory. The password
has been emailed to you. Please keep it safe. If you are working on Linux/Unix, you can get the files
through sftp. If you are working on Windows, please follow these steps to get a free copy of SSH and to get the files using SSH:
transfer files with SSH. Each group has a different set of data, so be careful not to work on the wrong data set.
2. The TextGrid files provided to you mark the phrase boundaries of the sentences, as shown here. Load one pair of wav and TextGrid files into Praat at a time, and add word
boundaries and labels to the TextGrid. After you are done, save the changes you've made to the TextGrid (don't rename the file) and move on to the next pair.
You can also use the script label.praat to facilitate your work. The script will automatically load all the wav-TextGrid pairs into Praat, one pair at a time. After you
finish one pair and click on "continue", the script will automatically save the current TextGrid and then load another pair into Praat. To run the script,
go to Control > Open Praat script..., and open the script file. A script window will show up. You need to change the directory name in the script (see illustration). After
you get the directory name right, just click on 'Run' or use Ctrl+R.
3. Download a Praat script here, change the directory name in the script to the local
directory where the TextGrid files are placed (see illustration), then run the script.
The script will calculate the durations of all the words in the TextGrid files and record the durations in a
big log file, word_duration.log, which will be saved in the same directory as your TextGrid files.
4. From word_duration.log, calculate the average duration of all the words for each of the 15 categories (the first three letters of the filename define the category of the file):
neu, dis, pan, anx, hot, col, des, sad, ela, hap, int, bor, sha, pri, cot. How does word duration differ among these categories?
These categories are different emotions. By listening to the sounds, identify the emotion type of each category (hint: the filenames tell you something).
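If you would like to try scripting the averaging in step 4, here is a minimal Python sketch. The layout of word_duration.log is not described in this handout, so the sketch assumes that each line holds three whitespace-separated fields (file name, word, duration in seconds). That layout, the function name mean_duration_by_category, and the sample lines are all hypothetical; check your own log file and adjust accordingly.

```python
from collections import defaultdict

def mean_duration_by_category(lines):
    """Average the durations per category (first three letters of the file name).

    Assumed line layout (hypothetical): <file name> <word> <duration in seconds>
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        filename, _word, duration = fields
        try:
            seconds = float(duration)
        except ValueError:
            continue  # skip a header line or a non-numeric entry
        category = filename[:3]
        totals[category] += seconds
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}

# Made-up sample lines in the assumed layout; these are not real measurements.
sample = [
    "neu_001\tthe\t0.10",
    "neu_001\tdog\t0.30",
    "sad_001\tthe\t0.20",
    "sad_001\tdog\t0.40",
]
for category, mean in sorted(mean_duration_by_category(sample).items()):
    print(category, round(mean, 3))
```

To run it on your own data, open the log file and pass the file object in place of the sample list, since iterating over an open text file yields its lines.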
Turn-in instructions:
1. Upload your TextGrid files and word_duration.log onto harris. There is a directory under lab1_turn-in/ for each group.
For example, group 1 should upload the files to lab1_turn-in/group1/.
2. Write a lab report that includes:
- your answer to problem 1;
- the average word duration of each of the fifteen categories and the emotion name (try your best) for each category;
- a discussion of the relationship between word duration and emotion. For example, which emotion types tend to have
longer or shorter word durations? Which emotion types have similar word durations, and do these
emotion types have anything in common?
3. You may discuss the lab with your partner and others, but you should write the lab report independently. Please send your
report as a PDF to my email (jiahong@babel.ling.upenn.edu) by 3:30pm, Sep. 26 (let me know if you need help
generating PDF files).