Linguistics 001 Homework 8 Due Mo 11/14/2005
This homework is intended to exercise your skills in linguistic analysis at various levels, in preparation for your final project. You can collaborate in small groups (three or fewer) on this assignment, but make sure that everyone involved understands how to do each part of the analysis.
(1) A link to a transcript of President George W. Bush's address nominating Samuel A. Alito as Supreme Court justice is here. That page has a link to streaming video, or you can download a .wav file here.
How many words are there in the transcript of Bush's nomination speech? How many words are there in the transcript of Alito's subsequent remarks? (Note that if you cut and paste the text into a program like Microsoft Word, you can use the built-in "word count" function, available in MSWord under the "tools" menu, to do the word count for you.)
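If you would rather count words with a short script, something like the following Python sketch may help. It assumes that a "word" is any whitespace-separated token containing at least one letter or digit, so a free-standing "--" is not counted; a word processor may use a slightly different rule and give a slightly different total. The function name count_words is just an illustrative choice, and the sample string is the first phrase of the excerpt quoted in question (4).

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Assumption: tokens made only of punctuation (such as "--") are not words.
    """
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))


sample = "The pay-as-you-go system is -- really isn't fair"
print(count_words(sample))  # 7: "--" is skipped, hyphenated forms count once
```

Whichever counting rule you pick, use the same one for every transcript so that the speech rates are comparable.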
(2) How many total seconds does Bush's speech take? How about Alito's remarks?
You should use a program for audio display and playback in order to make accurate measurements. Two relatively easy (and free) programs that you can use to do this are WaveSurfer and Audacity. You can also use the free Transcriber program, which makes it easy to enter a time-aligned transcription.
(3) Based on the total number of words and the total elapsed time, what is the overall speech rate (measured in Words Per Minute = WPM) for each speaker?
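The arithmetic here is simply words divided by seconds, times 60. A minimal Python sketch is given below; the function name words_per_minute is hypothetical, and the numbers in the example call are placeholders, not measurements from either speech.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Overall speech rate: words divided by elapsed minutes."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return word_count * 60.0 / duration_seconds


# Placeholder values only: 300 words spoken in 120 seconds.
print(words_per_minute(300, 120.0))  # 150.0
```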
(4) On June 2, 2005, President Bush visited Hopkinsville, KY, to promote his plan for revising social security. The transcript of the session is here. Again, streaming video is available via a link on the whitehouse.gov page. A short audio excerpt is available as a .wav file here, corresponding to the stretch in the transcript from "The pay-as-you-go system is -- really isn't fair" to "That's not very far down the road."
How many words are there in this passage? How long is the corresponding audio? What is the speech rate in WPM for this stretch?
(5) Go back to Bush's nomination speech for Alito, and prepare a version of it that notes the duration of silent pauses between spoken phrases. You can use whatever format you prefer, but the content should be something like what I've exemplified for the first few sentences of his speech:
You can use any program that allows you to make careful audio time measurements, but for this purpose I recommend Transcriber.
If you subtract the duration of the silent pauses from the elapsed time, what is the effective speech rate during what is left (i.e. the actually vocalized phrases of the speech)?
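As a rough sketch of this calculation in Python: add up the pause durations, subtract that sum from the total elapsed time, and divide the word count by what remains. The function name effective_wpm and all of the numbers below are illustrative placeholders, not values measured from the speech.

```python
def effective_wpm(word_count, total_seconds, pause_durations):
    """Speech rate over vocalized time only (elapsed time minus silent pauses)."""
    speaking_seconds = total_seconds - sum(pause_durations)
    if speaking_seconds <= 0:
        raise ValueError("pauses must be shorter than the total duration")
    return word_count * 60.0 / speaking_seconds


# Placeholder values only: 300 words, 120 s elapsed, 30 s of silence in total.
print(effective_wpm(300, 120.0, [10.0, 20.0]))  # 200.0
```

Because silence is removed from the denominator, this rate is never lower than the overall rate from question (3).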
(6) Separate the pauses between sentences from the pauses within sentences (using the sentence divisions given by the official transcript). Plot a histogram (or other summary plot) of the within-sentence vs. across-sentence pauses. You can create such plots using Microsoft Excel, the free-software statistics program R (where the function is called hist()), or nearly any other statistical software package; or you can do it by hand.
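If you want a quick look at the two distributions before making a proper plot, a text histogram is one option. The Python sketch below uses only the standard library; the helper names, the 0.25-second bin width, and the pause durations are all made-up placeholders, to be replaced with your own measurements.

```python
from collections import Counter


def bin_counts(pauses, bin_width=0.25):
    """Map each bin's lower edge (in seconds) to the number of pauses in it."""
    counts = Counter(int(p // bin_width) for p in pauses)
    return {round(i * bin_width, 3): counts[i] for i in sorted(counts)}


def print_histogram(label, pauses, bin_width=0.25):
    """Print one row of '#' marks per non-empty bin."""
    print(label)
    for edge, n in bin_counts(pauses, bin_width).items():
        print(f"  {edge:4.2f}-{edge + bin_width:4.2f} s | {'#' * n}")


# Placeholder pause durations in seconds, not real measurements.
within_sentence = [0.2, 0.3, 0.4, 0.6]
across_sentence = [0.8, 1.1, 1.2]

print_histogram("within-sentence pauses", within_sentence)
print_histogram("across-sentence pauses", across_sentence)
```

Using the same bin width for both groups makes the two histograms directly comparable.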
(7) Go to LDC Online, and sign up for a guest account. Once you are able to log in as a guest, go to "LDC Corpus Search". Select "English Conversations" from the menu on the top left (which starts out reading "English News Text"). Set the "results view" to "tabular".
Now try the search string
The summary line should read:
If you try the search string
the results summary should be
This has searched the Switchboard corpus of around 2,400 telephone conversations.
A few facts about this corpus and the search system:
Now search this corpus for the following, distinguishing male and female counts:
Briefly interpret what you find in terms of the ideas about language and gender sketched in the course lecture notes.
Extra Credit (do any subset of these or similar things):