Minimizing Storage for Audio Files

Last updated February 2007.

Audio takes up a fair amount of space. As a result, it is common to use lossy compression techniques that are undesirable or unacceptable for linguistic research. Here are some ways of saving space without compromising the quality of your data.

Record Mono
Since the audio market is dominated by music, many devices and programs default to producing stereo recordings. Stereo is usually not necessary for linguistic purposes, so you can save space by recording only monaurally. If you use a device that produces only stereo, as some do, there are programs such as sox that will convert a stereo signal to a monaural signal.
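As a rough illustration of what a stereo-to-mono conversion involves, the sketch below averages the left and right samples of an interleaved 16-bit signal. The function name stereo_to_mono and the sample values are hypothetical, and averaging is only one common way to downmix; this is not a description of how sox does it internally.

```python
from array import array

def stereo_to_mono(interleaved):
    """Average left/right pairs of interleaved 16-bit samples into one channel.

    Assumes the input is laid out as L, R, L, R, ... (an assumption of this sketch).
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("expected an even number of interleaved samples")
    mono = array("h")  # signed 16-bit integers
    for i in range(0, len(interleaved), 2):
        left, right = interleaved[i], interleaved[i + 1]
        # The mean of two 16-bit values always fits in 16 bits.
        mono.append((left + right) // 2)
    return mono

# Made-up sample values, purely for illustration.
stereo = array("h", [100, 200, -100, 100, 32767, 32767])
mono = stereo_to_mono(stereo)
print(list(mono))            # [150, 0, 32767]
print(len(stereo), len(mono))  # 6 3
```

The mono result holds half as many samples as the stereo input, which is where the saving in storage comes from.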
Use a Lower Sampling Rate
Many devices record by default at the CD sampling rate of 44,100 samples per second. A digital recording can represent frequencies up to half its sampling rate, so this rate captures frequencies up to 22,050 Hz, which covers the roughly 20,000 Hz range of human hearing. Most adults have limited or no hearing in the upper part of this range. In any case, all of the linguistic information in speech is located below 8,000 Hz. Capturing frequencies up to 8,000 Hz requires at least 16,000 samples per second, so a sampling rate of around 20,000 samples per second is more than adequate for linguistic purposes. A sampling rate of 22,050 samples per second, half the CD rate, is available on most digital recorders and digitizers. If your digitizer only produces data at the CD rate or a higher rate, you can use software such as sox to reduce the sampling rate.
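To make the savings concrete, here is a small back-of-the-envelope calculation of uncompressed storage per hour. It assumes 16-bit (2-byte) samples, which the text above does not specify; the function name pcm_bytes is hypothetical.

```python
def pcm_bytes(sample_rate, channels, seconds, bytes_per_sample=2):
    """Size in bytes of uncompressed PCM audio (16-bit samples assumed by default)."""
    return sample_rate * channels * bytes_per_sample * seconds

HOUR = 3600  # seconds

cd_stereo = pcm_bytes(44_100, 2, HOUR)   # CD rate, stereo
half_mono = pcm_bytes(22_050, 1, HOUR)   # half the CD rate, mono

print(cd_stereo)               # 635040000
print(half_mono)               # 158760000
print(cd_stereo / half_mono)   # 4.0

# Highest frequency representable at 22,050 samples per second.
print(22_050 / 2)              # 11025.0
```

Under these assumptions, recording in mono at 22,050 samples per second takes a quarter of the space of a CD-rate stereo recording, while still representing frequencies up to 11,025 Hz, comfortably above 8,000 Hz.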
Use Lossless Compression
Lossless compression techniques do exist. They don't compress the audio data to the same extent as lossy techniques, but they can still save you a considerable amount of space. FLAC is a free, open-source lossless compression program that runs on most operating systems. Using maximum compression (level 8), you can expect compression ratios (compressed size divided by original size) ranging from 0.27 to 0.50. The best compression is obtained with files containing a lot of silence, such as interviews with pauses between turns. The least compression is obtained with continuous fluent speech. Further information on FLAC and other lossless compression techniques may be found here
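The following sketch turns the quoted ratios into expected file sizes. It assumes the ratio means compressed size divided by original size, and it uses a hypothetical input of one hour of 16-bit mono audio at 22,050 samples per second; the function name flac_size_range is made up for illustration.

```python
def flac_size_range(uncompressed_bytes, best=0.27, worst=0.50):
    """Estimate the (smallest, largest) compressed size in bytes.

    Assumes ratio = compressed size / original size, using the 0.27-0.50
    range quoted above for FLAC at level 8.
    """
    return round(uncompressed_bytes * best), round(uncompressed_bytes * worst)

# Hypothetical input: 22,050 samples/s * 2 bytes * 3,600 s = 158,760,000 bytes.
one_hour_mono = 22_050 * 2 * 3600

smallest, largest = flac_size_range(one_hour_mono)
print(smallest)  # 42865200
print(largest)   # 79380000
```

Actual results depend on the recording, so these figures are only a planning estimate.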
Edit Out Long Pauses
Field recordings often contain long pauses. You may be able to save a worthwhile amount of space by editing these out. Many recordings have a long silent stretch at the beginning, because the recorder was turned on before things really got going, or at the end, while the investigator checks whether he or she has any more questions to ask. When digitizing tape recordings, if you leave the digitizer running unattended, there is a good chance that the tape will end before you notice. This can result in a long stretch of silence at the end of the sound file. Chopping this off can save considerable storage. If your recording is not too noisy and you use FLAC compression as suggested above, there may not be much point in editing out long pauses, since FLAC compresses silent intervals down to almost nothing.
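Trimming silence from the two ends of a recording can be sketched as follows. The function name trim_silence, the amplitude threshold of 500, and the sample values are all hypothetical; a suitable threshold depends on how noisy the recording is, and in practice you would normally do this in an audio editor.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose amplitude is at or below threshold.

    Quiet stretches in the middle of the recording are left untouched.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return samples[:0]  # nothing exceeds the threshold: return an empty slice
    return samples[loud[0]:loud[-1] + 1]

# Made-up 16-bit sample values: quiet lead-in, some speech, quiet tail.
recording = [0, 3, -2, 900, -1200, 4, 1500, 1, 0, 0]
trimmed = trim_silence(recording)
print(trimmed)                        # [900, -1200, 4, 1500]
print(len(recording), len(trimmed))   # 10 4
```

Only the ends are trimmed here; removing pauses inside the recording would change its timing, which may matter for some analyses.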
Last Modified: 16 Jun 2009
Phonetics Laboratory
Department of Linguistics
623 Williams Hall
University of Pennsylvania
Philadelphia, PA 19104-6305
Telephone: (215) 898-0083
Fax: (215) 573-2091