Minimizing Storage for Audio Files
Audio takes up a fair amount of space. As a result, it is common to
use lossy compression techniques that are undesirable or unacceptable for
linguistic research. Here are some ways of saving space without
compromising the quality of your data.
- Record Mono
- Since the audio market is dominated by music, many devices and
programs default to producing stereo recordings. Stereo is usually not
necessary for linguistic purposes, so you can save space by recording
in mono, which halves the size of an uncompressed recording. If your device
produces only stereo, as some do, programs such as sox can convert a stereo
signal to mono.
-
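As a rough illustration of the stereo-to-mono conversion described above, the following Python sketch averages the left and right channels of a 16-bit stereo WAV file using only the standard library. The function names and the restriction to 16-bit PCM are assumptions made for the example; in practice a tool such as sox handles this for you.

```python
import array
import sys
import wave


def downmix_16bit_stereo(frames: bytes) -> bytes:
    """Average interleaved 16-bit left/right samples into one mono channel."""
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2
         for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()


def stereo_wav_to_mono(src_path: str, dst_path: str) -> None:
    """Read a 16-bit stereo WAV file and write a mono copy at the same rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(downmix_16bit_stereo(frames))
```

Averaging the two channels is one simple design choice; if only one channel carries the speaker's microphone, keeping that channel alone may be preferable.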
- Use a Lower Sampling Rate
- Many devices record by default at the CD sampling rate of 44,100
samples per second. Since a digital recording can represent frequencies only
up to half its sampling rate, this captures frequencies up to 22,050 Hz,
slightly more than the roughly 20,000 Hz range of human hearing.
Most adults have limited or no hearing in the upper part of this range.
In any case, all of the linguistic information in speech is located
below 8,000 Hz, which requires a sampling rate of at least 16,000 samples
per second. A sampling rate of around 20,000 samples per second
is therefore more than adequate for linguistic purposes.
A sampling rate of 22,050 samples per second, half the CD rate,
is available on most digital recorders and digitizers.
If your digitizer only produces data at the CD rate or a higher rate,
you can use software such as sox to reduce the sampling rate.
-
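The arithmetic behind the sampling-rate advice can be sketched in a few lines of Python. The example assumes 16-bit samples (2 bytes each, as on CDs) and a one-hour recording; the function names and the file names passed to the sox helper are hypothetical, and the helper only builds the command line rather than running it.

```python
def nyquist_hz(sample_rate):
    """Highest frequency a recording at this sampling rate can represent."""
    return sample_rate / 2


def pcm_bytes(seconds, sample_rate, channels, bytes_per_sample=2):
    """Size of uncompressed PCM audio data, ignoring the small file header."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def sox_resample_command(src, dst, rate=22050):
    """Argument list for a sox call that writes dst at the given rate.

    Assumes sox is installed; pass the list to subprocess.run to execute it.
    """
    return ["sox", src, "-r", str(rate), dst]


hour = 3600
cd_stereo = pcm_bytes(hour, 44100, 2)   # CD rate, stereo
half_mono = pcm_bytes(hour, 22050, 1)   # half the CD rate, mono

print(nyquist_hz(22050))                # 11025.0, comfortably above 8,000 Hz
print(cd_stereo, half_mono)             # 635040000 158760000
print(half_mono / cd_stereo)            # 0.25
print(sox_resample_command("interview.wav", "interview-22k.wav"))
```

Under these assumptions, halving the sampling rate and recording in mono together cut an hour of audio from about 635 MB to about 159 MB before any compression is applied.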
- Use Lossless Compression
- Lossless compression techniques do exist. They don't compress the
audio data to the same extent as the lossy techniques do, but they
can still save you a considerable amount of space.
FLAC is a free, open-source lossless compression program that runs on
most operating systems. Using maximum compression (level 8),
you can expect compression ratios (compressed size divided by original
size) ranging from 0.27 to 0.50.
The best compression is obtained with files containing a lot
of silence, such as interviews with pauses between turns.
The least compression is obtained with continuous fluent speech.
Further information on FLAC and other lossless compression techniques
may be found here
-
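One way to check what FLAC achieves on your own recordings is to compress a file and compare sizes. The sketch below assumes the flac command-line program is installed and on your PATH; the function names are hypothetical, and the ratio is computed as compressed size divided by original size, so smaller is better.

```python
import os
import shutil
import subprocess


def flac_command(src, dst):
    """Argument list for flac at maximum compression (level 8)."""
    return ["flac", "-8", "-o", dst, src]


def compression_ratio(original_bytes, compressed_bytes):
    """Compressed size as a fraction of the original size."""
    return compressed_bytes / original_bytes


def compress_and_report(src, dst):
    """Run flac on src, write dst, and return the resulting ratio."""
    if shutil.which("flac") is None:
        raise RuntimeError("flac executable not found on PATH")
    subprocess.run(flac_command(src, dst), check=True)
    return compression_ratio(os.path.getsize(src), os.path.getsize(dst))
```

Calling `compress_and_report("interview.wav", "interview.flac")` on a hypothetical file would return a number that can be compared against the range quoted above.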
- Edit Out Long Pauses
- Field recordings often contain long pauses. You may be able to
save a worthwhile amount of space by editing these out. Many
recordings have a long silent stretch at the beginning, due to
turning on the recorder before things really get going, or at the end,
while the investigator checks whether he or she has any more questions to
ask. When digitizing tape recordings, if you leave the digitizer running
unattended, there is a good chance that the recording will end before you
notice it. This can result in a long stretch of silence at the end of the
sound file. Chopping this off can save considerable storage.
If you use FLAC compression as suggested above and your recording is not
too noisy, there may be little point in editing out long pauses, since
FLAC compresses silent intervals down to almost nothing.
-
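Trimming silence from the start and end of a recording can be approximated with a simple amplitude threshold. The following Python sketch works on a sequence of 16-bit sample values; the function names and the default threshold of 500 are assumptions chosen for illustration, and a noisy recording would need a higher threshold or a more careful method.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than the threshold.

    Quiet samples between the first and last loud sample are kept, so
    short pauses inside the recording are left untouched.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return samples[:0]  # nothing above the threshold: all silence
    return samples[loud[0]:loud[-1] + 1]


def bytes_saved(original, trimmed, bytes_per_sample=2):
    """Uncompressed storage saved by trimming, assuming 16-bit samples."""
    return (len(original) - len(trimmed)) * bytes_per_sample


recording = [0, 3, -2, 900, -1200, 40, 800, 1, 0]
trimmed = trim_silence(recording)
print(trimmed)                          # [900, -1200, 40, 800]
print(bytes_saved(recording, trimmed))  # 10
```

A real editor would usually leave a short margin of silence on each side rather than cutting exactly at the first loud sample.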