LING525 - Conversions Cheat Sheet

LING525: Conversions Cheat Sheet

Several of the areas this course explores rely on various and sometimes tricky mathematical and computational formulations. The goal of this page is to exemplify some important cases -- for sinusoids, frequency analysis, and linear filters -- in a simple clear way, using Matlab/Octave code. See the course lecture notes and links for explanations, more background, and more examples.

Sinusoids:

The basic form, for a sine wave:

N=2048; x=2*pi*(0:(N-1))/N;
Amplitude=5; Frequency=10; Phase=0.25pi;

y1=Amplitude*sin(Frequency*x+Phase);

The sine- and cosine-weight form for a sine wave:

SinWeight = Amplitude*cos(Phase);
CosWeight = Amplitude*sin(Phase);
y2 = SinWeight*sin(Frequency*x) + CosWeight*cos(Frequency*x);

The complex exponential form for a sine wave:

y3 = imag(Amplitude*exp(i*(Frequency*x+Phase)));

plot(x,y1,'r-',x,y2,'bo',x,y3,'g-');

The basic form, for a cosine wave:

z1=Amplitude*cos(Frequency*x+Phase);

The sine- and cosine-weight form for a cosine wave:

SinWeight1 = -Amplitude*sin(Phase);
CosWeight1 = Amplitude*cos(Phase);
z2 = SinWeight1*sin(Frequency*x) + CosWeight1*cos(Frequency*x);

The complex exponential form for a cosine wave:

z3 = real(Amplitude*exp(i*(Frequency*x-Phase)));

Plot all three:

plot(x,z1,'r-',x,z2,'bo',x,z3,'g-');

Discrete Fourier Transform:

N=32;
x = 2*pi*(0:(N-1))/N;
Amplitude=2; Frequency=3; Phase=0.5*pi;
y = Amplitude*sin(Frequency*x+Phase);
Y = fft(y);
round(Y)
ans =
 Columns 1 through 16:
    0    0    0   32    0    0    0    0    0    0    0    0    0    0    0    0
 Columns 17 through 32:
    0    0    0    0    0    0    0    0    0    0    0    0    0   32    0    0

Note that the Frequency of the 32-element input was 3, but the non-zero fft coefficients are in positions numbered 4 and 30.

Why is that?

The frequencies of the first 16 fft outputs count up from 0 to (N/2-1)

0:(N/2-1)
ans =
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15

And the second 16 fft outputs count from -16 up to -1, which is why number 30 (3rd from the end) is also non-zero for an input containing energy only at frequency 3. But for real-valued inputs, you can ignore the second half of the fft outputs -- in this case, the 32 degrees of freedom of the input are taken up by 16 real and 16 imaginary values of the first 16 complex coefficients.

Note that for fft() analysis of sampled signals, these ordinal frequencies need to be taken as fractions of half the fft() order, multiplied by the "Nyquist rate", which is half the sampling frequency. So for a 512-point fft() of some audio sampled at 16-kHz, the frequencies of the (first 256) fft() outputs will be

NFFT=512; FS=16000;
NFFT2=NFFT/2; FS2=FS/2;  
Frequencies = ((0:(NFFT2-1))/NFFT2)*FS2;
Frequencies(1:5)
ans =
     0    31.2500    62.5000    93.7500   125.0000
 Frequencies(252:256)
ans =
   7843.8   7875.0   7906.2   7937.5   7968.8

Also, since the input amplitude (in our example above) was 2, why are the non-zero fft values 32?

Let's try amplitude 3:

Amplitude=3;
y1 = Amplitude*sin(Frequency*x+Phase);
Y1 = fft(y1);
round(Y1)
ans =
 Columns 1 through 16:
    0    0    0   48    0    0    0    0    0    0    0    0    0    0    0    0
 Columns 17 through 32:
    0    0    0    0    0    0    0    0    0    0    0    0    0   48    0    0

So the fft amplitudes are the input amplitudes multiplied by N/2 -- 16*2=32, 16*3=48.

And what good is the frequency 0 fft() value? It registers an additive factor in the input, e.g.

Offset=2;
z1 = Offset + Amplitude*sin(Frequency*x+Phase);
Z1 = fft(z1);
round(Z2)
ans =
 Columns 1 through 16:
   64    0    0   32    0    0    0    0    0    0    0    0    0    0    0    0
 Columns 17 through 32:
    0    0    0    0    0    0    0    0    0    0    0    0    0   32    0    0

Next, where are all the imaginary parts of the complex fft() coefficients? We started with Phase=0.5*pi, which make the imaginary components 0. Let's try another phrase:

Phase = .6*pi;
y2 = Offset + Amplitude*sin(Frequency*x+Phase);
Y2 = fft(y2);
round(Y2)
ans =
 Columns 1 through 6:
   64 +  0i    0 +  0i    0 -  0i   30 + 10i    0 -  0i    0 -  0i
 Columns 7 through 12:
    0 -  0i    0 -  0i    0 +  0i    0 +  0i    0 +  0i    0 +  0i
 Columns 13 through 18:
    0 -  0i    0 +  0i    0 +  0i    0 -  0i    0 +  0i    0 +  0i
 Columns 19 through 24:
    0 -  0i    0 -  0i    0 +  0i    0 -  0i    0 -  0i    0 -  0i
 Columns 25 through 30:
    0 -  0i    0 +  0i    0 +  0i    0 +  0i    0 +  0i   30 - 10i
 Columns 31 and 32:
    0 +  0i    0 -  0i

To get the amplitude as before, we take the absolute value of the complex coefficients:

round(abs(Y2))
ans =
 Columns 1 through 16:
   64    0    0   32    0    0    0    0    0    0    0    0    0    0    0    0
 Columns 17 through 32:
    0    0    0    0    0    0    0    0    0    0    0    0    0   32    0    0

And if we want the phase (which in speech research, we mostly don't), we use angle() -- and in this case, add pi/2 to get the sin() phase rather than the cos() phase:

angle(Y2(4))
ans = 0.3142

angle(Y2(4))+pi/2
ans = 1.8850

0.6*pi % what we set Phase to...
ans = 1.8850

When we use fft() to get the spectrum of a piece of audio -- or to analyze a sequence of acoustic frames for a spectrogram -- we normally choose the audio duration based on the kind of spectrum we want, multiply by a windowing function to avoid edge-artefacts, use zero-padding to extend the window to a power of 2, and then process the result to create a log power spectrum in decibels.

For example, if we want to look at (part of) the second syllable in this clip in a way that shows the overtone series, since the pitch is around 300 Hz (= period of .003 seconds) we'll want at least three periods. So we might do something like this:

NFFT=2048; NFFT2=NFFT/2;
[X,FS] = audioread("SA2.wav"); FS2 = FS/2;
Frequencies = (0:(NFFT2-1))*FS2/NFFT2;
NWindow = round(0.01*FS); % 10 msec window = 160 samples at 16 kHz
StartSample = round(1.04*FS); % part way through "ask"...
x1 = X(StartSample:(StartSample+NWindow-1));
x1 = hamming(NWindow).*x1;
xx1 = zeros(NFFT,1); xx1(1:NWindow) = x1;
yy1 = fft(xx1);
yy1 = 20*log10(abs(yy1(1:NFFT2)));
yy1 = yy1 - max(yy1);
plot(Frequencies,yy1,'r-'); axis("tight");
xlabel("Frequency (Hz)"); ylabel("Power (dB)"); title("Spectrum Example");

...resulting in this picture:

Since "bel" is a logarithmic (log10()) unit for measuring relationships (like ratios of powers), and a "decibel" is a tenth of a bel, you might well wonder, why do we multiply log10(abs(yy1)) by 20 to get the power spectrum in dB, rather than by 10? Basically it's because the physicists say we should -- but for more information, see here.

Poles and Zeros

Just as there are several difference (but equivalent) ways to define a sinusoid, there are several different (but equivalent) way to to define a "pole" (i.e. one of the resonances of an IIR = all-pole filter) or a "zero" (the analogous components of an FIR = all-zero filter).

In each case there are two degrees of freedom: Real/Imaginary, Frequency/Amplitude, Frequency/Bandwidth. Let's start with Amplitudes and Frequencies of 3 illustrative poles, and graph the corresponding complex roots:

nfft=512; nfft2 = nfft/2;
% amplitudes and frequencies (in radians):
p1 = [0.98 0.5];  p2 = [0.92 0.5]; p3 = [0.82 0.5];
% The corresponding complex roots:
r1=p1(1)*exp(i*p1(2)*pi); r2=p2(1)*exp(i*p2(2)*pi); r3= p3(1)*exp(i*p3(2)*pi);  
% Their locations on the unit circle:
drawCircle(0,0,1); hold on
axis([-1.3 1.3 -1.3 1.3],"square");
xlabel("Real"); ylabel("Imaginary");
plot([-1.3 1.3],[0 0],'k:',"linewidth",0.25);
plot([0 0],[-1.3 1.3],'k:',"linewidth",0.25);
plot(real(r1),imag(r1),'ro',real(r2),imag(r2),'bx',real(r3),imag(r3),'g*');
title("3 Poles: Freq 0.5 radians, Amp 0.98, 0.92, 0.82"); hold off

(Note that the real parts of the complex roots are essentially 0 for pole frequency 0.5*pi...)

Now let's plot the frequency responses:

impulse = zeros(nfft,1); impulse(15)=1;
Freqs = (0:(nfft2-1))*(1/(nfft2));
Acoef1 = poly([r1 r1']); Acoef2 = poly([r2 r2']); Acoef3 = poly([r3 r3']);
impresp1 = filter([1 0 0],Acoef1,impulse);
impresp2 = filter([1 0 0],Acoef2,impulse);
impresp3 = filter([1 0 0],Acoef3,impulse);
lspec1 = 20*log10(abs(fft(impresp1)(1:nfft2)));
lspec2 = 20*log10(abs(fft(impresp2)(1:nfft2)));
lspec3 = 20*log10(abs(fft(impresp3)(1:nfft2)));
%
plot(Freqs,lspec1,'r:',Freqs,lspec2,'b:',Freqs,lspec3,'g:');
hold on; plot([0.5 0.5],[-20 40],'k-');
title('Frequency responses for the 3 poles');
xlabel('Frequency (Radians)'); ylabel('20*log10(Amplitude)');

A pole's "half-power bandwidth" is the width of the region around the center frequency containing half the power of the transfer function -- otherwise known as the "3 dB down point". We could determine that empirically by inspecting the transfer function (in dB), or more directly using the formula

B = -log(amplitude)/(pi/FS)

Obviously the "bandwidth" (in frequency units) depends on the sampling frequency as well as the pole's amplitude, so for our amplitude=0.98 and amplitude=0.82 poles, some values at different sample rates would be

-log(0.98)/(pi/8000)  % Frequency range 0-8000 Hz
 ans = 51.446
-log(0.98)/(pi/48000) % Frequency range 0 24000 Hz
 ans = 308.67

-log(0.82)/(pi/8000)
 ans = 505.35
-log(0.82)/(pi/48000)
 ans = 3032.1

Let's try a different center frequency, with the same amplitudes (=bandwidths):

nfft=512; nfft2 = nfft/2;
% amplitudes and frequencies (in radians):
p1 = [0.98 0.3];  p2 = [0.92 0.3]; p3 = [0.82 0.3];
% The corresponding complex roots:
r1=p1(1)*exp(i*p1(2)*pi); r2=p2(1)*exp(i*p2(2)*pi); r3= p3(1)*exp(i*p3(2)*pi);
% The impulse response
impulse = zeros(nfft,1); impulse(15)=1;
Freqs = (0:(nfft2-1))*(1/(nfft2));
Acoef1 = poly([r1 r1']); Acoef2 = poly([r2 r2']); Acoef3 = poly([r3 r3']);
impresp1 = filter([1 0 0],Acoef1,impulse);
impresp2 = filter([1 0 0],Acoef2,impulse);
impresp3 = filter([1 0 0],Acoef3,impulse);
lspec1 = 20*log10(abs(fft(impresp1)(1:nfft2)));
lspec2 = 20*log10(abs(fft(impresp2)(1:nfft2)));
lspec3 = 20*log10(abs(fft(impresp3)(1:nfft2)));
%
plot(Freqs,lspec1,'r:',Freqs,lspec2,'b:',Freqs,lspec3,'g:');
hold on; plot([0.3 0.3],[-20 40],'k-'); hold off
title('Frequency responses for the 3 poles');
xlabel('Frequency (Radians)'); ylabel('20*log10(Amplitude)');

Again, note the multiple ways to express the same (here two) degrees of freedom:

Amplitude / Frequency ("p1"): 0.98 0.3 Complex Root ("r1"): 0.5760 + 0.7928i IIR filter coefficients ("Acoef1"): 1.0000 -1.1521 0.9604

More later ...