# Impulse Response and Convolution

Digital signal processing is applied linear algebra? This is easy to grasp for color matching, where we have fixed dimensions of 1 (number of test lights), 3 (number of primary lights, number of photopigments), and 31 (number of sample points in a spectral power distribution for a light, or in the spectral absorption for a pigment). But how can we make this true for audio or video signals?

Consider a simple case. The CD standard samples an audio waveform 44,100 times a second, so that a piece lasting 2:48 contains 7,408,800 samples (ignoring the stereo issue). Since any equalization function (as we'll show later) is linear, such a function operating on one channel of our little piece of music can be modeled as a 7,408,800 by 7,408,800 matrix. "All" we have to do is to multiply our 7,408,800-element column vector by this matrix, producing another column vector with the same number of elements -- and this will be our equalized bit of audio. If we wanted to operate on a half-hour recording, the scale of the operation would go up in proportion.

This does not seem like a very practical technique. It's conceptually correct, and sometimes it may be useful to think of things in this way. However, this is (needless to say) not how a DSP implementation of an equalizer is accomplished. There are much easier ways, which are mathematically equivalent for systems with certain properties, whose matrices have corresponding properties that permit simple and efficient implementation of the equivalent calculation.

This topic can be reduced to a slogan:

The effect of any linear, shift-invariant system on an arbitrary input signal is obtained by convolving the input signal with the response of the system to a unit impulse.

To get an idea of what this might be good for, consider some things in the real world that are (or at least can be successfully modeled as) linear shift-invariant systems:

Input System Output
laryngeal buzz vocal tract resonances vowel sound
sound physical properties of the
peripheral auditory system
motion of the basilar membrane
at a particular place
noisy recording noise-rejection filter less noisy recording
signal graphic equalizer modified signal
signal Dolby encoding Dolby-encoded signal
Dolby-encoded signal Dolby decoding original signal
sound source room acoustics reverberant sound

Once you understand the terminology in this slogan, it will be almost immediately obvious that it's true; so in a sense this lecture is mostly a matter of learning some definitions! We already know what a linear system is. A shift-invariant system is one where shifting the input always shifts the output by the same amount. When we're representing signals by vectors, then a shift means a constant integer added to all indices. Thus shifting vector v by n samples produces a vector w such that w(i+n) = v(i).

Note: there is a little problem here is decided what happens at the edges. Thus for a positive shift n the first element of w should correspond to the minus nth element of v -- but v isn't defined for indices smaller than 1 (or zero, if we decide to start there). There's a similar problem at the other end. Conventional DSP mathematics solves this problem by treating signals as having infinite extent -- defined for all indices from minus infinity to infinity. Real-world signals generally start and stop, however. This is a question we'll return to several times, including once at the end of this lecture, when we'll provide a slightly more formal account in both the EE/DSP perspective and the linear algebra perspective.

For signals that are functions of time -- i.e. where the succession of indices corresponds to a sequence of time points -- a shift-invariant system can equivalently be called a time-invariant system. Here the property of shift-invariance has a particularly intuitive meaning. Suppose we probe some acoustic resonator with a particular input at 12:00 noon on January 25, 1999, and get a response (whatever it is), which we record. Then we probe the same system again with the same input, at 12:00 noon on January 26, 1999. We expect to record the same output -- just shifted forward in time by 24 hours! The same expectation would apply for a time difference of one hour, or one minute. Finally, if we hypothetically delay the input by 1 millisecond, we expect the output to be delayed by the same amount -- and to be otherwise unchanged! The resonator doesn't know what time it is, and responds in the same way regardless of when it is probed.

A unit impulse (for present purposes) is just a vector whose first element is 1, and all of whose other elements are 0. (For the electrical engineer's digital signals of infinite extent, the unit impulse is 1 for index 0 and 0 for all other indices, from minus infinity to infinity).

We'll work up to what convolution is by giving a simple example. Here's a graph of 50 sample (about 6 milliseconds) of a speech waveform.

We're representing this waveform as a sequence of numbers -- a vector -- and from this perspective a more suitable graphical representation of the same data is a "lollipop plot", which shows us each sample as a little "lollipop" sticking up or down from a zero line:

Let's zoom in on just the first six of these numbers:

Octave (or Matlab) will tell us their specific values:

>> S(1000:1005)
ans =
10622
5624
614
1280
-3363
7694


We can think of this six-element vector s as being the sum of six other vectors s1 to s6, each of which "carries" just one of its values, with all the other values being zero:

>> [10622 0    0   0    0     0    ] + ...
[0     5624 0   0    0     0    ] + ...
[0     0    614 0    0     0    ] + ...
[0     0    0   1280 0     0    ] + ...
[0     0    0   0    -3363 0    ] + ...
[0     0    0   0    0     7694 ]
ans =
10622        5624         614        1280       -3363        7694


Recall that an impulse (in the current context, anyhow) is a vector whose first element has the value 1 and all of whose subsequent elements are zero. The vector we've called s1 is an impulse multiplied by 10622. The vector s2 is an impulse shifted to the right by one element and scaled by 5624. Thus we are decomposing s into a set of scaled and shifted impulses. It should be clear that we can do this to an arbitrary vector.

The same decomposition represented graphically:

Why is this interesting? Well, consider some arbitrary shift-invariant linear system D. Suppose that we apply D (without knowing anything more about it) to an impulse, with the result shown below: the first sample of the output is 1, the second sample is -1, and the rest of the samples are 0. This result is the impulse response of D.

This is enough to predict the result of applying D to our scaled and shifted impulses, s1 ... sn. Because D is shift-invariant, the effect of shifting the input is just to shift the output by the same amount. Thus an input consisting of a unit impulse shifted by any arbitrary amount will produce a copy of the impulse response, shifted by that same amount.

We also know that D is linear, and therefore a scaled impulse as input will produce a scaled copy of the impulse response as output.

Using these two facts, we can predict the response of D to each of the scaled and shifted impulses s1 ... sn. This is shown graphically below:

If we arrange the responses to s1 .. s6 as the rows of matrix, the actual numbers will look like this:
(The arrangement of these outputs as the rows of a matrix is purely for typographical convenience; also notice that we've allow the response to input s6 to fall off the end of the world, so to speak)

>> Qout
Qout =
10622      -10622           0           0           0           0
0        5624       -5624           0           0           0
0           0         614        -614           0           0
0           0           0        1280       -1280           0
0           0           0           0       -3363        3363
0           0           0           0           0        7694


This information, in turn, is enough to let us predict the response of the system D to the original vector s, which (by construction) is just the sum of s1+s2+s3+s4+s5+s6. Since D is linear, applying it to this sum is the same as applying it to the individual components of the sum, and the adding up the results. This is just the sum of the columns of the matrix shown above:

>> sum(Qout)
ans =
10622       -4998       -5010         666       -4643       11057


(Matlab "sum", applied to a matrix, produces a row vector of the sums of the columns...)

Notice that (at least for the second position in the sum and onward) this makes the output in position i equal to the difference between the input in position i and the input in position i-1. In other words, D happens to be calculating the first difference of its input.

It should be clear that the same basic procedure will work for ANY shift-invariant linear system, and for ANY input to such a system:

• express the input as a sum of scaled and shifted impulses;
• calculate the response to each of these by scaling and shifting the system's impulse response;
• add up the resulting set of scaled and shifted impulse responses.

This process of adding up a set of scaled and shifted copies of one vector (here the impulse response), using the values of another vector (here the input) as the scaling values, is convolution -- at least this is one way to define it.

Another way: the convolution of two vectors a and b is defined as a vector c, whose kth element is (in MATLAB-ish terms)

c(k) = sum_over_j ( a(j) * b(k+1-j) )       Equation 1

(The "+1" in "k+1-j" is due to the fact that MATLAB indices have the bad taste to start from 1 instead of the mathematically more elegant 0).

This formulation helps indicate that we can also think of convolution as a process of taking a running weighted average of a sequence -- that is, each element of the output vector is a linear combination of some of the elements of one of the input vectors-- where the weights are taken from the other input vector.

There are a couple of little problems: how long should c be? and what should we do if k+1-j is negative or greater than the length of b?

These problems are a version of the "edge effects" we've already hinted at, and will see again. One possible solution is to imagine that we are convolving two infinite sequences created by embedding a and b in an ocean of zeros. Now arbitrary index values---negative ones, ones that seemed "too big"---make perfect sense. The value of extended a and extended b for index values outside their actual range is now perfectly well defined: always zero. The result of Equation 1 will be another infinite-length sequence c.

A little thought will convince you that most of c will also be necessarily zero, since the non-zero weights from b and the non-zero elements of a will not coincide in those cases. How many elements of c have a chance to be non-zero? Well, just those integers k for which there is at least one integer j such that 1 <= j <= length(a) and 1 <= k+1-j <= length(b). With a little more thought, you can see that this means that the length of c will be one less than the sum of the lengths of a and b.

Referring again to Equation 1, and imagining the two vectors a and b as embedded in their seas of zeros, we can see that we will get the right answer if we allow k to run from 1 to length(a)+length(b)-1, and for each value of k, allow j to run from max(1,k+1-length(b)) to min(k,length(a)). Again, all of this is in MATLAB index terms, and so we can transfer it directly to a MATLAB program myconv() to perform convolution:


function c = myconv(a,b)
c = zeros(1,length(a)+length(b)-1);
for k = 1:length(a)+length(b)-1
for j = max(1,k+1-length(b)):min(k,length(a))
c(k) = c(k) + a(j)*b(k+1-j);
end
end


This will give us just the piece of the conceptually infinite c that has a chance to be non-zero.

MATLAB has a built-in convolution function conv(), so we can compare the one that we just wrote:

>> a=[3 4 5];
>> b=[6 7 8];
>> myconv(a,b)
ans =
18    45    82    67    40
>> conv(a,b)
ans =
18    45    82    67    40


As an aside, we should mention that convolution will also give us the correct results if we think of a, b and c as the coefficients of polynomials, with c being the coefficients of the polynomial resulting from multiplying a and b together. Thus convolution is isomorphic to polynomial multiplication, so that e.g.

>> conv([2 3], [4 5])
ans =
8    22    15


can also be interpreted to mean that $$(2x+3)(4x+5) = 8x^2 + 22x + 15$$

and

>> conv([3 4],[5 6 7])
ans =
15    38    45    28

can also be interpreted to mean that $$(3x+4)(5x^2+6x+7) = 15x^3 + 38x^2 + 45x + 28$$

If you believe this, it follows immediately from the commutativity of multiplication that convolution also commutes (and is associative, and distributes over addition).

We can exemplify these properties empirically:

>> conv( [3 4], [5 6 7] )
ans =
15    38    45    28
>> conv( [5 6 7], [3 4])
ans =
15    38    45    28

These are important points, so if you do not immediately see that they arealways true, spend some time with Equation 1 -- or with the convolution operator in Matlab -- and convince yourself.

We've given two pictures of conv(a,b):

• in one, we add up a bunch of scaled and shifted copies of a, each copy scaled by one value of b, and shifted over to align with the location of that value in b.
• in the other, we use take a running weighted average of a, taking b (backwards) as the weights.

We can see the relationship between these two pictures by expressing Equation 1 in matrix form. We have been thinking of b as the impulse response of the system, a as the input, and c as the output. This implies that the matrix for S will have dimensions length(c) by length(a), if c = Sa is to be legal matix-ese.

Each element of the output c will be the inner product of a row of S with the input a. This will be exactly Equation 1 if the row of S in question is just b, time-reversed, shifted, and suitably padded with zeros. As b shifts out of the picture, we just shift in zeros from the "sea of zeros" we imagine ourselves to be floating in.

A small modification of our convolution program will produce the needed matrix:


function c = cmat(a,b)
% turn a convolution mask into a matrix operator for pedagogical purposes
c = zeros(length(a)+length(b)-1,length(a));
for k = 1:length(a)+length(b)-1
for j = max(1,k+1-length(b)):min(k,length(a))
c(k,j) = b(k+1-j);
end
end


Trying it out:

>> a=[3 4];
>> b=[5 6 7];
>> cmat(a,b)
ans =
5     0
6     5
7     6
0     7


Thus cmat(a,b) creates a matrix operator C that can be multiplied by the vector a to create exactly the same effect as convolution of a with b:

>> a = [3 4]; b = [5 6 7];
>> C = cmat(a,b)
C =
5     0
6     5
7     6
0     7
>> conv(a,b)
ans =
15    38    45    28
>> (C*a')'
ans =
15    38    45    28


This works because the rows of C are suitably shifted (backwards-running) copies of b -- or equivalently, because the columns of C are suitably shifted (forwards-running) copies of b.

This gives us the two pictures of convolutional operators:

• THE RUNNING WEIGHTED AVERAGE OF THE INPUT: The rows of C are shifted backwards copies of b, and the inner product of each row with a will give us a weighted average of a suitable piece of a, which we stick into the appropriate place in the output c.
• THE SUM OF SCALED AND SHIFTED COPIES OF THE IMPULSE RESPONSE: The columns of C are shifted copies of b. Taking the other view of matrix multiplication, namely that the output is the sum of the columns of C weighted by the elements of a, gives us the other picture of convolution, namely adding up a set of scaled and shifted copies of the "impulse response" b.

A larger example:

>> a = [1 2 3 4 5]; b = [6 7 8];
>>
>> C = cmat(a,b)
C =
6     0     0     0     0
7     6     0     0     0
8     7     6     0     0
0     8     7     6     0
0     0     8     7     6
0     0     0     8     7
0     0     0     0     8
>>
>> conv(a,b)
ans =
6    19    40    61    82    67    40
>>
>> (C*a')'
ans =
6    19    40    61    82    67    40


In working through the details of convolution, we had to deal with the "edge effect": the fact that the convolution equation (Equation 1) implies index values for finite-length inputs a and b outside the range in which they are defined.

Obviously we could choose quite a number of different ways to supply the missing values---the particular choice that we make should depend on what we are doing. There are some cases in which the "sea of zeros" concept is exactly correct.

However, there are alternative situations in which other ideas make more sense. For instance, we might think of b as sitting in a sea of infinitely many repeated copies of itself. Since this means that index values "off the end" of b wrap around to the other end in a modular fashion, just as if b was on a circle, the kind of convolution that results is called "circular convolution."

Keep this in mind: we will come back to it in a later lecture.

Meanwhile, let's repeat the slogan we began with:

The effect of any linear, shift-invariant system on an arbitrary input signal is obtained by convolving the input signal with the response of the system to a unit impulse.

(Notice that this is the same property of linear systems that we observed in the case of color matching -- where we could learn everything we needed to know about the system by probing it with a limited set of "monochromatic" inputs. If the system were only linear, and not shift-invariant, the analogy here would be to probe it with unit impulses at every possible index value -- each such probe giving us one column of the system matrix. That was practical with a 31-element vector, but it would be less attractive with vectors of millions or billions of elements! However, if the system is also shift-invariant, a probe with just one impulse is enough, since the responses of all the shifted cases can be predicted from it.)

Convolution can always be seen as matrix multiplication -- this has to be true, because a system that can be implemented by convolution is a linear system (as well as being shift-invariant). Shift-invariance means that the system matrix has particular redundancies, though.

When the impulse response is of finite duration, this slogan is not only mathematically true, but also is often quite a practical way to implement the system, because we can implement the convolution in a fixed number of multiply-adds per input sample (exactly as many as there are non-zero values in to the system's impulse response). Systems of this type are generally called "finite impulse response" (FIR) filters, or equivalently "moving average" filters. When the impulse response is of infinite duration (as it perfectly well can be in a linear shift-invariant system!), then this slogan remains mathematically true, but is of less practical value (unless the impulse response can be truncated without significant effect). We'll learn later how to implement "infinite impulse response" (IIR) filters efficiently.

The EE/DSP perspective.

The goal of this section is to develop the basic material on impulse response and convolution in the style that is common in the digital signal processing literature in the discipline of Electrical Engineering, so as to help you become familiar with the type of notation that you are likely to encounter there. Also, perhaps going over the same ideas again in a different notation will help you to assimilate thm -- but be careful to keep the DSP/EE notation separate in your mind from linear algebra notation, or you will become very confused!

In this perspective, we treat a digital signal s as an infinitely-long sequence of numbers. We can adapt the mathematical fiction of infinity to everyday finite reality by assuming that all signal values are zero outside of some finite-length sub-sequence. The positions in one of these infinitely-long sequences of numbers are indexed by integers, so that we take s(n) to mean the nth number in sequence s,'' usually called s of n'' for short. Sometimes we will alternatively use s(n) to refer to the entire sequence s, by thinking of n as a free variable.

We will let an index like n range over negative as well as positive integers, and also zero. Thus

where the curly braces are a notation meaning set,'' so that the whole expression means the set of numbers s(n) where n takes on all values from minus infinity to infinity.''

We will refer to the individual numbers in a sequence s as elements or samples. The word sample comes from the fact that we usually think of these sequences as discretely-sampled versions of continuous functions, such as the result of sampling an acoustic waveform some finite number of times a second, but in fact nothing that is presented in this section depends on a sequence being anything other than an ordered set of numbers.

The unit impulse or unit sample sequence, written , is a sequence that is one at sample point zero, and zero everywhere else:

The Greek capital sigma, , pronounced sum, is used as a notation for adding up a set of numbers, typically by having some variable take on a specified set of values. Thus

is shorthand for

1 + 2 + 3 + 4 + 5

and

is shorthand for

f(x0) + f(x1) + f(x2) + f(x3).

The notation is particularly helpful in dealing with sums over sequences, in the sense of sequence used in this section, as in the following simple example. The unit step sequence, written u(n), is a sequence that is zero at all sample points less than zero, and 1 at all sample points greater than or equal to zero:

The unit step sequence can also be obtained as a cumulative sum of the unit impulse:

Up to n = -1 the sum will be 0, since all the values of for negative n are 0; at n=0 the cumulative sum jumps to 1, since ; and the cumulative sum stays at 1 for all values of n greater than , since all the rest of the values of are 0 again.

This is not a particularly impressive use of the notation, but it should help you to understand that it can be perfectly sensible to talk about infinite sums. Note that we can also express the relationship between u(n) and in the other direction:

In general, it is useful to talk about applying the ordinary operations of arithmetic to sequences. Thus we can write the product of sequences x and y as xy, meaning the sequence made up of the products of the corresponding elements (not the inner product):

Likewise the sum of sequences x and y can be written x+y, meaning

A sequence x can be multiplied by a scalar ,with the meaning that each element of x is individually so multiplied:

Finally, a sequence may be shifted by any integer number of sample points:

We already used this notation when we expressed the unit impulse sequence in terms of the unit step sequence, as the difference between a given sample and the immediately previous sample.

Any sequence can be expressed as a sum of scaled and shifted unit samples. Conceptually, this is trivial: we just make, for each sample of the original sequence, a new sequence whose sole non-zero member is that chosen sample, and we add up all these single-sample sequences to make up the original sequence. Each of these single-sample sequences (really, each sequence contains infinitely many samples, but only one of them is non-zero) can in turn be represented as a unit impulse (a sample of value 1 located at point ) scaled by the appropriate value and shifted to the appropriate place. In mathematical language, this is

 (1)
where k is a variable that picks out each of the original samples, uses its value to scale the unit impulse, and then shifts the result to the position of the selected sample.

A system or transform T maps an input sequence x(n) onto an output sequence y(n):

 y(n) = T[x(n)] (2)

Thus such a system or transform is a function from sequences to sequences.

Systems or transforms come in a wide variety of types. One important class is known as linear systems. The propagation of ound in air in linear, meaning that the principle of superposition applies, so that the pressure disturbance resulting from two sounds propagating through the same region is just the sum of the pressure disturbances corresponding to the individual sounds. Linearity means the same thing as applied to sequences. We can express it mathematically like this:

 (3)

(For real-valued sequences, homogeneity follows from superposition).

Now we replace the expression x(n) in 2 with the re-expression of x(n) found in 1 to obtain:

 (4)

That is, the sequence consisting of just sample k from x can be rewritten as , our old friend the unit impulse scaled to the value of x(k) and shifted to position k. The response of system T to this sequence is just

and by linearity we can pull the constant (for this single sample) multiplier x(k) outside of T, giving us
 (5)
as the response of T to a sequence whose only non-zero value is given by x(k).

The original sequence x can be re-expressed as the sum of all of its individual sample values; by linearity, this lets us express T[x] as the sum over all samples of the response of T to a sequence consisting of only the given sample; this response is specified for a particular sample k by equation 5, and so make up the reponse to the original input, we just need to sum equation 5 over all k, which is:

 (6)
Equation 6 has an extraordinary property--it represents the response of system T to an arbitrary input sequence x without applying T to the input x at all! The only thing T operates on is the set of shifted unit impulses, which is independent of x. Having once applied T to the shifted unit impulses, we can calculate T[x] for arbitrary x just by doing the multiplications and additions specified in equation 6.

However, there is one annoyance - we still must calculate for every shift k. This would be unnecessary if the response of T to a shifted input was just an output shifted by the same amount. This property, which is called shift invariance, holds of many interesting systems. For example, if we put a certain test signal into an acoustic resonator, and then put the same test signal into the same resonator 20 seconds later, we would expect the output to be the same, just shifted in time by the same twenty seconds as the input (as long as the resonator hasn't changed in the meantime).

In mathematical language, a system T is shift-invariant if and only if

 (7)
Thus if we write h(n) for the response of T to the unshifted unit impulse , then if T is shift-invariant, the response of T to a unit impulse shifted by k is just h(n-k):
 (8)
Thus if T is shift-invariant as well as linear, we can re-write equation 6 as
 (9)
Notice what this means: for any linear shift-invariant system T, once we have calculated its impulse reponse h(n) (its response to a unit impulse at sample point 0), we can forget about T entirely, and just add up scaled and shifted copies of h(n) to calculate the response of T to any input whatsoever. Thus any linear shift-invariant system is completely characterized by its impulse response h(n).

## Convolution again

The way of combining two sequences specified by equation 9 is our friend convolution. For any two sequences x and y, there will be another sequence w obtained by convolving x with y, following the equation

(where we're using '*' to symbolize the convolution operation...)

The following things are true for convolution in general, as you should be able to verify for yourself by some algebraic manipulation: