# Impulse Response and Convolution

In what sense is digital signal processing applied linear algebra? The connection is easy to grasp for color matching, where we have fixed dimensions of 1 (number of test lights), 3 (number of primary lights, number of photopigments), and 31 (number of sample points in a spectral power distribution for a light, or in the spectral absorption for a pigment). But how can we make this true for audio or video signals?

Consider a simple case. The CD standard samples an audio waveform 44,100 times a second, so that a piece lasting 2:48 contains 7,408,800 samples (ignoring the stereo issue). Since any equalization function (as we'll show later) is linear, such a function operating on one channel of our little piece of music can be modeled as a 7,408,800 by 7,408,800 matrix. "All" we have to do is to multiply our 7,408,800-element column vector by this matrix, producing another column vector with the same number of elements -- and this will be our equalized bit of audio. If we wanted to operate on a half-hour recording, the scale of the operation would go up in proportion.

This does not seem like a very practical technique. It's conceptually correct, and sometimes it may be useful to think of things in this way. However, this is (needless to say) not how a DSP implementation of an equalizer is accomplished. There are much easier ways, which are mathematically equivalent for systems with certain properties, whose matrices have corresponding properties that permit simple and efficient implementation of the equivalent calculation.

This topic can be reduced to a slogan:

**The effect of any linear, shift-invariant system on
an arbitrary input signal is obtained by convolving the input signal with the
response of the system to a unit impulse.**

To get an idea of what this might be good for, consider some things in the real world that are (or at least can be successfully modeled as) linear shift-invariant systems:

| Input | System | Output |
|---|---|---|
| laryngeal buzz | vocal tract resonances | vowel sound |
| sound | physical properties of the peripheral auditory system | motion of the basilar membrane at a particular place |
| noisy recording | noise-rejection filter | less noisy recording |
| signal | graphic equalizer | modified signal |
| signal | Dolby encoding | Dolby-encoded signal |
| Dolby-encoded signal | Dolby decoding | original signal |
| sound source | room acoustics | reverberant sound |

Once you understand the terminology in this slogan, it will be almost immediately
obvious that it's true; so in a sense this lecture is mostly a matter of learning
some definitions! We already know what a **linear** system is. A **shift-invariant** system is one where shifting the input always shifts the output by the same
amount. When we're representing signals by vectors, a shift means a constant
integer added to all indices. Thus shifting vector **v** by *n* samples
produces a vector **w** such that **w**(i+n) = **v**(i).

Note: there is a little problem here in deciding what happens at the edges. For a positive shift *n*, the first element of **w** should correspond to the minus *n*th element of **v** -- but **v** isn't defined for indices smaller than 1 (or zero, if we decide to start there). There's a similar problem at the other end. Conventional DSP mathematics solves this problem by treating signals as having infinite extent -- defined for all indices from minus infinity to infinity. Real-world signals generally start and stop, however. This is a question we'll return to several times, including once at the end of this lecture, when we'll give a slightly more formal account from both the EE/DSP perspective and the linear algebra perspective.
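To make the edge issue concrete, here is a minimal sketch (in Python rather than the MATLAB used later, purely for illustration; the helper name `shift` is made up) of a shift that resolves the edge problem by padding with zeros:

```python
def shift(v, n):
    """Shift the vector v right by n >= 0 samples, so that
    w[i + n] == v[i]. The first n outputs, which would have to come
    from undefined inputs, are filled with zeros, and samples shifted
    past the end are dropped."""
    if n == 0:
        return list(v)
    return [0] * n + list(v[:-n])

print(shift([1, 2, 3, 4], 2))  # [0, 0, 1, 2]
```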

For signals that are functions of time -- i.e. where the succession of indices
corresponds to a sequence of time points -- a **shift-invariant** system
can equivalently be called a **time-invariant** system. Here the property
of shift-invariance has a particularly intuitive meaning. Suppose we probe some
acoustic resonator with a particular input at 12:00 noon on January 25, 1999,
and get a response (whatever it is), which we record. Then we probe the same
system again with the same input, at 12:00 noon on January 26, 1999. We expect
to record the same output -- just shifted forward in time by 24 hours! The same
expectation would apply for a time difference of one hour, or one minute. Finally,
if we hypothetically delay the input by 1 millisecond, we expect the output
to be delayed by the same amount -- and to be otherwise unchanged! The resonator
doesn't know what time it is, and responds in the same way regardless of when
it is probed.

A **unit** **impulse** (for present purposes) is just a vector whose
first element is 1, and all of whose other elements are 0. (For the electrical
engineer's digital signals of infinite extent, the unit impulse is 1 for index
0 and 0 for all other indices, from minus infinity to infinity).

We'll work up to what convolution is by giving a simple example. Here's a graph of 50 samples (about 6 milliseconds) of a speech waveform.

We're representing this waveform as a sequence of numbers -- a vector -- and from this perspective a more suitable graphical representation of the same data is a "lollipop plot", which shows us each sample as a little "lollipop" sticking up or down from a zero line:

Let's zoom in on just the first six of these numbers:

Octave (or Matlab) will tell us their specific values:

    >> S(1000:1005)
    ans =
       10622    5624     614    1280   -3363    7694

We can think of this six-element vector **s** as being the sum of six other
vectors **s1** to **s6**, each of which "carries" just one of
its values, with all the other values being zero:

    >> [10622     0     0     0     0     0 ] + ...
       [    0  5624     0     0     0     0 ] + ...
       [    0     0   614     0     0     0 ] + ...
       [    0     0     0  1280     0     0 ] + ...
       [    0     0     0     0 -3363     0 ] + ...
       [    0     0     0     0     0  7694 ]
    ans =
       10622    5624     614    1280   -3363    7694

Recall that an **impulse** (in the current context, anyhow) is a vector
whose first element has the value 1 and all of whose subsequent elements are
zero. The vector we've called **s1** is an impulse multiplied by 10622. The
vector **s2** is an impulse **shifted** to the right by one element and scaled
by 5624. Thus we are decomposing **s** into a set of scaled and shifted impulses.
It should be clear that we can do this to an arbitrary vector.
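The same decomposition can be sketched in code (an illustrative Python translation; the lecture's own code is MATLAB):

```python
s = [10622, 5624, 614, 1280, -3363, 7694]

# Build s1 ... s6: component k is zero everywhere except position k,
# where it carries the value s[k] -- a scaled, shifted impulse.
components = []
for k, value in enumerate(s):
    impulse = [0] * len(s)
    impulse[k] = value
    components.append(impulse)

# Adding the components element-wise recovers the original vector.
reconstructed = [sum(column) for column in zip(*components)]
print(reconstructed)  # [10622, 5624, 614, 1280, -3363, 7694]
```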

The same decomposition represented graphically:

Why is this interesting? Well, consider some arbitrary shift-invariant linear
system **D**. Suppose that we apply D (without knowing anything more about
it) to an impulse, with the result shown below: the first sample of the output
is 1, the second sample is -1, and the rest of the samples are 0. This result
is the **impulse response** of **D**.

This is enough to predict the result of applying D to our scaled and shifted
impulses **s1** ... **s6**. Because **D** is **shift-invariant**,
the effect of shifting the input is just to shift the output by the same amount.
Thus an input consisting of a unit impulse shifted by any arbitrary amount will
produce a copy of the **impulse response**, shifted by that same amount.

We also know that D is **linear**, and therefore a scaled impulse as input
will produce a scaled copy of the impulse response as output.

Using these two facts, we can predict the response of **D** to each of the
scaled and shifted impulses **s1** ... **s6**. This is shown graphically
below:

If we arrange the responses to **s1** ... **s6** as the rows of a matrix, the actual numbers
will look like this:

*(The arrangement of these outputs as the rows of a matrix is purely for typographical
convenience; also notice that we've allowed the response to input **s6** to fall off
the end of the world, so to speak.)*

    >> Qout
    Qout =
       10622  -10622       0       0       0       0
           0    5624   -5624       0       0       0
           0       0     614    -614       0       0
           0       0       0    1280   -1280       0
           0       0       0       0   -3363    3363
           0       0       0       0       0    7694

This information, in turn, is enough to let us predict the response of the
system D to the original vector **s**, which (by construction) is just the
sum of **s1**+**s2**+**s3**+**s4**+**s5**+**s6**. Since **D** is linear, applying it to this sum is the same as applying it to the individual
components of the sum and then adding up the results. This is just the sum of
the columns of the matrix shown above:

    >> sum(Qout)
    ans =
       10622   -4998   -5010     666   -4643   11057

*(Matlab "sum", applied to a matrix, produces a row vector of
the sums of the columns...)*

Notice that (at least for the second position in the sum and onward) this makes
the output in position i equal to the difference between the input in position
i and the input in position i-1. In other words, **D** happens to be calculating
the first difference of its input.
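This claim is easy to check numerically; here is a short Python sketch (illustrative only), using the input samples and the summed output from above:

```python
s   = [10622, 5624, 614, 1280, -3363, 7694]     # the input samples
out = [10622, -4998, -5010, 666, -4643, 11057]  # sum(Qout) from above

# From the second position onward, each output sample equals the
# input at that position minus the input one position earlier.
assert all(out[i] == s[i] - s[i - 1] for i in range(1, len(s)))
print("D computes the first difference")
```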

It should be clear that the same basic procedure will work for ANY shift-invariant linear system, and for ANY input to such a system:

- express the input as a sum of scaled and shifted impulses;
- calculate the response to each of these by scaling and shifting the system's impulse response;
- add up the resulting set of scaled and shifted impulse responses.

This process of adding up a set of scaled and shifted copies of one vector
(here the impulse response), using the values of another vector (here the input)
as the scaling values, is **convolution** -- at least this is one way to
define it.
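That definition translates directly into code. Here is a sketch (in Python, for illustration; the function name is made up) of convolution as a sum of scaled and shifted copies of the impulse response:

```python
def convolve_by_superposition(x, h):
    """Convolution built from the three steps above: decompose x into
    scaled, shifted impulses, replace each impulse with a scaled,
    shifted copy of the impulse response h, and add up the results."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):        # the impulse carrying x[k]...
        for j, hj in enumerate(h):    # ...contributes xk * h, shifted by k
            y[k + j] += xk * hj
    return y

print(convolve_by_superposition([3, 4, 5], [6, 7, 8]))  # [18, 45, 82, 67, 40]
```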

Another way: the convolution of two vectors **a** and **b** is defined
as a vector **c**, whose kth element is (in MATLAB-ish terms)

    c(k) = sum_over_j( a(j) * b(k+1-j) )        (Equation 1)

*(The "+1" in "k+1-j" is due to the fact that MATLAB indices have
the bad taste to start from 1 instead of the mathematically more elegant 0.)*

This formulation helps indicate that we can also think of convolution as a process of taking a running weighted average of a sequence -- that is, each element of the output vector is a linear combination of some of the elements of one of the input vectors-- where the weights are taken from the other input vector.

There are a couple of little problems: how long should **c** be? and what
should we do if *k*+1-*j* is negative or greater than the length of **b**?

These problems are a version of the "edge effects" we've already hinted at,
and will see again. One possible solution is to imagine that we are convolving
two infinite sequences created by embedding **a** and **b** in an ocean
of zeros. Now arbitrary index values---negative ones, ones that seemed "too
big"---make perfect sense. The value of extended **a** and extended **b** for index values outside their actual range is now perfectly well defined:
always zero. The result of Equation 1 will be another infinite-length sequence **c**.

A little thought will convince you that most of **c** will also be necessarily
zero, since the non-zero weights from **b** and the non-zero elements of **a** will not coincide in those cases. How many elements of **c** have
a chance to be non-zero? Well, just those integers *k* for which there
is at least one integer *j* such that 1 <= *j* <= length(**a**)
and 1 <= *k+1-j* <= length(**b**). With a little more thought, you can
see that this means that the length of **c** will be one less than the sum
of the lengths of **a** and **b**.

Referring again to Equation 1, and imagining the two vectors **a** and **b** as embedded
in their seas of zeros, we can see that we will get the right answer if we allow *k* to run from 1 to length(**a**)+length(**b**)-1, and for each
value of *k*, allow *j* to run from max(1,*k*+1-length(**b**))
to min(*k*,length(**a**)). Again, all of this is in MATLAB index terms,
and so we can transfer it directly to a MATLAB program myconv() to perform convolution:

    function c = myconv(a,b)
    c = zeros(1,length(a)+length(b)-1);
    for k = 1:length(a)+length(b)-1
      for j = max(1,k+1-length(b)):min(k,length(a))
        c(k) = c(k) + a(j)*b(k+1-j);
      end
    end

This will give us just the piece of the conceptually infinite c that has a chance to be non-zero.
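For readers who would rather experiment in Python than MATLAB, here is a sketch of the same function with 0-based indices (so the `k+1-j` of Equation 1 becomes `k-j`):

```python
def myconv(a, b):
    """Convolution of two lists; a 0-based translation of the
    MATLAB myconv() above."""
    c = [0] * (len(a) + len(b) - 1)
    for k in range(len(c)):
        # j ranges over exactly those indices of a for which
        # b[k - j] is also defined
        for j in range(max(0, k - len(b) + 1), min(k, len(a) - 1) + 1):
            c[k] += a[j] * b[k - j]
    return c

print(myconv([3, 4, 5], [6, 7, 8]))  # [18, 45, 82, 67, 40]
```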

MATLAB has a built-in convolution function conv(), so we can compare it with the one that we just wrote:

    >> a=[3 4 5];
    >> b=[6 7 8];
    >> myconv(a,b)
    ans =
       18   45   82   67   40
    >> conv(a,b)
    ans =
       18   45   82   67   40

As an aside, we should mention that convolution will also give us the correct results if we think of a, b and c as the coefficients of polynomials, with c being the coefficients of the polynomial resulting from multiplying a and b together. Thus convolution is isomorphic to polynomial multiplication, so that e.g.

    >> conv([2 3], [4 5])
    ans =
        8   22   15

can also be interpreted to mean that \((2x+3)(4x+5) = 8x^2 + 22x + 15\)

and

    >> conv([3 4],[5 6 7])
    ans =
       15   38   45   28

can also be interpreted to mean that \((3x+4)(5x^2+6x+7) = 15x^3 + 38x^2 + 45x + 28\)

If you believe this, it follows immediately from the commutativity of multiplication that convolution also commutes (and is associative, and distributes over addition).
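If you prefer an empirical check to algebra, these properties can be verified with a short sketch (Python, purely illustrative; `conv` here stands in for MATLAB's conv()):

```python
def conv(a, b):
    """Plain list convolution, standing in for MATLAB's conv()."""
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return out

a, b, c = [3, 4], [5, 6, 7], [1, -2]

assert conv(a, b) == conv(b, a)                    # commutative
assert conv(conv(a, b), c) == conv(a, conv(b, c))  # associative
# distributive over addition (pad c with a zero so lengths match b)
b_plus_c = [p + q for p, q in zip(b, c + [0])]
lhs = conv(a, b_plus_c)
rhs = [p + q for p, q in zip(conv(a, b), conv(a, c) + [0])]
assert lhs == rhs
print("commutative, associative, distributive")
```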

We can exemplify these properties empirically:

    >> conv( [3 4], [5 6 7] )
    ans =
       15   38   45   28
    >> conv( [5 6 7], [3 4])
    ans =
       15   38   45   28

These are important points, so if you do not immediately see that they are always true, spend some time with Equation 1 -- or with the convolution operator in Matlab -- and convince yourself.

We've given two pictures of conv(a,b):

- in one, we add up a bunch of scaled and shifted copies of a, each copy scaled by one value of b, and shifted over to align with the location of that value in b.
- in the other, we take a running weighted average of a, using b (backwards) as the weights.

We can see the relationship between these
two pictures by expressing Equation 1 in matrix form. We have been thinking
of **b** as the impulse response of the system, **a** as the input, and **c** as the output. This implies that the system matrix **S** will have dimensions
length(**c**) by length(**a**), if **c** = **S**​**a** is to be
legal matrix-ese.

Each element of the output **c** will be the inner product of a row of **S** with the input **a**. This will be exactly Equation 1 if the row of **S** in
question is just **b**, time-reversed, shifted, and suitably padded with
zeros. As **b** shifts out of the picture, we just shift in zeros from the
"sea of zeros" we imagine ourselves to be floating in.

A small modification of our convolution program will produce the needed matrix:

    function c = cmat(a,b)
    % turn a convolution mask into a matrix operator
    % for pedagogical purposes
    c = zeros(length(a)+length(b)-1,length(a));
    for k = 1:length(a)+length(b)-1
      for j = max(1,k+1-length(b)):min(k,length(a))
        c(k,j) = b(k+1-j);
      end
    end

Trying it out:

    >> a=[3 4];
    >> b=[5 6 7];
    >> cmat(a,b)
    ans =
       5   0
       6   5
       7   6
       0   7

Thus cmat(a,b) creates a matrix operator C that can be multiplied by the vector a to create exactly the same effect as convolution of a with b:

    >> a = [3 4]; b = [5 6 7];
    >> C = cmat(a,b)
    C =
       5   0
       6   5
       7   6
       0   7
    >> conv(a,b)
    ans =
       15   38   45   28
    >> (C*a')'
    ans =
       15   38   45   28

This works because the rows of **C** are suitably shifted (backwards-running)
copies of **b** -- or equivalently, because the columns of **C** are suitably
shifted (forwards-running) copies of **b**.

This gives us the two pictures of convolutional operators:

- THE RUNNING WEIGHTED AVERAGE OF THE INPUT: the rows of **C** are shifted backwards copies of **b**, and the inner product of each row with **a** gives us a weighted average of a suitable piece of **a**, which we stick into the appropriate place in the output **c**.
- THE SUM OF SCALED AND SHIFTED COPIES OF THE IMPULSE RESPONSE: the columns of **C** are shifted copies of **b**. Taking the other view of matrix multiplication, namely that the output is the sum of the columns of **C** weighted by the elements of **a**, gives us the other picture of convolution, namely adding up a set of scaled and shifted copies of the "impulse response" **b**.
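The two pictures can be checked side by side with a small sketch (Python, illustrative; `cmat` is a 0-based translation of the MATLAB function above):

```python
def cmat(a, b):
    """Convolution matrix whose rows are reversed, shifted copies of b
    (0-based translation of the MATLAB cmat() above)."""
    rows, cols = len(a) + len(b) - 1, len(a)
    C = [[0] * cols for _ in range(rows)]
    for k in range(rows):
        for j in range(max(0, k - len(b) + 1), min(k, cols - 1) + 1):
            C[k][j] = b[k - j]
    return C

a, b = [3, 4], [5, 6, 7]
C = cmat(a, b)

# Picture 1: each output element is a row of C (a reversed, shifted
# copy of b) dotted with the input a.
by_rows = [sum(cj * aj for cj, aj in zip(row, a)) for row in C]

# Picture 2: the output is the sum of the columns of C (shifted
# copies of b), each scaled by the corresponding element of a.
by_cols = [0] * len(C)
for j, aj in enumerate(a):
    for k in range(len(C)):
        by_cols[k] += aj * C[k][j]

print(by_rows, by_cols)  # [15, 38, 45, 28] [15, 38, 45, 28]
```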

A larger example:

    >> a = [1 2 3 4 5]; b = [6 7 8];
    >> C = cmat(a,b)
    C =
       6   0   0   0   0
       7   6   0   0   0
       8   7   6   0   0
       0   8   7   6   0
       0   0   8   7   6
       0   0   0   8   7
       0   0   0   0   8
    >> conv(a,b)
    ans =
        6   19   40   61   82   67   40
    >> (C*a')'
    ans =
        6   19   40   61   82   67   40

In working through the details of convolution, we had to deal with the "edge
effect": the fact that the convolution equation (Equation 1) implies index values
for finite-length inputs **a** and **b** outside the range in which they
are defined.

Obviously we could choose quite a number of different ways to supply the missing values---the particular choice that we make should depend on what we are doing. There are some cases in which the "sea of zeros" concept is exactly correct.

However, there are alternative situations in which other ideas make more sense.
For instance, we might think of **b** as sitting in a sea of infinitely many
repeated copies of itself. Since this means that index values "off the end"
of **b** wrap around to the other end in a modular fashion, just as if **b** was on a circle, the kind of convolution that results is called "circular convolution."
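A minimal sketch of the idea (Python, illustrative; both sequences taken to have the same length, as is usual for circular convolution):

```python
def circular_conv(a, b):
    """Circular convolution: index values that run off the end of b
    wrap around modularly, as if b were written around a circle."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n))
            for k in range(n)]

print(circular_conv([1, 2, 3], [4, 5, 6]))  # [31, 31, 28]
```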

Keep this in mind: we will come back to it in a later lecture.

Meanwhile, let's repeat the slogan we began with:

**The effect of any linear, shift-invariant system on
an arbitrary input signal is obtained by convolving the input signal with the
response of the system to a unit impulse.**

*(Notice that this is the same property of linear systems that we observed
in the case of color matching -- where we could learn everything we needed to
know about the system by probing it with a limited set of "monochromatic"
inputs. If the system were only linear, and not shift-invariant, the analogy
here would be to probe it with unit impulses at every possible index value --
each such probe giving us one column of the system matrix. That was practical
with a 31-element vector, but it would be less attractive with vectors of millions
or billions of elements! However, if the system is also shift-invariant, a probe
with just one impulse is enough, since the responses of all the shifted cases
can be predicted from it.)*

Convolution can always be seen as matrix multiplication -- this has to be true, because a system that can be implemented by convolution is a linear system (as well as being shift-invariant). Shift-invariance means that the system matrix has particular redundancies, though.

When the impulse response is of finite duration, this slogan is not only mathematically true, but also is often quite a practical way to implement the system, because we can implement the convolution in a fixed number of multiply-adds per input sample (exactly as many as there are non-zero values in the system's impulse response). Systems of this type are generally called "finite impulse response" (FIR) filters, or equivalently "moving average" filters. When the impulse response is of infinite duration (as it perfectly well can be in a linear shift-invariant system!), the slogan remains mathematically true, but is of less practical value (unless the impulse response can be truncated without significant effect). We'll learn later how to implement "infinite impulse response" (IIR) filters efficiently.
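A sketch of why FIR filtering costs a fixed number of multiply-adds per sample (Python, illustrative; the function name is made up): each new input is shifted into a delay line as long as the impulse response, and one inner product produces one output.

```python
def fir_filter(h, samples):
    """Stream an FIR filter with impulse response h: one output per
    input sample, at a fixed cost of len(h) multiply-adds each."""
    delay = [0] * len(h)   # the most recent len(h) input samples
    out = []
    for x in samples:
        delay = [x] + delay[:-1]        # shift the new sample in
        out.append(sum(hj * dj for hj, dj in zip(h, delay)))
    return out

# The first-difference filter from earlier in the lecture:
print(fir_filter([1, -1], [3, 4, 5]))  # [3, 1, 1]
```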

**The EE/DSP perspective.**

The goal of this section is to develop the basic material on impulse response and convolution in the style that is common in the digital signal processing literature in the discipline of Electrical Engineering, so as to help you become familiar with the type of notation that you are likely to encounter there. Also, perhaps going over the same ideas again in a different notation will help you to assimilate them -- but be careful to keep the DSP/EE notation separate in your mind from linear algebra notation, or you will become very confused!

In this perspective, we treat a digital signal *s* as an infinitely-long
sequence of numbers. We can adapt the mathematical fiction of infinity to everyday
finite reality by assuming that all signal values are zero outside of some finite-length
sub-sequence. The positions in one of these infinitely-long sequences of numbers
are indexed by integers, so that we take *s*(*n*) to mean ``the nth
number in sequence *s*,'' usually called ``*s* of *n*'' for short.
Sometimes we will alternatively use *s*(*n*) to refer to the entire
sequence *s*, by thinking of *n* as a free variable.

We will let an index like *n* range over negative as well as positive
integers, and also zero. Thus we can speak of the sequence *s*(*n*), where *n* takes on all values from minus infinity to infinity.

We will refer to the individual numbers in a sequence *s* as *elements* or *samples*. The word *sample* comes from the fact that we usually
think of these sequences as discretely-sampled versions of continuous functions,
such as the result of sampling an acoustic waveform some finite number of times
a second, but in fact nothing that is presented in this section depends on a
sequence being anything other than an ordered set of numbers.

The *unit impulse* or *unit sample sequence*, written \(\delta(n)\), is a sequence that is one at sample point zero, and zero
everywhere else: \( \delta(n) = 1 \) for \( n = 0 \), and \( \delta(n) = 0 \) otherwise.

The Greek capital sigma, \(\Sigma\), pronounced *sum*, is used as a notation for adding up a
set of numbers, typically by having some variable take on a specified set of
values. Thus

\( \sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 \)

and \( \sum_{i=0}^{3} f(x_i) \) is shorthand for \( f(x_0) + f(x_1) + f(x_2) + f(x_3) \).

The notation is particularly helpful in dealing with sums over *sequences*,
in the sense of *sequence* used in this section, as in the following
simple example. The *unit step sequence*, written *u*(*n*),
is a sequence that is zero at all sample points less than zero, and 1 at all
sample points greater than or equal to zero. It can be expressed as a cumulative sum of the unit impulse:

\( u(n) = \sum_{k=-\infty}^{n} \delta(k) \)

At *n* = -1 the sum will be 0, since all the values of \(\delta(k)\) for negative *k* are 0; at *n* = 0 the cumulative sum jumps to 1, since \(\delta(0) = 1\); and the cumulative sum stays at 1 for all values of *n* greater than 0, since all the rest of the values of \(\delta\) are 0 again.

This is not a particularly impressive use of the notation, but it should help you to understand that it can be perfectly
sensible to talk about infinite sums. Note that we can also express the relationship
between *u*(*n*) and \(\delta(n)\) in the other direction:

\( \delta(n) = u(n) - u(n-1) \)
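These relationships between the unit step and the unit impulse are easy to check numerically (Python sketch, illustrative; a finite lower limit stands in for minus infinity, which is harmless because the impulse is zero there):

```python
def delta(n):
    """Unit impulse: one at sample point zero, zero everywhere else."""
    return 1 if n == 0 else 0

def u(n):
    """Unit step as a cumulative sum of the unit impulse."""
    return sum(delta(k) for k in range(-100, n + 1))

print([u(n) for n in (-2, -1, 0, 1, 2)])  # [0, 0, 1, 1, 1]
# ... and in the other direction:
assert all(delta(n) == u(n) - u(n - 1) for n in range(-5, 6))
```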

In general, it is useful to talk about applying the ordinary operations of
arithmetic to sequences. Thus we can write the product of sequences *x* and *y* as *xy*, meaning the sequence made up of the products of the
corresponding elements (**not** the inner product):

\( (xy)(n) = x(n)\,y(n) \)

Likewise, the sum of sequences *x* and *y* can be written *x* + *y*, meaning

\( (x+y)(n) = x(n) + y(n) \)

A sequence *x* can be multiplied by a scalar \(\alpha\), with the meaning that each element of *x* is individually so multiplied:

\( (\alpha x)(n) = \alpha\,x(n) \)

Finally, a sequence may be shifted by any integer number of sample points:

\( y(n) = x(n-k) \)

We already used this notation when we expressed the unit impulse sequence in terms of the unit step sequence, as the difference between a given sample and the immediately previous sample.

Any sequence can be expressed as a sum of scaled and shifted unit samples. Conceptually, this is trivial: we just make, for each sample of the original sequence, a new sequence whose sole non-zero member is that chosen sample, and we add up all these single-sample sequences to make up the original sequence. Each of these single-sample sequences (really, each sequence contains infinitely many samples, but only one of them is non-zero) can in turn be represented as a unit impulse (a sample of value 1 located at point 0) scaled by the appropriate value and shifted to the appropriate place. In mathematical language, this is

\( x(n) = \sum_{k=-\infty}^{\infty} x(k)\,\delta(n-k) \)        (1)

Here *k* is a variable that picks out each of the original samples, uses its value to scale the unit impulse, and then shifts the result to the position of the selected sample.

A *system* or *transform* *T* maps an input sequence *x*(*n*)
onto an output sequence *y*(*n*):

\( y(n) = T[x(n)] \)        (2)

Systems or transforms come in a wide variety of types. One important class
is known as *linear* systems. The propagation of sound in air is linear,
meaning that the *principle of superposition* applies, so that the pressure
disturbance resulting from two sounds propagating through the same region is
just the sum of the pressure disturbances corresponding to the individual sounds.
Linearity means the same thing as applied to sequences. We can express it mathematically
like this:

\( T[a\,x_1(n) + b\,x_2(n)] = a\,T[x_1(n)] + b\,T[x_2(n)] \)        (3)

(For real-valued sequences, homogeneity follows from superposition.)

Now we replace the expression *x*(*n*) in 2 with the re-expression of *x*(*n*) found in 1 to obtain:

\( y(n) = T\!\left[\sum_{k=-\infty}^{\infty} x(k)\,\delta(n-k)\right] \)        (4)

That is, the sequence consisting of just sample *k* from *x* can
be rewritten as \( x(k)\,\delta(n-k) \), our old friend the unit impulse scaled to the value
of *x*(*k*) and shifted to position *k*. The response of system *T* to this sequence is just \( T[x(k)\,\delta(n-k)] \); by linearity, we can move the scalar *x*(*k*) outside of *T*, giving us

\( x(k)\,T[\delta(n-k)] \)        (5)

which is the response of *T* to a shifted unit impulse, scaled by *x*(*k*).

The original sequence *x* can be re-expressed as the sum of all of its
individual sample values; by linearity, this lets us express *T*[*x*]
as the sum over all samples of the response of T to a sequence consisting of
only the given sample. This response is specified for a particular sample *k* by equation 5, so to make
up the response to the original input, we just need to sum equation 5 over all *k*:

\( y(n) = \sum_{k=-\infty}^{\infty} x(k)\,T[\delta(n-k)] \)        (6)

Notice that equation 6 describes the response of *T* to an arbitrary input sequence *x* without applying *T* to the input *x* at all! The only thing *T* operates on is the set of shifted unit impulses, which is independent of *x*. Having once applied *T* to the shifted unit impulses, we can calculate *T*[*x*] for arbitrary *x* just by doing the multiplications and additions specified in equation 6.

However, there is one annoyance - we still must calculate \( T[\delta(n-k)] \) for every shift *k*. This would be unnecessary if
the response of *T* to a shifted input was just an output shifted by the
same amount. This property, which is called *shift invariance*, holds
of many interesting systems. For example, if we put a certain test signal into
an acoustic resonator, and then put the same test signal into the same resonator
20 seconds later, we would expect the output to be the same, just shifted in
time by the same twenty seconds as the input (as long as the resonator hasn't
changed in the meantime).

In mathematical language, a system *T* is *shift-invariant* if
and only if

\( y(n) = T[x(n)] \;\Rightarrow\; y(n-k) = T[x(n-k)] \)        (7)

If we write *h*(*n*) for the response of *T* to the unshifted unit impulse \(\delta(n)\), then if *T* is shift-invariant, the response of *T* to a unit impulse shifted by *k* is just *h*(*n*-*k*):

\( T[\delta(n-k)] = h(n-k) \)        (8)

Thus if *T* is shift-invariant as well as linear, we can re-write equation 6 as

\( y(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k) \)        (9)

In other words, for any linear shift-invariant system *T*, once we have calculated its *impulse response* *h*(*n*) (its response to a unit impulse at sample point 0), we can forget about *T* entirely, and just add up scaled and shifted copies of *h*(*n*) to calculate the response of *T* to any input whatsoever. Thus any linear shift-invariant system is completely characterized by its impulse response *h*(*n*).
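Equation 9 can be tried out numerically with a small sketch (Python, illustrative; dictionaries stand in for infinite sequences, with absent indices understood to be zero):

```python
def lsi_output(x, h, n):
    """y(n) = sum over k of x(k) * h(n - k), per equation 9."""
    return sum(xk * h.get(n - k, 0) for k, xk in x.items())

x = {0: 3, 1: 4, 2: 5}   # input sequence, zero at all other indices
h = {0: 1, 1: -1}        # impulse response of the first-difference system

print([lsi_output(x, h, n) for n in range(4)])  # [3, 1, 1, -5]
```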

## Convolution again

The way of combining two sequences specified by equation 9 is our friend *convolution*. For any two sequences *x* and *y*, there will be another sequence *w* obtained by *convolving* *x* with *y*, following the equation

\( w(n) = (x * y)(n) = \sum_{k=-\infty}^{\infty} x(k)\,y(n-k) \)

(where we're using '*' to symbolize the convolution operation...)

The following things are true for convolution in general, as you should be able to verify for yourself by some algebraic manipulation: it is commutative (\( x * y = y * x \)), associative (\( x * (y * z) = (x * y) * z \)), and distributive over addition (\( x * (y + z) = x * y + x * z \)).