Linguistics 525
Computational Analysis and Modeling of Biological Signals and Systems
Homework 5

Due: 4/13/2022

Install Matlab R2022a

The "deep learning" toolbox is convenient for this exercise, so install the latest version if you haven't. Information about how to do so is here. You'll need some of the toolboxes, so you might as well install all of them.

(If your machine doesn't seem to be up to this, there are lots of labs around campus that have Matlab installed -- ask and we'll find a convenient option for you.)

Background

Set up a .m file that starts like this, creating a simple XOR classification problem:

NN=10000;
X=2*rand(NN,2)-1;
Class=xor(X(:,1)>0,X(:,2)>0);
X1=X(Class,:); X2=X(~Class,:);
plot(X1(:,1),X1(:,2),'ro',X2(:,1),X2(:,2),'bx')
xlabel('Dimension1'); ylabel('Dimension2');
title('Random XOR data');
legend('True','False', 'Location','NorthEastOutside')

Your first assignment is to apply linear discriminant analysis to this simple problem. You can do it directly, using these three lines of code:

w = inv(cov(X1)+cov(X2))*(mean(X1)-mean(X2))';  % Fisher discriminant direction
wX1 = X1*w;  % projections of each class onto w
wX2 = X2*w;

Try it, and estimate how well it works.
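One way to estimate performance (a sketch, reusing the variables above; since the sign of w is arbitrary, you may need to flip the inequality) is to classify at the midpoint of the projected class means and count errors:

```matlab
% Project each class onto w and classify at the midpoint of the
% projected means; count points that land on the wrong side.
thresh = (mean(wX1) + mean(wX2))/2;
histogram(wX1); hold on; histogram(wX2); hold off;
xlabel('Projection onto w'); legend('True','False');
errs = sum(wX1 < thresh) + sum(wX2 >= thresh);
fprintf('LDA error rate: %.3f\n', errs/NN);
```

Since XOR is not linearly separable, don't be surprised if the error rate hovers near chance.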

Optionally, try a quadratic classifier, using Matlab's fitcdiscr() function. Does that work?
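A minimal sketch of the quadratic version, assuming the Statistics and Machine Learning Toolbox is installed:

```matlab
% Quadratic discriminant analysis: each class gets its own covariance
% estimate, so the decision boundary is a quadratic surface.
qmodel = fitcdiscr(X, Class, 'DiscrimType', 'quadratic');
qpred  = predict(qmodel, X);
fprintf('Quadratic discriminant error rate: %.3f\n', mean(qpred ~= Class));
```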

Now let's try a multilayer "feedforward" perceptron, the earliest and simplest kind of "deep net". The code below will create, train, and test this approach. Try it.

net = feedforwardnet([2 20 20 20 1],'trainrp'); % hidden layer sizes; input and output sizes are inferred from the data
net = train(net,X',Class');
view(net);
Class1 = net(X');
Results = [abs(Class1)' Class];
histogram(abs(Results(:,1)-Results(:,2)));
xlabel('Hypothesis - Truth')
title('Feed Forward Net for XOR: Performance on Training Data')
xlim([0 1]);
sum(abs(Results(:,1)-Results(:,2))>0.5)
 

Since the training of such networks is an iterative process that starts with random weights (and in this case, learns from random data), the results can be different from trial to trial. So try this a few times. It should work fairly well, at least some of the time.

The learning process, with the particular training function we chose, divided the input into training, validation, and testing subsets so as to avoid (or at least minimize) overtraining. But our last test calculated the distribution of classification errors on the training data -- so let's generate some new random data and try it:

NewX = 2*rand(NN,2)-1;
NewClass = xor(NewX(:,1)>0,NewX(:,2)>0);
NewX1=NewX(NewClass,:); NewX2=NewX(~NewClass,:);
%
NewClass1 = net(NewX');
NewResults = [abs(NewClass1)' NewClass];
histogram(abs(NewResults(:,1)-NewResults(:,2)));
xlim([0 1]);
xlabel('Hypothesis - Truth')
title('Feed Forward Net: Performance on XOR Test Data')
sum(abs(NewResults(:,1)-NewResults(:,2))>0.5)

Again, the results will vary from trial to trial, so repeat the whole training and testing process a few times. Usually, if the net worked well on the training data, it will also work well on the new test data.

Generalization?

The good thing about "deep nets" is that some version often performs well on difficult problems, given enough training data. A problem with them is that in most cases, we don't understand why they work or don't work, and therefore can't predict whether or not they'll work on inputs with somewhat different characteristics. These issues are often discussed under the headings of "explainability" and "generalization" (or, less positively, "brittleness").

Let's try a simple generalization test -- instead of data from -1 to 1, let's try test data from -10 to 10. The criterion is still the same -- classification is a single, simple Matlab expression -- but let's see what our net solution does. The new data is easy to create:

NewX = 20*rand(NN,2)-10;
NewClass = xor(NewX(:,1)>0,NewX(:,2)>0);
NewX1=NewX(NewClass,:);
NewX2=NewX(~NewClass,:);
%
NewClass1 = net(NewX');
NewResults = [abs(NewClass1)' NewClass];
plot(NewX1(:,1),NewX1(:,2),'ro',NewX2(:,1),NewX2(:,2),'bx')
xlabel('Dimension1'); ylabel('Dimension2');
title('Feed Forward Net on Random XOR test data (Range 10x)');
legend('True','False', 'Location','NorthEastOutside')
 

And also easy to test -- you can use the same code as before.

Again, the results will differ from trial to trial, so try it more than once.

Onward

To whatever extent your interest takes you, explore this space further. Some examples:

What is the effect of changing the amount of training data? We started with 10000 -- how would it work with 100, or 1000, or 100000?
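One way to organize that experiment (a sketch; the test-set size of 10000 is an arbitrary choice):

```matlab
% Train and test at several training-set sizes.
for NN = [100 1000 10000 100000]
    X = 2*rand(NN,2)-1;
    Class = xor(X(:,1)>0, X(:,2)>0);
    net = feedforwardnet([2 20 20 20 1],'trainrp');
    net.trainParam.showWindow = false;  % suppress the training GUI inside the loop
    net = train(net, X', Class');
    TestX = 2*rand(10000,2)-1;
    TestClass = xor(TestX(:,1)>0, TestX(:,2)>0);
    errs = sum(abs(net(TestX')' - TestClass) > 0.5);
    fprintf('NN = %6d: %d test errors out of 10000\n', NN, errs);
end
```

Remember that each run starts from random weights, so you'll want several repetitions per size before drawing conclusions.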

What if we change the (lower and upper) bounds randomly from sample to sample, while keeping the threshold at 0? Can our network learn that? Does it take more training data?

How about changing the threshold from 0 to some other value? Can we adapt a well-trained network with a small(er) amount of new training data based on the new threshold? (This is called "fine-tuning".)

What about inverting the classification? In the Matlab code, this just means that we use xor(X(:,1)<0,X(:,2)<0) as the classifier. How much new data does it take to teach this new classifier to a well-trained net?

What about N-ary parity rather than XOR, still with random numerical data rather than binary data? Now the classification depends on whether an odd number of values is greater than zero. What if the length of each input vector varies from example to example (e.g. it might have 2 elements, or 20, or 37)?
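For the fixed-length version, the parity labels are a one-liner (D here is an assumed, illustrative dimension):

```matlab
D = 5;                               % illustrative input dimension
X = 2*rand(NN,D)-1;
Class = mod(sum(X > 0, 2), 2) == 1;  % true iff an odd number of positive values
```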

How do other network architectures work on such problems? If you're feeling ambitious, try data in which the threshold varies slowly over "time", say in a sinusoidal pattern with a long period (like 1000 or 5000 samples). Would an LSTM net cope?

etc.
