Machine Learning – Stanford – Week 5 – Neural Networks: Implementation

by Fuyang

Continuing with the Machine Learning course by Andrew Ng. In this chapter we implement a simple Neural Network classification algorithm.

Below is the definition of the problem.

[Figure: neural network classification problem setup]


Cost function 

The cost function is calculated as shown below.

[Figure: neural network cost function]
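Since the figure above is an image, here is the regularized cost function written out (as given in the course), for $m$ training examples, $K$ output units, and $L$ layers:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$

where $s_l$ is the number of units in layer $l$; note that the bias terms are not regularized.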

Using the equation shown above, sample MATLAB code that calculates the cost function of a one-hidden-layer Neural Network can look like this:

% First use forward propagation calculate the output h
a1 = [ones(m,1) X]; % m x 401
a2 = [ones(m,1) sigmoid(a1 * Theta1')]; % (m x 25) -> m x 26 
a3 = sigmoid(a2 * Theta2'); % (m x 26) * (26 x 10) = m x 10

% y is m by 10
h = a3;

J = 0;
for t = 1:m
 Y = ((1:num_labels) == y(t)); % one-hot classification label, 1 x 10
 J = J + ((-Y) * log(h(t,:)') - (1-Y) * log(1-h(t,:)'));
end

J = J/m;

% Plus regularization term
J = J + lambda/(2*m)* ( sum(sum(Theta1(:,2:end).^2)) ...
 + sum(sum(Theta2(:,2:end).^2)));
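As a side note, the per-example loop above can also be fully vectorized. A minimal sketch, assuming the same h, y, Theta1, Theta2, and lambda as above (bsxfun is used so it also works on older MATLAB versions without implicit broadcasting):

% Build the full m x num_labels one-hot label matrix in one shot
Y_all = bsxfun(@eq, y, 1:num_labels); % m x 10

% Sum the cross-entropy loss over all examples and labels at once
J = -(1/m) * sum(sum( Y_all .* log(h) + (1 - Y_all) .* log(1 - h) ));

% Same regularization term as before (bias columns excluded)
J = J + lambda/(2*m) * ( sum(sum(Theta1(:,2:end).^2)) ...
 + sum(sum(Theta2(:,2:end).^2)) );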


Gradient Computation

The following graphs illustrate how the gradient is computed.

[Figures: cost function and backpropagation]
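Since those figures are images, here are the key backpropagation equations for this network (as given in the course), with $\odot$ denoting the element-wise product:

$$\delta^{(3)} = a^{(3)} - y$$

$$\delta^{(2)} = \left(\Theta^{(2)}\right)^T \delta^{(3)} \odot g'\left(z^{(2)}\right), \qquad g'(z) = g(z)\left(1 - g(z)\right)$$

(after dropping the bias component of $\delta^{(2)}$, as the code below does), followed by the gradient accumulation over all examples:

$$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)}\left(a^{(l)}\right)^T, \qquad \frac{\partial J}{\partial \Theta_{ij}^{(l)}} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \;\; (j \ge 1)$$

with no regularization term for the bias columns ($j = 0$).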

Following the previous example, here is a MATLAB implementation of a simple one-hidden-layer Neural Network, showing the part of the code that calculates the gradient:

D1 = zeros(size(Theta1));
D2 = zeros(size(Theta2));

% Part 2 - back propagation
for t = 1:m

 % Step 1: perform forward propagation

 a1 = [1 X(t,:)]; % 1 x 401

 z2 = a1 * Theta1'; % 1x25
 a2 = [1 sigmoid(z2)]; % (1 x 25) -> 1 x 26 

 z3 = a2 * Theta2'; % (1 x 26) * (26 x 10) = 1 by 10
 a3 = sigmoid(z3) ;% 1x10

 % Step 2: using y to calculate delta_L

 Y = ((1:num_labels) == y(t)); % one-hot classification label, 1 x 10

 d3 = a3 - Y; % 1 by 10

 % Step 3: backward propagation to calculate delta_(L-1),
 % delta_(L-2), ... down to delta_2 (this example only has one
 % hidden layer, so we only need to calculate delta_2)

 d2 = Theta2' * d3'; % 26 x 1
 d2 = d2(2:end); % 25 x 1
 d2 = d2 .* sigmoidGradient(z2)';

 % Alternatively:
 %d2 = Theta2' * d3' .* a2' .* (1-a2)'; % 26 x 1
 %d2 = d2(2:end); % 25 x 1

 % Step 4: accumulate the Delta values over all m training samples
 % Theta1 has size 25 x 401
 % Theta2 has size 10 x 26

 D2 = D2 + d3' * a2; % 10 x 26
 D1 = D1 + d2 * a1; % 25 x 401

end

% Finally, calculate the regularized gradients (bias columns are not regularized)
Theta1_grad = 1/m*D1 + lambda/m*[zeros(size(Theta1,1),1) Theta1(:, 2:end)];
Theta2_grad = 1/m*D2 + lambda/m*[zeros(size(Theta2,1),1) Theta2(:, 2:end)];
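The code above relies on two helper functions, sigmoid and sigmoidGradient. A minimal sketch of both, assuming the standard logistic activation used throughout the course:

function g = sigmoid(z)
 % Element-wise logistic function g(z) = 1 / (1 + e^-z)
 g = 1.0 ./ (1.0 + exp(-z));
end

function g = sigmoidGradient(z)
 % Derivative of the sigmoid: g'(z) = g(z) .* (1 - g(z))
 g = sigmoid(z) .* (1 - sigmoid(z));
end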

Here are some more illustrations of how forward propagation and backward propagation work. They may also help you better understand how to implement the algorithm.

[Figures: understanding forward and backward propagation]

A sample of how to use an advanced optimization function to find the best Theta values (some vector reshaping is needed):
[Figures: using an advanced optimization function]
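Since those figures are images, here is a minimal sketch of the idea: the weight matrices are unrolled into one long vector, passed to the optimizer, and reshaped back afterwards. It assumes the course's fmincg optimizer and an nnCostFunction that returns [J, grad] for an unrolled parameter vector (names as in the course exercise):

% Unroll all parameters into a single column vector
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

% Cost function handle taking only the unrolled parameter vector
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
 num_labels, X, y, lambda);

% Run the optimizer for a fixed number of iterations
options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the optimized vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
 hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
 num_labels, hidden_layer_size + 1);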

Here is a pop quiz about it:
[Figure: pop quiz]

A short summary of the learning algorithm procedure:

[Figure: learning algorithm procedure summary]
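Since the summary figure is an image, here is the procedure from the lecture, sketched as comments:

% 1. Randomly initialize the weights (see Theta Initialization below)
% 2. Implement forward propagation to get h(x) for any input x
% 3. Implement the cost function J(Theta)
% 4. Implement backpropagation to compute the partial derivatives of J
% 5. Use gradient checking to confirm backprop is correct, then disable it
% 6. Minimize J(Theta) with gradient descent or an advanced optimizer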


Gradient Checking

Since the Neural Network algorithm is quite complicated and can easily be buggy, a good practice during implementation is to simultaneously compute the gradients with a numerical estimation method and check whether these values are close enough to the gradients calculated by the learning algorithm.

This helps produce bug-free code. Remember to turn the gradient checking function off when using the learning algorithm in a production environment, since the numerical estimation method is computationally very expensive.

[Figures: gradient checking]
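A minimal sketch of the numerical estimation, using the two-sided difference (J(theta + e) - J(theta - e)) / (2e) for each parameter; the function name follows the course exercise:

function numgrad = computeNumericalGradient(J, theta)
 % J is a handle returning the cost for an unrolled parameter vector
 numgrad = zeros(size(theta));
 perturb = zeros(size(theta));
 e = 1e-4; % perturbation size epsilon
 for p = 1:numel(theta)
  perturb(p) = e; % perturb only the p-th parameter
  loss1 = J(theta - perturb);
  loss2 = J(theta + perturb);
  numgrad(p) = (loss2 - loss1) / (2*e); % two-sided difference
  perturb(p) = 0;
 end
end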


Theta Initialization

Why do we have to use random theta initialization? If all the weights were initialized to the same value (for example zero), every unit in a hidden layer would compute the same output and receive the same gradient update, so they would stay identical forever; random initialization breaks this symmetry.

[Figures: random (theta) initialization]
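A minimal sketch of the random initialization used in the course exercise: each weight is drawn uniformly from [-epsilon_init, epsilon_init], where 0.12 is the value the exercise suggests:

epsilon_init = 0.12;
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * epsilon_init - epsilon_init;
Theta2 = rand(num_labels, hidden_layer_size + 1) * 2 * epsilon_init - epsilon_init;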


Summary

[Figures: neural network learning summary]


End-of-chapter pop quiz:

[Figures: end-of-chapter pop quiz]

Congratulations! If you have followed the course this far, you should have a good understanding of Neural Network learning algorithms. If you have also done the course's MATLAB exercise, you may be amazed at how “simple” it is: with just a few lines of code, a learning algorithm can learn by itself to recognize handwritten digits.

I personally think Neural Network learning is a very powerful tool, and in the future it may have great potential to form very intelligent programs that automate lots of tedious work for people.

Thank you, Andrew Ng, for providing such a great course for everyone on the planet who is interested in machine learning. You are wonderful 🙂