Fuyang's Blog

On the way to achieve minimalism and essentialism


Machine Learning – Stanford – Week 5 – Neural Networks: Implementation

Continuing with the Machine Learning course by Andrew Ng. In this chapter we implement a simple Neural Network classification algorithm.

Below is the definition of the problem.

[Figure: neural network classification problem setup]

 

Cost Function

The cost function, with regularization, is calculated as below:

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right) + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{j,i}^{(l)} \right)^2

(where K is the number of output classes and L is the number of layers)

Using the equation shown above, a sample MATLAB implementation of the cost function for a neural network with one hidden layer can look like this:

% First use forward propagation calculate the output h
a1 = [ones(m,1) X]; % m x 401
a2 = [ones(m,1) sigmoid(a1 * Theta1')]; % (m x 25) -> m x 26 
a3 = sigmoid(a2 * Theta2'); % (m x 26) * (26 x 10) = m by 10

% y is m by 10
h = a3;

for m_ = 1:m
 a = 1:num_labels; % temporary vector of all class labels
 Y = (a == y(m_)); % one-hot classification label, 1 x 10
 J = J + ((-Y) * log(h(m_,:)') - (1-Y) * log(1-h(m_,:)'));
end

J = J/m;

% Plus regularization term
J = J + lambda/(2*m)* ( sum(sum(Theta1(:,2:end).^2)) ...
 + sum(sum(Theta2(:,2:end).^2)));
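For comparison, the per-sample loop above can also be written fully vectorized. A minimal sketch under the same assumptions (h is the m x 10 output matrix and y holds integer labels 1..num_labels):

% Build an m x 10 one-hot label matrix by indexing into an identity matrix
I = eye(num_labels);
Y = I(y, :); % row i is the one-hot encoding of y(i)

% Unregularized cost, summed over all samples and output units
J = (1/m) * sum(sum((-Y) .* log(h) - (1-Y) .* log(1-h)));

% The regularization term is then added exactly as in the loop version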

 

Gradient Computation

The following figures illustrate the method for computing the gradient.

[Figures: cost function and backpropagation illustrations]

Following the previous example, here is the part of the MATLAB implementation that calculates the gradient for the same one-hidden-layer network:

D1 = zeros(size(Theta1));
D2 = zeros(size(Theta2));

% Part 2 - back propagation
for t = 1:m

 % Step 1: perform forward propagation

 a1 = [1 X(t,:)]; % 1 x 401

 z2 = a1 * Theta1'; % 1x25
 a2 = [1 sigmoid(z2)]; % (1 x 25) -> 1 x 26 

 z3 = a2 * Theta2'; % (1 x 26) * (26 x 10) = 1 by 10
 a3 = sigmoid(z3) ;% 1x10

 % Step 2: use y to calculate delta_L, the output-layer error

 a = 1:num_labels; % temporary vector of all class labels
 Y = (a == y(t)); % one-hot classification label, 1 x 10

 d3 = a3 - Y; % 1 by 10

 % Step 3: backward propagation to calculate delta_{L-1},
 % delta_{L-2}, ... down to delta_2. (This example only has one
 % hidden layer, so we only need to calculate delta_2.)

 d2 = Theta2' * d3'; % 26 x 1
 d2 = d2(2:end); % 25 x 1
 d2 = d2 .* sigmoidGradient(z2)';

 % Alternatively:
 %d2 = Theta2' * d3' .* a2' .* (1-a2)'; % 26 x 1
 %d2 = d2(2:end); % 25 x 1

 % Step 4: accumulate Delta value for all m input data sample
 % Theta1 has size 25 x 401
 % Theta2 has size 10 x 26

 D2 = D2 + d3' * a2; % 10 x 26
 D1 = D1 + d2 * a1; % 25 x 401

end

% Finally, calculate the gradient for all theta
Theta1_grad = 1/m*D1 + lambda/m*[zeros(size(Theta1,1),1) Theta1(:, 2:end)];
Theta2_grad = 1/m*D2 + lambda/m*[zeros(size(Theta2,1),1) Theta2(:, 2:end)];
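For completeness, the sigmoidGradient helper used above follows from the identity g'(z) = g(z)(1 - g(z)). A minimal sketch:

function g = sigmoidGradient(z)
 % Derivative of the sigmoid function, computed element-wise
 s = 1 ./ (1 + exp(-z));
 g = s .* (1 - s);
end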
 

Here are some more illustrations of how forward propagation and backward propagation work. You might also find them helpful for understanding how to implement the algorithm.

[Figures: understanding forward and backward propagation]

A sample of how to use an advanced optimization function to find the best Theta values (some vector reshape operations are needed):
[Figures: using an advanced optimization function]
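As a rough sketch of what those slides show, the unroll-optimize-reshape pattern looks like this (assuming the fmincg function and the nnCostFunction signature from the course exercise code):

% Unroll all parameters into a single vector for the optimizer
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

options = optimset('MaxIter', 50);
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% fmincg is used like fminunc and returns the optimized parameter vector
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the solution vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));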

Here is a pop quiz about it:
[Figure: pop quiz]

A short summary of the learning algorithm procedure, as given in the lecture:

1. Randomly initialize the weights.
2. Implement forward propagation to get h_\Theta(x^{(i)}) for any x^{(i)}.
3. Implement the cost function.
4. Implement backpropagation to compute the partial derivatives.
5. Use gradient checking to confirm that backpropagation works, then disable it.
6. Use gradient descent or an advanced optimization method with backpropagation to minimize the cost as a function of \Theta.

[Figure: summary of the learning procedure]

 

Gradient Checking

Since the neural network algorithm is quite complicated and can easily be buggy, one good practice during implementation is to simultaneously calculate the gradients with a numerical estimation method, and check whether those values are close enough to the gradients calculated by backpropagation.

This helps to produce bug-free code. Remember to turn gradient checking off when using the learning algorithm for real training, since the numerical estimation method is computationally very expensive.

[Figures: gradient checking]
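The two-sided numerical estimate itself is short. A minimal sketch (the name computeNumericalGradient follows the course exercises; costFunc is assumed to return the cost J for an unrolled parameter vector theta):

function numgrad = computeNumericalGradient(costFunc, theta)
 numgrad = zeros(size(theta));
 perturb = zeros(size(theta));
 epsilon = 1e-4;
 for p = 1:numel(theta)
  % Perturb one parameter at a time, in both directions
  perturb(p) = epsilon;
  loss1 = costFunc(theta - perturb);
  loss2 = costFunc(theta + perturb);
  % The two-sided difference approximates the partial derivative
  numgrad(p) = (loss2 - loss1) / (2 * epsilon);
  perturb(p) = 0;
 end
end

Each numgrad(p) should agree with the corresponding backpropagation gradient to several decimal places; otherwise there is a bug somewhere.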

 

Theta Initialization

Why do we have to use a random theta initialization? If all the weights start with the same value (for example zero), every hidden unit computes the same function and receives the same gradient update, so the units never become different from each other; random initialization breaks this symmetry.

[Figures: random initialization and symmetry breaking]
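A minimal sketch of symmetry-breaking initialization, using the small epsilon convention from the course exercises:

% Initialize each weight to a random value in [-epsilon_init, epsilon_init]
epsilon_init = 0.12;
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * epsilon_init - epsilon_init;
Theta2 = rand(num_labels, hidden_layer_size + 1) * 2 * epsilon_init - epsilon_init;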

 

Summary

[Figures: summary of neural network learning]

 

End-of-course pop quiz

[Figures: pop quiz]

Congratulations! If you have followed the course so far, you will have a good understanding of neural network learning algorithms. If you have also done the course's MATLAB exercises, you will be amazed by how “simple” it is: with just a few lines of code, a learning algorithm can teach itself to recognize handwritten digits.

I personally think neural network learning is a very powerful tool, and in the future it may have great potential to form very intelligent programs that automate a lot of tedious work for people.

Thank you Andrew Ng for providing such a great course for everyone on the planet who is interested in machine learning. You are wonderful 🙂

Machine Learning – Stanford – Week 4 – Neural Networks: Representation

 

The following content is from the edX course Machine Learning, taught by Andrew Ng.

Introduction to Neural Networks

Welcome to week 4! This week we are covering neural networks. Neural networks are a model inspired by how the brain works. They were very widely used in the 80s and early 90s, but their popularity diminished in the late 90s because they are computationally expensive.

Recent resurgence: neural networks are the state-of-the-art technique for many applications and are widely used today. When your phone interprets and understands your voice commands, it is likely that a neural network is helping to understand your speech; when you cash a check, the machines that automatically read the digits also use neural networks.


 

Why do we need a new algorithm?

[Figures: motivation for neural networks]

Features “explode” for logistic regression when fitting non-linear hypotheses: with n raw features, even the number of quadratic terms grows on the order of n^2/2. Neural networks will hopefully help solve this issue.

The “one learning algorithm” hypothesis
[Figures: the “one learning algorithm” hypothesis]


 

How it can solve complex non-linear problems

[Figures: the neuron model and how a network computes]

Key point

If a network has s_j units in layer j and s_{j+1} units in layer j+1, then \Theta^{(j)} will be of dimension s_{j+1} \times (s_j + 1).

This is very important to memorize for the vectorized implementation later on; see the pop quiz below, and the small sketch after the following figures.

[Figures: forward propagation]
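A small sketch of the dimension rule in action (the layer sizes here are made up for illustration): with s_1 = 3 and s_2 = 5, \Theta^{(1)} must be 5 \times 4.

x = [0.5; 0.2; 0.9];       % 3 input features (s_1 = 3)
Theta1 = rand(5, 4);       % s_2 x (s_1 + 1) = 5 x 4
a1 = [1; x];               % add the bias unit, 4 x 1
z2 = Theta1 * a1;          % 5 x 1
a2 = 1 ./ (1 + exp(-z2));  % sigmoid activation, 5 x 1 (s_2 = 5)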


 

Non-linear classification example

[Figures: non-linear classification examples: AND, OR]
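For instance, the AND network from the lecture can be checked directly. A minimal sketch: with weights [-30 20 20], a single sigmoid unit computes x1 AND x2, because g(-30) and g(-10) are close to 0 while g(10) is close to 1.

g = @(z) 1 ./ (1 + exp(-z)); % sigmoid
Theta = [-30 20 20];         % weights of the AND unit
for x1 = 0:1
 for x2 = 0:1
  h = g(Theta * [1; x1; x2]); % forward pass through the single unit
  fprintf('%d AND %d -> %.4f\n', x1, x2, h);
 end
end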


 

Multiple output units: One-vs-all

[Figures: one-vs-all multi-class output]
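As in the Week 5 cost function code above, each class label in the one-vs-all setup is represented as a unit vector rather than a single number. A quick sketch:

num_labels = 4;
y = 3;                    % e.g. the sample belongs to class 3
Y = ((1:num_labels) == y) % yields [0 0 1 0]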


 

Pop Quiz

[Figures: pop quiz]