### Machine Learning – Stanford – Week 5 – Neural Networks: Implementation

#### by Fuyang

Continuing with the Machine Learning course by Andrew Ng, in this chapter we implement a simple Neural Network classification algorithm.

Below is the definition of the problem.

**Cost Function**

The cost function is calculated as below.
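For a network with $K$ output units and regularization parameter $\lambda$, the regularized cost function in the course's notation is:

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ y_k^{(i)} \log\!\left(h_\Theta(x^{(i)})\right)_k
       + \left(1 - y_k^{(i)}\right) \log\!\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
  \left(\Theta_{j,i}^{(l)}\right)^2
```

Note that the bias terms (the first column of each $\Theta^{(l)}$, i.e. the $i = 0$ entries) are excluded from the regularization sum, just as the code below regularizes only `Theta(:,2:end)`.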

Using the equation shown above, sample MATLAB code that calculates the cost function of a one-hidden-layer Neural Network can look like this:

```matlab
% First use forward propagation to calculate the output h
a1 = [ones(m,1) X];                      % m x 401
a2 = [ones(m,1) sigmoid(a1 * Theta1')];  % (m x 25) -> m x 26
a3 = sigmoid(a2 * Theta2');              % (m x 26) * (26 x 10) = m x 10
h = a3;                                  % h is m x 10

J = 0;
for m_ = 1:m
    a = 1:num_labels;                    % a is a temp vector
    Y = (a == y(m_));                    % classification label, 1 x 10 matrix
    J = J + ((-Y) * log(h(m_,:)') - (1-Y) * log(1-h(m_,:)'));
end
J = J/m;

% Plus regularization term
J = J + lambda/(2*m) * ( sum(sum(Theta1(:,2:end).^2)) ...
                       + sum(sum(Theta2(:,2:end).^2)) );
```

**Gradient Computation**

The following graphs illustrate the method of computing the gradient.
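In the course's notation, the back-propagation steps illustrated there are, for each training example:

```latex
% Output-layer error
\delta^{(3)} = a^{(3)} - y
% Hidden-layer error (drop the bias component), where g'(z) = g(z)\,(1 - g(z))
\delta^{(2)} = \left(\Theta^{(2)}\right)^{T} \delta^{(3)} \circ g'\!\left(z^{(2)}\right)
% Accumulate over all m examples
\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^{T}
% Regularized gradient (no regularization for the bias column j = 0)
D^{(l)}_{ij} = \tfrac{1}{m}\Delta^{(l)}_{ij} + \tfrac{\lambda}{m}\Theta^{(l)}_{ij}
  \quad (j \ge 1), \qquad
D^{(l)}_{ij} = \tfrac{1}{m}\Delta^{(l)}_{ij} \quad (j = 0)
```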

Following the previous example, here is a MATLAB implementation of a simple one-hidden-layer Neural Network, showing the part of the code that calculates the gradient:

```matlab
D1 = zeros(size(Theta1));
D2 = zeros(size(Theta2));

% Part 2 - back propagation
for t = 1:m
    % Step 1: perform forward propagation
    a1 = [1 X(t,:)];            % 1 x 401
    z2 = a1 * Theta1';          % 1 x 25
    a2 = [1 sigmoid(z2)];       % (1 x 25) -> 1 x 26
    z3 = a2 * Theta2';          % (1 x 26) * (26 x 10) = 1 x 10
    a3 = sigmoid(z3);           % 1 x 10

    % Step 2: using y to calculate delta_L
    a = 1:num_labels;           % a is a temp vector
    Y = (a == y(t));            % making Y matrix as classification label
    d3 = a3 - Y;                % 1 x 10

    % Step 3: backward propagation to calculate delta_L-1,
    % delta_L-2, ... down to delta_2. (This example has only one hidden
    % layer, so we only need to calculate delta_2.)
    d2 = Theta2' * d3';         % 26 x 1
    d2 = d2(2:end);             % 25 x 1
    d2 = d2 .* sigmoidGradient(z2)';
    % Alternatively:
    % d2 = Theta2' * d3' .* a2' .* (1-a2)';  % 26 x 1
    % d2 = d2(2:end);                        % 25 x 1

    % Step 4: accumulate Delta values over all m input data samples
    % Theta1 has size 25 x 401
    % Theta2 has size 10 x 26
    D2 = D2 + d3' * a2;         % 10 x 26
    D1 = D1 + d2 * a1;          % 25 x 401
end

% Finally, calculate the gradient for all theta
Theta1_grad = 1/m*D1 + lambda/m*[zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_grad = 1/m*D2 + lambda/m*[zeros(size(Theta2,1),1) Theta2(:,2:end)];
```

Some additional illustrations show how forward propagation and backward propagation work. You may also find them helpful for understanding how to implement the algorithm.

A sample of how to use an advanced optimization function to find the best Theta values (some vector reshape operations are needed):
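A sketch of those reshape operations: the parameter matrices are unrolled into a single vector for the optimizer and reshaped back afterwards. This assumes the `fmincg` function and the `nnCostFunction` signature provided in the course exercise:

```matlab
% Unroll the parameter matrices into one long vector
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

options = optimset('MaxIter', 50);

% Cost function handle taking only the unrolled parameter vector;
% nnCostFunction returns [J, grad] with grad unrolled the same way
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Run the advanced optimizer
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the optimized vector back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                 num_labels, (hidden_layer_size + 1));
```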

Here is a pop quiz about it:

A short summary of the learning algorithm procedure.

**Gradient Checking**

Since the Neural Network algorithm is quite complicated and easy to get wrong, a good practice during implementation is to simultaneously calculate the gradients with a numerical estimation method and check whether that value is close enough to the gradient calculated by the learning algorithm.

This helps produce bug-free code. And remember to turn gradient checking off when using the learning algorithm in a production environment, since the numerical estimation method is computationally very expensive.
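A minimal sketch of such a numerical estimation using central differences, assuming `J` is a function handle that returns the cost for a given unrolled parameter vector (this mirrors the `computeNumericalGradient` helper from the course exercise):

```matlab
function numgrad = computeNumericalGradient(J, theta)
  % Estimate the gradient of J at theta by perturbing one dimension at a time
  numgrad = zeros(size(theta));
  perturb = zeros(size(theta));
  e = 1e-4;                        % small perturbation
  for p = 1:numel(theta)
    perturb(p) = e;
    loss1 = J(theta - perturb);    % cost with dimension p decreased by e
    loss2 = J(theta + perturb);    % cost with dimension p increased by e
    numgrad(p) = (loss2 - loss1) / (2*e);  % central difference
    perturb(p) = 0;
  end
end
```

Each entry of `numgrad` should agree with the corresponding entry of the back-propagation gradient to several significant digits.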

**Theta Initialization**

Why do we have to use random theta initialization? If all the weights started at the same value (e.g. zero), every hidden unit would compute exactly the same function and receive exactly the same gradient updates, so the network could never learn different features. Random initialization breaks this symmetry.
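A minimal sketch of random initialization, assuming the layer-size variables from the code above; `epsilon_init = 0.12` is the value suggested in the course exercise:

```matlab
% Initialize each weight to a random value in [-epsilon_init, epsilon_init]
epsilon_init = 0.12;
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * epsilon_init - epsilon_init;
Theta2 = rand(num_labels, hidden_layer_size + 1) * 2 * epsilon_init - epsilon_init;
```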

**Summary**

**End of the course: pop quiz**

Congratulations! If you have followed the course this far, you will have a good understanding of Neural Network learning algorithms. If you have also done the course's MATLAB exercise, you will be amazed at how “simple” it is: with just a few lines of code, a learning algorithm can learn by itself to recognize handwritten numbers.

I personally think Neural Network learning is a very powerful tool, and in the future it may have great potential to form very intelligent programs that automate many tedious tasks for people.

Thank you Andrew Ng for providing such a great course for everyone on the planet who is interested in machine learning. You are wonderful 🙂

Hey. Thanks for the tutorial. Helped me a ton.

There’s just one mistake in your gradient-calculation code though, specifically in line 9.

Your line -> a1 = [1 X(t,:)];

Corrected line -> a1 = X(t,:);

Since the starter code already appends a column of 1s to the feature matrix.


Hi Anurag, thanks for the input. Maybe the starter code was different when I was doing my homework 😛

Anyway, good luck with the learning 🙂


Thanks Fuyang, these notes are good for revision 🙂


Thank you for the feedback 🙂


Fuyang,

what are we calculating Theta1_grad and Theta2_grad for?

I suppose these gradients can be used for updating the thetas, as follows:

theta1 = theta1 - alpha*theta1_grad

theta2 = theta2 - alpha*theta2_grad

let me know if I’m doing it wrong

Thanks!


Yes, the weight gradients (or theta_grad here) are used to update the weights. They need to be calculated via the back-propagation algorithm. I didn’t post the full MATLAB code, but if I remember correctly the code above is part of a function that returns those gradients (theta_grad), which are later used to update the weights at each step.


Thanks for posting this, I was really stuck on the gradients for the NN. The resources provided by the course weren’t enough to figure it out.


No problem, and thanks for the feedback. Remember there is a course forum where you can ask for help as well 🙂

The homework might be a bit difficult if one has no MATLAB or Octave background. Feel free to ask for some help on the forums; I think many students are helping each other there. Good luck.


Hi, nice post! I’ve watched the video over 10 times and still can’t understand the maths and algorithms of backpropagation; this part has frustrated me for a long time. Could you explain, or point me to some resources that explain it clearly? Thanks so much!


Maybe you can also check this video out:

It explains back propagation in a slightly different way, which helped me understand it more intuitively.
