Machine Learning – Stanford – Week 3 – Logistic Regression

Classification by regression

Great things learned again from the Coursera Machine Learning course taught by Andrew Ng. Here are some of my key notes. Perhaps the first thing to notice is that, even though the name sounds like regression, logistic regression is actually a way of doing classification. So we can simply say that “logistic regression” is a “classification” method: it predicts a discrete class label rather than a continuous value.

How did we get the idea? (some background info)

Following on from the previous lesson on linear regression, a natural question is: can linear regression be used for classification?


The answer is yes, to a degree. In the lecture's simple tumor-size example, a straight hypothesis line is fitted to the data and used as a classifier: if h is at least 0.5, predict class A; if h is less than 0.5, predict class B.

However, this approach has serious limitations. An obvious one: if a training example appears far to the right of the graph (a malignant tumor of very large size), the fitted line flattens, the point where h crosses 0.5 moves to the right, and some examples that were previously classified correctly are now predicted wrongly.
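To make the outlier problem concrete, here is a small Python sketch. The tumor sizes and labels are made-up toy numbers, not data from the course: it fits a least-squares line h(x) = θ0 + θ1·x, thresholds it at 0.5, and then refits after adding one very large malignant tumor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of h(x) = theta0 + theta1 * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def classify(theta, x):
    """Predict 1 (malignant) if the fitted line gives h(x) >= 0.5, else 0."""
    theta0, theta1 = theta
    return 1 if theta0 + theta1 * x >= 0.5 else 0


# Hypothetical toy data: tumor sizes and labels (1 = malignant, 0 = benign).
sizes = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

theta = fit_line(sizes, labels)
print([classify(theta, x) for x in sizes])  # -> [0, 0, 0, 1, 1, 1]

# Add one very large malignant tumor far to the right and refit.
theta_outlier = fit_line(sizes + [30], labels + [1])
print([classify(theta_outlier, x) for x in sizes])  # -> [0, 0, 0, 0, 1, 1]
```

After the refit, the size-4 tumor is predicted benign even though nothing about it changed: the single far-right example dragged the whole line down.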

Logistic Function (or Sigmoid Function)


To solve this, we make the hypothesis h a logistic (sigmoid) function of θᵀx, so that the value of h always lies between 0 and 1.
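Written out in the usual course notation, where the feature vector x includes the bias feature x0 = 1. The sigmoid g maps any real z into (0, 1), with g(0) = 0.5:

$$h_\theta(x) = g(\theta^{T} x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$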


Cost Function

With the logistic function as the hypothesis, the cost function has to change. If we kept the squared-error cost from linear regression, plugging in the non-linear sigmoid would make J(θ) non-convex, with many “local minima”, so gradient descent would not be guaranteed to find the global minimum.
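For reference, this is the squared-error cost carried over from linear regression; with the sigmoid inside it, J is no longer convex in θ:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}$$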

Instead, we use a per-example cost of -log(h) when y = 1 and -log(1 - h) when y = 0. This makes J(θ) “convex” and therefore “gradient descent friendly”: the cost is 0 when the prediction matches the label exactly, and it grows without bound as the prediction approaches the wrong label.
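Written out, with J(θ) as the average of this per-example cost over the m training examples:

$$\mathrm{Cost}\left(h_\theta(x), y\right) = \begin{cases} -\log\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$$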

Because y is always either 0 or 1, the two cases can be combined into a single expression in which one of the two terms always vanishes.
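The combined form, which is also what the vectorized MATLAB code further down computes:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$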


At this point we have fully defined a simple classification problem: the hypothesis and the cost function are ready to use.
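As a quick sanity check, here is a small Python sketch of the hypothesis and cost. The function names and the four-example dataset are hypothetical, not the course's code. It confirms that the piecewise and combined cost forms agree for y in {0, 1}, and that at θ = 0 every prediction is 0.5, so the cost is log 2.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output is in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x includes the bias feature x0 = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost_piecewise(h, y):
    """Per-example cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def cost_combined(h, y):
    """Same cost as one expression; one term vanishes because y is 0 or 1."""
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))


def average_cost(theta, X, Y):
    """J(theta): average per-example cost over the m training examples."""
    m = len(Y)
    return sum(cost_combined(hypothesis(theta, x), y) for x, y in zip(X, Y)) / m


# Hypothetical toy data: each row is [x0 = 1, x1].
X = [[1, 0.5], [1, 1.5], [1, 3.0], [1, 4.0]]
Y = [0, 0, 1, 1]

print(average_cost([0.0, 0.0], X, Y))   # h = 0.5 everywhere, so J = log(2), about 0.693
print(average_cost([-4.5, 2.0], X, Y))  # a theta that separates the data costs less
```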

Gradient Descent

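Gradient descent for logistic regression repeatedly updates all parameters θ_j simultaneously, stepping against the gradient of J(θ). The update rule looks identical to the one for linear regression; the difference is that h_θ is now the sigmoid.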
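The update rule, repeated until convergence with learning rate α and applied to every θ_j at the same time, followed by its vectorized form (g applied element-wise), which matches the grad line of the MATLAB code:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

$$\theta := \theta - \frac{\alpha}{m} X^{T} \left( g(X\theta) - y \right)$$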


And here is a MATLAB/Octave implementation of the cost function and its gradient, the two quantities that gradient descent (or an advanced optimizer) needs at each step:

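function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters.
%   Assumes X is m x n (first column all ones), y is m x 1 with 0/1 labels,
%   and a helper sigmoid.m on the path computing g = 1 ./ (1 + exp(-z)).

% Initialize some useful values
m = length(y);          % number of training examples
h = sigmoid(X * theta); % m x 1 vector of predicted probabilities

% Cost: (1 x m) * (m x 1) products give a 1 x 1 scalar
J = 1/m * ((-y') * log(h) - (1 - y') * log(1 - h));

% Gradient: n x 1, the j-th entry is dJ/dtheta_j
grad = 1/m * X' * (h - y);

end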
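For readers without MATLAB/Octave, here is a rough Python port of costFunction above, plus a plain batch gradient-descent loop that uses it. It is a sketch under assumptions: Python lists instead of matrices, a hypothetical four-example dataset, and a hand-picked learning rate and iteration count.

```python
import math


def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_function(theta, X, y):
    """Return (J, grad) for logistic regression, mirroring costFunction.m.

    X is a list of m rows (each starting with the bias feature 1),
    y is a list of m labels in {0, 1}, theta is a list of n parameters.
    """
    m = len(y)
    h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for hi, yi in zip(h, y)) / m
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    return J, grad


def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent: every theta_j is updated from the same grad."""
    theta = [0.0] * len(X[0])
    history = []  # cost J recorded before each update
    for _ in range(num_iters):
        J, grad = cost_function(theta, X, y)
        history.append(J)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta, history


# Hypothetical toy data: rows are [bias, feature]; labels flip from 0 to 1.
X = [[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]]
y = [0, 0, 1, 1]

theta, history = gradient_descent(X, y, alpha=0.4, num_iters=3000)
print(history[0], history[-1])  # starts at log(2) and decreases
```

The first recorded cost is log 2 ≈ 0.693 because θ starts at zero. On this separable toy set, thresholding h at 0.5 with the learned θ recovers the training labels.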


Hypothesis Interpretation

A few more words (or pictures) on interpreting the hypothesis output: basically, h_θ(x) is a probability, namely the estimated probability that y = 1 for input x, given the parameters θ.
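In symbols: for example, if h_θ(x) = 0.7 for a given tumor, the model estimates a 70% chance that it is malignant (y = 1), and therefore a 30% chance that it is benign (y = 0).

$$h_\theta(x) = P(y = 1 \mid x; \theta), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$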


Decision Boundary

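Continuing the interpretation above, we predict y = 1 whenever h_θ(x) ≥ 0.5. Because g(z) ≥ 0.5 exactly when z ≥ 0, this is the same as predicting y = 1 whenever θᵀx ≥ 0. The set of points where θᵀx = 0 is called the “decision boundary”; with 2 features it is a straight line.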

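A small Python sketch of a linear decision boundary, using hypothetical parameters θ = (-3, 1, 1), so the boundary is the line x1 + x2 = 3. It checks that thresholding the sigmoid at 0.5 gives the same predictions as checking the sign of θᵀx.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Predict y = 1 when h_theta(x) = g(theta^T x) >= 0.5."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0


def predict_by_sign(theta, x):
    """Equivalent rule: y = 1 exactly when theta^T x >= 0."""
    return 1 if sum(t * xi for t, xi in zip(theta, x)) >= 0 else 0


# Hypothetical parameters: predict y = 1 when -3 + x1 + x2 >= 0.
theta = [-3.0, 1.0, 1.0]
points = [(1.0, 1.0), (2.0, 2.0), (0.5, 2.0), (3.0, 1.0), (1.5, 1.5)]
for x1, x2 in points:
    x = [1.0, x1, x2]  # x0 = 1 is the bias feature
    print((x1, x2), predict(theta, x), predict_by_sign(theta, x))
```

The point (1.5, 1.5) lies exactly on the boundary, where g(0) = 0.5, and both rules assign it to y = 1.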

The decision boundary can be non-linear as well: with higher-order (polynomial) features such as x1² and x2², it can be a circle or an even more complex curve.
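A sketch of a non-linear boundary, assuming a hypothetical feature map [1, x1, x2, x1², x2²] and parameters θ = (-1, 0, 0, 1, 1). The model is still linear in θ, but the boundary x1² + x2² = 1 is the unit circle in the original (x1, x2) plane.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def map_features(x1, x2):
    """Hypothetical polynomial feature map: [1, x1, x2, x1^2, x2^2]."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]


def predict(theta, x1, x2):
    """Threshold g(theta^T f(x)) at 0.5, where f is the feature map above."""
    z = sum(t * f for t, f in zip(theta, map_features(x1, x2)))
    return 1 if sigmoid(z) >= 0.5 else 0


# Predict y = 1 when -1 + x1^2 + x2^2 >= 0, i.e. on or outside the unit circle.
theta = [-1.0, 0.0, 0.0, 1.0, 1.0]
print(predict(theta, 0.0, 0.0))   # inside the circle -> 0
print(predict(theta, 0.5, 0.5))   # inside -> 0
print(predict(theta, 1.0, 1.0))   # outside -> 1
print(predict(theta, -2.0, 0.0))  # outside -> 1
```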

Multi-class Classification

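For more than 2 classes, one can use the so-called “one-vs-all” (or “one-vs-rest”) method: train one logistic regression classifier per class k, treating class k as positive and all other classes as negative, then predict the class whose classifier outputs the highest probability.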
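A minimal one-vs-all sketch in Python, assuming plain batch gradient descent as the per-class trainer and a hypothetical nine-point dataset with three well-separated clusters. The learning rate and iteration count are hand-picked for this toy data, not tuned values from the course.

```python
import math


def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, alpha=0.4, num_iters=3000):
    """Batch gradient descent for one binary logistic regression classifier."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(num_iters):
        h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
        grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
                for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def one_vs_all(X, labels, num_classes):
    """Train classifier k on relabeled data: 1 if label == k, else 0."""
    return [train_binary(X, [1 if lab == k else 0 for lab in labels])
            for k in range(num_classes)]


def predict(all_theta, row):
    """Pick the class whose classifier outputs the highest probability."""
    probs = [sigmoid(sum(t * xj for t, xj in zip(theta, row)))
             for theta in all_theta]
    return max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical toy data: rows are [bias, x1, x2].
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1],   # class 0 near the origin
     [1, 4, 0], [1, 5, 0], [1, 4, 1],   # class 1 to the right
     [1, 0, 4], [1, 0, 5], [1, 1, 4]]   # class 2 above
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

all_theta = one_vs_all(X, labels, num_classes=3)
print([predict(all_theta, row) for row in X])  # -> [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each class gets its own θ vector, and prediction is just an argmax over the three probabilities. With k classes, this means training k separate binary classifiers.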

A few more words about some other optimization algorithms (such as fminunc in Octave):