CSCD11 Machine Learning and Data Mining, Fall 2010

Assignment 2: Classification
Due Wednesday, Oct. 27, 3pm

Note: This assignment comprises two theoretical questions and one programming question. For the theoretical questions, hand-written or computer-formatted answers should be handed in on paper. For the programming part of this assignment you will write several functions and one main script in Matlab; you will hand in a tar-file containing these files. Parts of Question 3 that ask for your thoughts or reasoning can be answered as comments in the Matlab script.

1 Probability Theory: [6 marks]

A drunk squirrel falls onto a 1-D tree branch. The location it lands on (let's call it s) is drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 3 (i.e., s ∼ N(0, 9)). The squirrel then takes just one step (let's call it d), which is drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 2 (i.e., d ∼ N(0, 4)). If d is positive the squirrel moves to the right; otherwise it moves to the left. Finally, assume that s and d are statistically independent.

If the final position of the squirrel (let's call it f) is measured to be f = 3, find the most likely location s at which the squirrel originally landed. Show each step of your derivation (i.e., just giving the numeric solution won't earn you many marks).

Hint: Try to write f in terms of s and d, and think of how to maximize the probability of observing f = 3.

2 Logistic Regression: [10 marks]

Here we consider the problem of classifying 2D inputs (x_1, x_2) using logistic regression. (Recall that, despite its name, logistic regression is a classification algorithm, not a regression algorithm.) Suppose we have two classes, class 0 and class 1, and let the output be denoted y ∈ {0, 1}. Our classifier will take the form of logistic regression:

    p(y = 1 | w, x) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + b)}}    (1)

where the parameter vector is w = [w_1, w_2, b]^T. Since there are only two classes, it must necessarily be true that p(y = 0 | x, w) = 1 - p(y = 1 | x, w). For brevity, we can also write Equation (1) as

    p(y = 1 | w, x) = \sigma(w^T x)    (2)

where x = [x_1, x_2, 1]^T and

    \sigma(a) = \frac{1}{1 + e^{-a}}.    (3)

The negative log-likelihood of a collection of N training pairs {(x_i, y_i)}_{i=1}^{N} is

    E(w) = -\log \prod_{i=1}^{N} p(y = y_i | x_i, w)    (4)

This objective function cannot be optimized in closed form.

(a)
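For Question 1, the required derivation is analytic, but a quick numerical check can help confirm that an answer is sensible. The sketch below is purely illustrative and assumes the natural reading of the hint, f = s + d; the grid and variable names are choices made for this example, not part of the handout. It evaluates the unnormalized log of p(s | f = 3) on a grid and reports its maximizer, which should agree with the analytic answer.

% Numerical sanity check for Question 1 (illustrative only).
% With f = s + d and s, d independent, f | s ~ N(s, 2^2), so
% p(s | f = 3) is proportional to N(3; s, 2^2) * N(s; 0, 3^2).
s_grid   = linspace(-10, 10, 200001);          % candidate landing locations
log_post = -(3 - s_grid).^2 / (2 * 2^2) ...    % log N(3; s, 2^2), up to a constant
           - s_grid.^2 / (2 * 3^2);            % log N(s; 0, 3^2), up to a constant
[maxval, idx] = max(log_post);
fprintf('numerical argmax of p(s | f = 3): %.4f\n', s_grid(idx));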
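For the model in Question 2, equations (2)-(4) translate directly into a few lines of Matlab. The sketch below is a minimal illustration of how E(w) could be evaluated; the function name, the column-per-example layout of X, and the label convention are assumptions made for this example, not the interface required by Question 3.

function E = neg_log_likelihood(w, X, y)
% Evaluate E(w) from Eq. (4).  Assumed (illustrative) conventions:
%   w : 3x1 parameter vector [w1; w2; b]
%   X : 3xN augmented inputs, one column per example, [x1; x2; 1]
%   y : 1xN labels in {0, 1}
  a  = w' * X;                % 1xN vector of w^T x_i
  p1 = 1 ./ (1 + exp(-a));    % sigma(w^T x_i) = p(y = 1 | x_i, w), Eqs. (2)-(3)
  % Each factor in Eq. (4) equals p1(i) when y_i = 1 and 1 - p1(i) when y_i = 0,
  % so -log of the product becomes the sum below.
  E  = -sum(y .* log(p1) + (1 - y) .* log(1 - p1));
end

Since Eq. (4) has no closed-form minimizer, a function like this would typically be handed to a gradient-based optimizer.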