Assignment 2: Classification
Due Wednesday, Oct. 27, 3pm
Note: This assignment comprises two theoretical questions and one programming question. For the theoretical questions, handwritten or computer-formatted answers should be handed in on paper. For the programming part of this assignment you will write several functions and one main script in Matlab. You will hand in a tarfile containing these files. Parts of Question 3 that ask for your thoughts or reasoning can be answered as comments in the Matlab script.
1  Probability Theory: [6 marks]
A drunk squirrel falls onto a 1D tree branch. The location it lands on (let's call it $s$) is drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 3 (i.e., $s \sim \mathcal{N}(0, 9)$). The squirrel then takes just one step (let's call it $d$), which is drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 2 (i.e., $d \sim \mathcal{N}(0, 4)$). If $d$ is positive then the squirrel moves to the right; otherwise it moves to the left. Finally, let's assume that $s$ and $d$ are statistically independent.
If the final position of the squirrel (let's call it $f$) is measured to be $f = 3$, find the most likely location $s$ at which the squirrel originally landed. Show each step of your derivation (i.e., just giving the numeric solution won't earn you many marks). Hint: Try to write $f$ in terms of $s$ and $d$, and think of how to maximize the probability of observing $f = 3$.
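The hint can be formalized as follows; this is only a sketch of the starting point, not the full solution. Since the step is additive and $s$ and $d$ are independent,

\[
f = s + d, \qquad f \mid s \sim \mathcal{N}(s, 4),
\]

so by Bayes' rule the posterior over the landing location satisfies

\[
p(s \mid f = 3) \;\propto\; p(f = 3 \mid s)\, p(s),
\]

and the most likely landing location is the value of $s$ that maximizes this product (equivalently, minimizes its negative log).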
2  Logistic Regression: [10 marks]
Here we consider the problem of classifying 2D inputs $(x_1, x_2)$ using logistic regression. (Recall that, despite its name, logistic regression is a classification algorithm, not a regression algorithm.) Suppose we have two classes, class $0$ and class $1$, and let the output be denoted $y \in \{0, 1\}$. Our classifier will take the form of logistic regression:
\[
p(y = 1 \mid \mathbf{w}, \mathbf{x}) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + b)}} \tag{1}
\]
where the parameter vector is $\mathbf{w} = [w_1, w_2, b]^T$. Since there are only two classes, it must necessarily be true that $p(y = 0 \mid \mathbf{x}, \mathbf{w}) = 1 - p(y = 1 \mid \mathbf{x}, \mathbf{w})$. For brevity, we can also write Equation (1) as
\[
p(y = 1 \mid \mathbf{w}, \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}) \tag{2}
\]
where $\mathbf{x} = [x_1, x_2, 1]^T$, and

\[
\sigma(a) = \frac{1}{1 + e^{-a}}. \tag{3}
\]
The negative log-likelihood of a collection of $N$ training pairs $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ is
\[
E(\mathbf{w}) = -\log \prod_{i=1}^{N} p(y = y_i \mid \mathbf{x}_i, \mathbf{w}) \tag{4}
\]
This objective function cannot be optimized in closed form.
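For reference, the following expansion is standard and follows directly from Equations (2)-(4): writing $p(y = y_i \mid \mathbf{x}_i, \mathbf{w})$ in terms of $\sigma$ turns Equation (4) into the cross-entropy form

\[
E(\mathbf{w}) = -\sum_{i=1}^{N} \Big[ y_i \log \sigma(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i) \log\big(1 - \sigma(\mathbf{w}^T \mathbf{x}_i)\big) \Big],
\]

whose gradient (using $\sigma'(a) = \sigma(a)(1 - \sigma(a))$) is

\[
\nabla E(\mathbf{w}) = \sum_{i=1}^{N} \big( \sigma(\mathbf{w}^T \mathbf{x}_i) - y_i \big)\, \mathbf{x}_i.
\]

This gradient is what an iterative method such as gradient descent would use in place of a closed-form solution.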
(a)
Spring '10 · DavidFleet · Statistics, Data Mining, Machine Learning, GCC, Gradient descent