10-701 Machine Learning, Fall 2012: Homework 2 Solutions
1 Learning with L1 norm
Suppose you want to predict an unknown value Y ∈ ℝ, but you are only given a sequence of noisy observations x_1, …, x_n of Y with iid noise (x_i = Y + ε_i).
We have seen in t
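The excerpt cuts off here, but the standard result for this setup is that the estimate minimizing the L1 loss ∑_i |x_i − c| is the sample median of the observations. A minimal numerical sketch (all data below is made up):

```python
import numpy as np

# Hypothetical data: noisy observations x_i = Y + eps_i of an unknown Y.
rng = np.random.default_rng(0)
Y_true = 3.0
x = Y_true + rng.laplace(scale=1.0, size=101)

# Grid-search the L1 loss sum_i |x_i - c| over candidate values c.
candidates = np.linspace(x.min(), x.max(), 4001)
l1_loss = np.abs(x[:, None] - candidates[None, :]).sum(axis=0)
c_hat = candidates[np.argmin(l1_loss)]

# The minimizer coincides (up to grid resolution) with the sample median.
print(c_hat, np.median(x))
```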

Probability Review
Thursday Sep 13
Events and Event spaces
Random variables
Joint probability distributions
Marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
Structural properties
Independence,
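A tiny worked sketch of the rules in this outline (marginalization, conditioning, Bayes rule), using a made-up 2×2 joint distribution:

```python
# P(X, Y) over X in {0,1}, Y in {0,1} (values are made up):
joint = {(0, 0): 0.3, (0, 1): 0.1,
         (1, 0): 0.2, (1, 1): 0.4}

# Marginalization: P(X=1) = sum_y P(X=1, Y=y)
p_x1 = joint[(1, 0)] + joint[(1, 1)]

# Conditioning: P(Y=1 | X=1) = P(X=1, Y=1) / P(X=1)
p_y1_given_x1 = joint[(1, 1)] / p_x1

# Bayes rule: P(X=1 | Y=1) = P(Y=1 | X=1) P(X=1) / P(Y=1),
# where P(Y=1) comes from the law of total probability.
p_y1 = joint[(0, 1)] + joint[(1, 1)]
p_x1_given_y1 = p_y1_given_x1 * p_x1 / p_y1

print(p_x1, p_y1_given_x1, p_x1_given_y1)
```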

Example Answer
3.1
a)
Naïve Bayes
By the conditional independence assumption of Naïve Bayes,
Similarly for Y=B,
Logistic Regression
Note: there are some alternative answers for this question, for example
and
All of them are correct, but one needs to be careful
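The question's concrete formulas are not reproduced in this excerpt, so the following is only an illustrative sketch of the Naïve Bayes computation described above, with made-up priors and per-feature Bernoulli parameters:

```python
import numpy as np

# Naive Bayes assumption: P(x_1,...,x_d | Y) = prod_j P(x_j | Y).
# Hypothetical class priors and per-feature Bernoulli parameters:
prior = {'A': 0.5, 'B': 0.5}
p_feat = {'A': np.array([0.9, 0.2, 0.7]),   # P(x_j = 1 | Y = A)
          'B': np.array([0.3, 0.8, 0.4])}   # P(x_j = 1 | Y = B)

def posterior(x):
    """P(Y | x) via Bayes rule with the factorized likelihood."""
    x = np.asarray(x)
    scores = {}
    for c in prior:
        lik = np.prod(p_feat[c] ** x * (1 - p_feat[c]) ** (1 - x))
        scores[c] = prior[c] * lik
    z = sum(scores.values())  # normalize by the total evidence P(x)
    return {c: s / z for c, s in scores.items()}

print(posterior([1, 0, 1]))
```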

10-701 Machine Learning, Fall 2012: Homework 1 Solutions
1 Decision Trees [25pt, Martin]
1. [10 points]
(a) For the first split, there are two possibilities to consider: either we split on Size=Big vs. Size=Small, or Orbit=Near vs. Orbit=Far. Let's calculate the information gain of each.
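The counts from the homework table are not included in this excerpt, so the sketch below uses hypothetical label counts; it shows the entropy and information-gain computation used to compare two candidate splits:

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a node with `pos`/`neg` examples."""
    n = pos + neg
    h = 0.0
    for k in (pos, neg):
        if k > 0:
            p = k / n
            h -= p * math.log2(p)
    return h

def info_gain(parent, children):
    """Information gain: H(parent) minus weighted child entropies."""
    n = sum(p + q for p, q in children)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - remainder

# Hypothetical: 5 positive / 5 negative at the root.
# Split A children: (4,1) and (1,4); split B children: (3,2) and (2,3).
gain_a = info_gain((5, 5), [(4, 1), (1, 4)])
gain_b = info_gain((5, 5), [(3, 2), (2, 3)])
print(gain_a, gain_b)  # the purer split A has the larger gain
```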

MLE and MAP Examples
1 Multinomial Distribution
Given some integer k > 1, let Θ be the set of vectors θ = (θ_1, …, θ_k) satisfying θ_i ≥ 0 and ∑_{i=1}^k θ_i = 1. For any θ ∈ Θ we define the probability mass function

    p_θ(x) = ∏_{i=1}^k θ_i^{I(x=i)},   for x ∈ {1, 2, …, k}.
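Since the likelihood of n iid draws from p_θ factorizes over the indicator exponents as ∏_i θ_i^{n_i}, where n_i counts occurrences of outcome i, the MLE on the simplex is the empirical frequency θ̂_i = n_i/n. A quick numerical sketch of this fact (sample size and parameters below are made up):

```python
import numpy as np

k = 3
theta_true = np.array([0.2, 0.3, 0.5])   # hypothetical true parameter
rng = np.random.default_rng(1)
x = rng.choice(np.arange(1, k + 1), size=10000, p=theta_true)

# Count occurrences n_i of each outcome i; the MLE is n_i / n.
counts = np.array([(x == i).sum() for i in range(1, k + 1)])
theta_mle = counts / counts.sum()

def log_lik(theta):
    """Log-likelihood sum_i n_i * log(theta_i) of the observed counts."""
    return float(counts @ np.log(theta))

# The MLE scores at least as high as the true parameter on this sample.
print(theta_mle, log_lik(theta_mle) >= log_lik(theta_true))
```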