COMP 4360: Machine Learning
Winter 2017
Homework 1: Due Feb 17 at 3:00pm, 40 points in total
General guidelines for homeworks:
You are encouraged to meet with other students to discuss the homework, but all write-ups
must be done on your own. Do not take
Announcement
If you still have not submitted the honesty declaration, please do so as soon as possible.
If we do not have your honesty declaration by the HW1 deadline, we will not grade your homework and you will get zero for HW1. And we will not re-grade it.
K-means Recap
1. Randomly initialize k centers:
   mu^(0) = mu_1^(0), ..., mu_k^(0)
2. Classify: assign each point j in {1, ..., m} to the nearest center:
   C^(t)(j) <- argmin_i || mu_i - x_j ||^2
3. Recenter: mu_i becomes the centroid of its points:
   mu_i^(t+1) <- argmin_mu  sum_{j : C(j) = i} || mu - x_j ||^2,   for i in {1, ..., k}
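The three steps of the recap can be sketched in a few lines. This is a minimal sketch; the function name, the NumPy usage, and the fixed iteration count are my own choices, not from the slides.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means sketch following the recap above.
    X: (m, d) data matrix; k: number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize k centers from the data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 2 (classify): assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 3 (recenter): each center becomes the centroid of its points.
        for i in range(k):
            if np.any(assign == i):
                centers[i] = X[assign == i].mean(axis=0)
    return centers, assign
```

In practice one would also add a convergence check (stop when assignments no longer change) rather than a fixed iteration count.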
Announcement
0 Midterm will cover materials before the reading
week
HW2 is available in UMLearn
Office hours during the reading week:
Thursday: 10am-11am
Friday: 10am-11am, 3pm-4pm
Or any time when the office door is open and I am alone.

SVM vs. Logistic Regression
Intuition of linear model parameters
The SVM weights w reweight the importance of each component of the input x for the classification problem. We can sometimes visualize w to get insights into what has been learned.
Nonparametric Methods
(aka Instance-Based Learning)
COMP4360: Machine Learning
Parametric methods
Assume some functional form (Gaussian, Bernoulli, Multinomial, logistic, linear).
Estimate the parameters (e.g., w, using MLE/MAP or loss functions) and plug them into the assumed form.
Pros and cons of gradient descent
Simple and often quite effective in ML tasks
Applies to smooth functions (differentiable)
Might find a local minimum, rather than a global one.

Pros and cons of gradient descent
There is only one local optimum if the function is convex.
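A toy illustration of these points: plain gradient descent on a convex one-dimensional function, where the single local optimum is the global one. The function, learning rate, and step count below are my own choices.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent sketch: repeatedly follow the
    negative gradient. Finds a local minimum; for a convex
    function this is the global minimum."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Convex example: f(x) = (x - 3)^2 with gradient 2(x - 3);
# the unique minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

For a non-convex function the same loop may stop at whichever local minimum the starting point x0 leads to.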
Lagrange Multiplier Dual Variables
minimize_x  x^2
s.t.  x >= b
Moving the constraint to the objective function gives the Lagrangian:
L(x, a) = x^2 - a (x - b)
s.t.  a >= 0
Solve:
min_x max_a  L(x, a)
s.t.  a >= 0

Why does this work at all?
L(x, a) = x^2 - a (x - b)
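To see why the min-max formulation recovers the original constrained problem, a short check (my own derivation, using the standard Lagrangian-duality argument):

```latex
\max_{a \ge 0} \; x^2 - a(x - b) =
\begin{cases}
x^2, & x \ge b \quad (\text{maximum attained at } a = 0),\\
+\infty, & x < b \quad (\text{let } a \to \infty).
\end{cases}
```

So min_x max_{a >= 0} L(x, a) assigns infinite penalty to any infeasible x and reduces exactly to minimizing x^2 subject to x >= b.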
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Midterm
Winter 2005
COMPUTER SCIENCE
Machine Learning
Date: Friday, 10 March 2005
Time: 15:30 - 16:20
Room: EITC E2-165, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
COMP 4360
UNIVERSITY OF MANITOBA
Midterm
Winter 2011
COMPUTER SCIENCE
Machine Learning
Date: Friday, 2nd March 2011
Time: 15:30 - 16:20
Room: EITC E2-165, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
COMP 4360
UNIVERSITY OF MANITOBA
Final Examination
Winter 2011
COMPUTER SCIENCE
Machine Learning
Paper No.:
Examiners: Jacky Baltes
Date: Monday, 11th April 2011
Time: 18:00
Room: Frank Kennedy, Brown Gym (184-208)
(Time allowed: 180 Minutes)
NOTE:
Attempt all questions.
COMP 4360
UNIVERSITY OF MANITOBA
Midterm
Winter 2012
COMPUTER SCIENCE
Machine Learning
Date: Friday, 9th March 2012
Time: 15:30 - 16:20
Room: EITC E2-304, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
COMP 4360
UNIVERSITY OF MANITOBA
Midterm
Winter 2007
COMPUTER SCIENCE
Machine Learning
Date: Friday, 2nd March 2007
Time: 15:30 - 16:20
Room: EITC E2-165, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Final Examination
Winter 2005
COMPUTER SCIENCE
Machine Learning
Paper No.: 444
Examiners: Jacky Baltes
Date: 18 April 2006
Time: 9:00
Room: University College Great Hall (25 - 48)
(Time allowed: 180 Minutes)
NOTE:
Attempt all questions.
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Final Examination
Winter 2004
COMPUTER SCIENCE
Machine Learning
Paper No.: 234
Examiners: Jacky Baltes
Date: 14 April 2004
Time: 18:00
Room: University College, Great Hall
(Time allowed: 180 Minutes)
NOTE:
Attempt all questions.
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Midterm
Winter 2003
COMPUTER SCIENCE
Machine Learning
Date: 8 March 2004
Time: 15:30 - 16:30
Room: Armes Building 115, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Winter 2003
COMPUTER SCIENCE
Machine Learning
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
Use of calculators is permitted.
Show your work to receive full marks.
SURNAME:
COMPSCI 74.436
UNIVERSITY OF MANITOBA
Midterm
Winter 2004
COMPUTER SCIENCE
Machine Learning
Date: 9 March 2005
Time: 15:30 - 16:30
Room: Armes Building 115, University of Manitoba
(Time allowed: 50 Minutes)
NOTE:
Attempt all questions.
This is a closed book examination.
Announcement
HW3 is posted.
Due to a department event, the office hour next
Monday is changed to 12:30pm-1:20pm.
Nonparametric regression
Temperature sensing
What is the temperature in the room? At location x?
Average
Local Average
Kernel Regression
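The third option above can be sketched as Nadaraya-Watson kernel regression: the prediction at a query location is a weighted average of all observed temperatures, with weights that decay with distance. The function name and the Gaussian kernel choice are mine.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson kernel regression sketch: predict at x_query
    as a weighted average of training targets, with Gaussian weights
    that decay with distance from x_query."""
    w = np.exp(-0.5 * ((x_query - x_train) / bandwidth) ** 2)
    return np.sum(w * y_train) / np.sum(w)
```

A plain average ignores location entirely; a local average uses hard distance cutoffs; the kernel version smooths between the two via the bandwidth.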
COMP 4360: Machine Learning
Winter 2017
Homework 4: Due Apr 10 at 3:00pm, 25 points in total
General guidelines for homeworks:
You are encouraged to meet with other students to discuss the homework, but all write-ups
must be done on your own. Do not take
Linear Algebra
Vectors
A one-dimensional array. If not specified, assume a column vector.
Matrices
A two-dimensional array. Typically denoted with capital letters.
Transposition
Transposing a matrix swaps columns and rows.
Useful facts about transposition
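Two of the standard transposition facts, checked numerically. This is a sketch using NumPy; the random shapes are arbitrary choices of mine.

```python
import numpy as np

# Useful transposition facts, verified on random matrices:
#   (A^T)^T = A          (transposing twice returns the original)
#   (A B)^T = B^T A^T    (transposition reverses the order of a product)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
assert np.allclose(A.T.T, A)
assert np.allclose((A @ B).T, B.T @ A.T)
```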
Dot Product
Beta prior distribution
Bayesian learning procedure
Step 1: Given a collection of data D = {x_1, x_2, ..., x_n}, write down the expression for the likelihood.
Step 2: Specify a prior: P(theta).
Step 3: Compute the posterior: P(theta | D) ∝ P(D | theta) P(theta).
Bayesian learning for thumbtack
Compute the posterior.
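With a Beta prior on theta and Bernoulli (coin-flip / thumbtack) data, the three steps collapse into the standard conjugate update: the posterior is again a Beta distribution. The function name below is mine; the update rule itself is the standard Beta-Bernoulli result.

```python
def beta_posterior(alpha, beta, data):
    """Conjugate Beta-Bernoulli update sketch: with a Beta(alpha, beta)
    prior on theta and i.i.d. 0/1 tosses, the posterior is
    Beta(alpha + #ones, beta + #zeros)."""
    ones = sum(data)
    zeros = len(data) - ones
    return alpha + ones, beta + zeros
```

For example, a uniform Beta(1, 1) prior and the data [1, 1, 0] give a Beta(3, 2) posterior.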
Linear Regression
COMP4360: Machine Learning
Housing Prices (Portland, OR)
[Scatter plot: Price (in 1000s of dollars), 0-500, vs. Size (feet^2), 0-3000]
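A least-squares line through a scatter plot like this can be sketched as follows. The helper name and the use of NumPy's least-squares solver are my choices; the slides do not prescribe an implementation.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear regression sketch: find w minimizing
    ||Xw - y||^2, with a column of ones added for the intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w  # w[0] is the intercept, w[1:] are the slopes
```

For housing data the fitted w[1] would be interpreted as price increase (in $1000s) per extra square foot.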
Why not just use regression?
[Plot: Malignant? (Yes = 1, No = 0) vs. Tumor Size, with a straight-line fit through the binary labels]

Logistic Regression
Assumes the following functional form for P(Y | X):
P(Y = 1 | X) = 1 / (1 + exp(w_0 + sum_i w_i X_i))
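The functional form above is one line of code. This is a sketch; the function names are mine.

```python
import numpy as np

def p_y1_given_x(w0, w, x):
    """Logistic regression form from the slide:
    P(Y = 1 | X = x) = 1 / (1 + exp(w0 + sum_i w_i x_i))."""
    return 1.0 / (1.0 + np.exp(w0 + np.dot(w, x)))
```

Note the output always lies in (0, 1), which is exactly what the straight-line fit to the malignant/benign labels above fails to guarantee.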
The lecture notes are a compilation and edition from many sources.
The instructor does not claim intellectual property or ownership of the
lecture notes. Please do not distribute the lecture notes.
[Diagram: Training Images used for Training; separate images held out for Testing]
Simple Example: Thumbtack
- P is a Bernoulli distribution:
  P(X = 1 | theta) = theta,  P(X = 0 | theta) = 1 - theta
Tosses are IID:
independent events
identically distributed according to the same distribution
Learning Problem
Given toss examples sampled from P:
D = {x_1, x_2, ...}
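For this learning problem, the maximum-likelihood estimate of theta is the standard result: the observed fraction of 1s. The function name is mine.

```python
def bernoulli_mle(tosses):
    """MLE sketch for the thumbtack: with i.i.d. Bernoulli(theta)
    tosses (0/1 outcomes), the maximum-likelihood estimate of theta
    is simply the fraction of tosses that came up 1."""
    return sum(tosses) / len(tosses)
```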
Cross-validation
Data are often limited.
Cross-validation creates S groups of data; use S - 1 groups to train and the other group to validate.
Extreme case, leave-one-out cross-validation (LOO-CV): S
is the number of training data instances.

Controlling Over-fitting: Regularization
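The S-fold splitting scheme described above can be sketched as an index generator. The function name and the use of NumPy's array splitting are my choices.

```python
import numpy as np

def cross_validation_splits(n, S):
    """S-fold cross-validation sketch: partition n example indices
    into S groups; for each fold, train on S - 1 groups and validate
    on the remaining one. S = n gives leave-one-out CV (LOO-CV)."""
    folds = np.array_split(np.arange(n), S)
    for i in range(S):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(S) if j != i])
        yield train, val
```

In practice indices are usually shuffled before splitting so each fold is representative.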
Derivative w.r.t. a vector
Given a vector x and a function f(x), how can we define the derivative of f with respect to x?
Example
Useful facts
Suppose x and y are vectors and A is a square matrix.
Two Viewpoints of Probability
The probability that a coin comes up heads is 50%.
The probability of raining tomorrow is 10%.
(Discrete) Random Variable
A random variable
What if data is not linearly separable?
x = <x^(1), ..., x^(m)> - m features
y in {-1, +1} - class
Add More Features!
Use products and powers of the original features, e.g. x^(1), x^(2), x^(1) x^(2), (x^(1))^2, ...
Illustration
x = (x1, x2),  phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
What if data is still not linearly separable?
minimize_{w,b}
[Plot: the exp loss is a convex upper bound on the 0/1 loss; if boosting keeps decreasing this upper bound, the training error is driven to 0]
What about test error? [Schapire, 1989]
[Plot: Test Error can keep decreasing after Training Error reaches zero, but not always]
Weighted average of weak learners
[Plot: log loss, exp loss, and 0/1 loss curves]
Soft K-means
In K-means, each data point is forced to be a member of exactly
one cluster. What if we relax this constraint?
Still define a cluster by a centroid, but now we calculate a
centroid as a weighted center of all data points.
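One soft-assignment update consistent with this description might look like the following. The function name and the softmax-of-negative-squared-distance weighting (with a stiffness parameter beta) are my choices; the slides only describe the idea of weighted centroids.

```python
import numpy as np

def soft_kmeans_step(X, centers, beta=1.0):
    """One soft k-means update sketch: each point gets a responsibility
    for every cluster (softmax of negative squared distances), and each
    center becomes the responsibility-weighted mean of ALL points."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    r = np.exp(-beta * d2)
    r = r / r.sum(axis=1, keepdims=True)              # (m, k), rows sum to 1
    new_centers = (r.T @ X) / r.sum(axis=0)[:, None]  # weighted centroids
    return new_centers, r
```

As beta grows, the responsibilities approach hard 0/1 assignments and the update approaches ordinary k-means.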
Relationship of GM
Dual formulation only depends on
dot-products, not on w!
phi(x): a high-dimensional feature space, but we never need it explicitly as long as we can compute the dot product fast using some kernel K.
Efficient Dot Product of Polynomials
Finally: The Kernel Trick!
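Tying this to the earlier illustration phi(x) = (x1^2, x2^2, sqrt(2) x1 x2): the polynomial kernel K(x, z) = (x . z)^2 computes exactly that feature-space dot product without ever forming phi. A small numerical check (the variable values are arbitrary choices of mine):

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-d input: (x1^2, x2^2, sqrt(2) x1 x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def K(x, z):
    """Degree-2 polynomial kernel: equals phi(x) . phi(z)."""
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
# phi(x).phi(z) = 9 + 64 + 48 = 121 = (1*3 + 2*4)^2 = K(x, z)
assert np.isclose(np.dot(phi(x), phi(z)), K(x, z))
```

Evaluating K costs one dot product in the original 2-d space, while the explicit map already needs 3 dimensions; for higher degrees and dimensions the gap grows rapidly, which is the point of the kernel trick.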