CSEP 546 Data Mining/Machine Learning, Winter 2014:
Homework 1
Due: Monday, January 20th, beginning of class
1
Simpson's Paradox [14 points]
Imagine that you and your friend are playing the slot machines in a casino. Having played on two separate machines
for a while
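The reversal the problem is after can be made concrete with a small worked example; the per-machine counts below are hypothetical, not the homework's actual data.

```python
# Hypothetical (wins, plays) counts per machine, chosen to exhibit
# Simpson's paradox; all numbers are made up for illustration.
you    = {"A": (81, 87),  "B": (192, 263)}
friend = {"A": (234, 270), "B": (55, 80)}

def rate(wins, plays):
    return wins / plays

def overall(record):
    wins  = sum(w for w, _ in record.values())
    plays = sum(p for _, p in record.values())
    return wins / plays

# You win more often on *each* machine, yet your friend wins more often overall.
per_machine = {m: (rate(*you[m]), rate(*friend[m])) for m in you}
totals = (overall(you), overall(friend))
```

The reversal happens because the two players split their plays very differently across the machines, so the aggregate rates weight the machines unequally.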
CSE546 Machine Learning, Autumn 2016: Homework 1
Due: Friday, October 14th, 5pm
0
Policies
Writing: Please submit your HW as a typed PDF document (not handwritten). You are encouraged to typeset
all your work in LaTeX, though you may use another comparable typesetting
CSEP 546 Data Mining/Machine Learning, Winter 2014:
Homework 4
Due: Monday, March 3rd, beginning of class
1
Recommender systems for fun and profit [40 points]
For this problem, you will explore Matrix completion to predict movie ratings.
1.1
Matrix factori
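One standard way to approach matrix completion is alternating least squares over a low-rank factorization; the sketch below is illustrative only, and the toy rating matrix, rank k, regularizer lam, and iteration count are all assumptions, not the homework's specification.

```python
import numpy as np

# Alternating-least-squares sketch for matrix completion on a toy
# user-by-movie rating matrix; 0 marks an unobserved rating.
rng = np.random.default_rng(0)
R = np.array([[5., 4., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
M = R > 0                               # observed-entry mask
k, lam = 2, 0.1                         # rank and ridge regularizer (assumed)
U = rng.standard_normal((R.shape[0], k))
V = rng.standard_normal((R.shape[1], k))

for _ in range(50):
    for i in range(R.shape[0]):         # update user factors, movies fixed
        Vi = V[M[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k), Vi.T @ R[i, M[i]])
    for j in range(R.shape[1]):         # update movie factors, users fixed
        Uj = U[M[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k), Uj.T @ R[M[:, j], j])

pred = U @ V.T                          # predictions, including missing entries
err = np.abs(pred[M] - R[M]).mean()     # mean absolute error on observed entries
```

Each inner solve is an ordinary ridge regression, which is why the subproblems stay cheap even though the joint objective is non-convex.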
CSEP 546 Data Mining/Machine Learning, Winter 2014:
Homework 3
Due: Monday, February 17th, beginning of class
1
Naïve Bayes [28 points]
The following table contains data from an employee database. The database includes the status, department,
age range an
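A minimal count-based naive Bayes classifier over categorical attributes can be sketched as follows; the toy records are assumptions for illustration, not the homework's actual employee table.

```python
from collections import Counter, defaultdict

# Toy (department, age range, status) records -- illustrative only.
data = [("sales", "31-35", "senior"), ("sales", "26-30", "junior"),
        ("systems", "26-30", "junior"), ("systems", "41-45", "senior"),
        ("marketing", "36-40", "senior"), ("secretary", "46-50", "senior")]

prior = Counter(y for *_, y in data)            # class counts
counts = defaultdict(Counter)                   # (feature, class) -> value counts
for *xs, y in data:
    for f, v in enumerate(xs):
        counts[(f, y)][v] += 1

def posterior(xs, y, alpha=1.0):
    # unnormalized P(y) * prod_f P(x_f | y), with Laplace smoothing alpha
    p = prior[y] / len(data)
    for f, v in enumerate(xs):
        n_vals = len({r[f] for r in data})
        p *= (counts[(f, y)][v] + alpha) / (prior[y] + alpha * n_vals)
    return p

def predict(xs):
    return max(prior, key=lambda y: posterior(xs, y))
```

The conditional independence assumption is what lets the joint likelihood factor into one small count table per attribute.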
CSE 546: Machine Learning
Lecture 5 extras
Optional Reading: Large deviations and the 2 tail bound
Instructor: Sham Kakade
1
The Central Limit Theorem
While true under more general conditions, the following is a rather simple proof of the central limit theorem.
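A quick empirical illustration of the statement (a simulation sketch, not the proof in the notes; the sample size, trial count, and uniform base distribution are assumptions): standardized means of i.i.d. uniform draws should look approximately standard normal.

```python
import random
import statistics

# Standardize sample means of Uniform(0, 1) draws and check they behave
# like N(0, 1): roughly 68% should land within one standard deviation.
random.seed(0)
n, trials = 200, 2000
mu, sigma = 0.5, (1 / 12) ** 0.5        # mean and sd of Uniform(0, 1)

zs = []
for _ in range(trials):
    m = statistics.fmean(random.random() for _ in range(n))
    zs.append((m - mu) / (sigma / n ** 0.5))

frac = sum(abs(z) <= 1 for z in zs) / trials   # should be near 0.68
```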
CSE 546: Machine Learning
Lecture 2
Least Squares
Instructor: Sham Kakade
1
Supervised Learning and Regression
We observe data:
T = {(x1 , y1 ), . . . , (xn , yn )}
from some distribution. Our goal may be to predict Y given some X. If Y is real, we may wis
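A minimal least-squares sketch on synthetic data (the data-generating w, noise level, and shapes are assumptions): fit the coefficients via the normal equations.

```python
import numpy as np

# Generate y = X w_true + noise, then recover w via (X^T X)^{-1} X^T y.
rng = np.random.default_rng(0)
n, d = 200, 3
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
```

With n well above d and small noise, the estimate should land close to the generating coefficients.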
CSE 546: Machine Learning
Lecture 6
Feature Selection: Part 2
Instructor: Sham Kakade
1
Greedy Algorithms (continued from the last lecture)
There are a variety of greedy algorithms and numerous naming conventions for them. These algorithms must
CSE 546: Machine Learning
Lecture 18
Concentration and ERM
Instructor: Sham Kakade
1
Chernoff and Hoeffding Bounds
Theorem 1.1. Let Z1 , Z2 , . . . , Zm be m i.i.d. random variables with Zi ∈ [a, b] (with probability one). Then for all
ε > 0 we have:

Pr( |(1/m) Σᵢ₌₁ᵐ Zᵢ − E[Z]| ≥ ε ) ≤ 2 exp( −2mε² / (b − a)² )
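As a sanity check on the bound, one can compare the empirical deviation frequency against it by simulation; the choice of Bernoulli(1/2) variables (so [a, b] = [0, 1]), m, ε, and the trial count below are assumptions for illustration.

```python
import math
import random

# Empirically check that the Hoeffding deviation probability is below
# the bound 2 exp(-2 m eps^2 / (b - a)^2) with (b - a) = 1.
random.seed(0)
m, eps, trials = 100, 0.1, 5000
bound = 2 * math.exp(-2 * m * eps ** 2)

deviations = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(m)) / m
    deviations += abs(mean - 0.5) >= eps
freq = deviations / trials               # empirical deviation frequency
```

For these parameters the bound is quite loose: the empirical frequency comes out several times smaller, which is expected since Hoeffding holds for every distribution on [a, b].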
CSE 546: Machine Learning
Lecture 9
Optimization 1: Gradient Descent
Instructor: Sham Kakade
1
Gradient Descent and Stochastic Gradient Descent
Suppose we want to solve:
min_w G(w)
In many machine learning problems, we have that G(w) is of the form:
G(w) = (1/n) Σᵢ ℓ((xᵢ , yᵢ ), w)
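The basic gradient-descent iteration can be sketched on a toy objective; the quadratic G(w) = (w − 3)², the step size, and the iteration count below are illustrative assumptions, just to make the update rule concrete.

```python
# Minimize G(w) = (w - 3)^2 by repeatedly stepping against the gradient.
def grad(w):
    return 2 * (w - 3.0)        # G'(w)

w, eta = 0.0, 0.1               # initial point and step size (assumed)
for _ in range(100):
    w = w - eta * grad(w)       # w_{t+1} = w_t - eta * grad G(w_t)
```

On this strongly convex objective the iterates contract toward the minimizer w = 3 geometrically.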
CSE 546: Machine Learning
Lecture 8
Convexity and Gradient Descent
Instructor: Sham Kakade
1
Introduction: learning and generalization
One of the standard settings is that we have a loss function ℓ(f (x), y) where x is an input, y is an output, and f (x)
CSE 546: Machine Learning
Lecture 10
Stochastic Gradient Descent
Instructor: Sham Kakade
1
Non-smooth optimization and (sub-)gradient descent
The sub-gradient update rule is again:
wt+1 = wt − η ∇G(wt )
where ∇G(wt ) is the sub-gradient at wt .
We say that
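The update can be sketched on a non-smooth toy objective; G(w) = |w − 2| and the 1/√t step-size schedule below are illustrative assumptions. At the kink w = 2, any value in [−1, 1] is a valid sub-gradient; the sketch uses 0 there.

```python
# Sub-gradient descent on G(w) = |w - 2| with a decaying step size.
def subgrad(w):
    if w > 2:
        return 1.0
    if w < 2:
        return -1.0
    return 0.0                  # any value in [-1, 1] works at the kink

w = 0.0
for t in range(1, 2001):
    w = w - (1.0 / t ** 0.5) * subgrad(w)   # w_{t+1} = w_t - eta_t * g_t
```

Because the objective is non-smooth, the iterates oscillate around the minimizer with amplitude roughly the current step size, which is why a decaying schedule is needed.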
CSE 546: Machine Learning
Lecture 5
Feature Selection: Part 1
Instructor: Sham Kakade
1
Regression in the high dimensional setting
How do we learn when the number of features d is greater than the sample size n? In the previous lecture, we examined
ridge
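As a concrete sketch of the d > n regime (the shapes and λ below are assumptions), ridge's regularized normal equations stay solvable even though X⊤X itself is rank-deficient:

```python
import numpy as np

# With n = 20 samples and d = 50 features, X^T X is singular (rank <= 20),
# but X^T X + lam * I is positive definite, so ridge has a unique solution.
rng = np.random.default_rng(0)
n, d, lam = 20, 50, 1.0
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```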
CSE 546: Machine Learning
Lecture 11
SGD and Generalization
Instructor: Sham Kakade
1
Stochastic Gradient Descent
Suppose we want to minimize G(w), where G(w) is of the form:
G(w) = (1/n) Σᵢ₌₁ⁿ ℓ((xᵢ , yᵢ ), w)
We could use gradient descent. One practical difficulty
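The stochastic alternative touches one randomly chosen example per update instead of the whole sum; the toy objective G(w) = (1/n) Σᵢ (w − xᵢ)², whose minimizer is the data mean, and the 1/t step schedule below are illustrative assumptions.

```python
import random

# SGD on G(w) = (1/n) * sum_i (w - x_i)^2; the minimizer is mean(xs) = 2.5.
random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0]

w = 0.0
for t in range(1, 5001):
    x = random.choice(xs)                 # sample a single example
    w = w - (1.0 / t) * 2 * (w - x)       # step along that example's gradient
```

Each step uses a noisy but unbiased estimate of the full gradient, so the iterates converge to the minimizer in expectation despite never evaluating the whole sum.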
CSE 546: Machine Learning
Lecture 7
Feature construction: Boosting, Kernels, and Random Features.
Instructor: Sham Kakade
1
Boosted Decision Trees
The input is a training set:
{(x1 , y1 ), . . . , (xn , yn )}
1. Initialize: set the residual ri = yi for all i.
Support Vector Machines Preview
What is a support vector machine?
The perceptron revisited
Kernels
Weight Optimization
Handling noisy data
What Is a Support Vector Machine?
1. A subset of the training examples X (the support vectors)
2. A vector of weights
Clustering and Dimensionality Reduction Preview
o Clustering
  K-means clustering
  Mixture models
  Hierarchical clustering
o Dimensionality reduction
  Principal component analysis
  Multidimensional scaling
  Isomap
Unsupervised Learning
Problem: Too much da
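The k-means alternation (assign each point to its nearest center, then move each center to its cluster's mean) can be sketched in a few lines; the toy 2-D points, k = 2, the deterministic initialization, and the iteration count are assumptions for illustration.

```python
# Minimal k-means on two well-separated toy blobs.
pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
       (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
k = 2
centers = [pts[0], pts[3]]                 # simple deterministic init (assumed)

def nearest(p):
    # index of the center closest to p in squared Euclidean distance
    return min(range(k),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

for _ in range(10):
    clusters = [[] for _ in range(k)]
    for p in pts:                          # assignment step
        clusters[nearest(p)].append(p)
    centers = [tuple(sum(v) / len(c) for v in zip(*c))   # centroid update
               for c in clusters]

labels = [nearest(p) for p in pts]
```

Both steps can only decrease the within-cluster sum of squares, which is why the alternation converges (to a local optimum that depends on initialization).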
CSE 546: Machine Learning
Lecture 3
Bias-Variance Tradeoff and Dimension-Free Regression
Instructor: Sham Kakade
1
Risk, in the well-specified case
Suppose now that the linear model is correct. In particular, assume that:
Y = w⊤ X + ε
where ε ∼ N (0, σ²) and w
CSE 546: Machine Learning
Lecture 3
Risk of Ridge Regression
Instructor: Sham Kakade
0.1
Analysis
Let us rotate each Xi by V ⊤ , i.e.
Xi ← V ⊤ Xi
where V is the right singular matrix of the SVD of the n × d matrix X (note this rotation does not alter the predictions o
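The invariance claim can be checked numerically: rotating the features by V⊤ changes the fitted coefficients but not the least-squares predictions, since the column space of X is unchanged. The shapes and random data below are assumptions for illustration.

```python
import numpy as np

# Fit least squares before and after rotating features by V^T (V from
# the SVD of X) and confirm the predictions coincide.
rng = np.random.default_rng(0)
n, d = 30, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
Xr = X @ Vt.T                              # each row x_i -> V^T x_i

w  = np.linalg.lstsq(X, y, rcond=None)[0]
wr = np.linalg.lstsq(Xr, y, rcond=None)[0]
```

Since V⊤ is an invertible (orthogonal) map, X and XV⊤ span the same column space, so both fits produce the same projection of y.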
CSE 546: Machine Learning
Lecture 1
Overview / Maximum Likelihood Estimation
Instructor: Sham Kakade
1
What is Machine Learning?
Machine learning is the study of algorithms which improve their performance with experience. The area combines
ideas from both