CSE 250: Machine Learning Theory
Spring 2016
Bonus Quiz
Due on: Wed May 11
Instructor: Raef Bassily
Instructions
For your proofs, you may use any result covered in class, and its analysis, but please cite the result
that you use.
General Bound on PAC-lea

CSE 250: Machine Learning Theory
Spring 2016
Homework 1
Due on: Wed April 20
Instructor: Raef Bassily
Instructions and Notes
For your proofs, you may use any result covered in class, and its analysis, but please cite the result that
you use.
All problem

CSE 250: Machine Learning Theory
Spring 2016
Part 3
Instructor: Raef Bassily
3.1
Scribe: Andrew Leverentz
Vapnik-Chervonenkis (VC) Dimension, continued
Definition 3.1 (Set Shattering). A hypothesis class H shatters a finite set Tn = {x1 , . . . , xn } ⊆ X
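This excerpt cuts off mid-definition; as an aside, whether a finite pool of hypotheses shatters a given set can be checked by brute force over all labelings. A minimal sketch, using threshold classifiers on the real line as an illustrative (assumed) hypothesis class:

```python
from itertools import product  # not strictly needed; labelings come from the hypotheses

def shatters(hypotheses, points):
    """Check whether the hypothesis pool realizes every one of the
    2^n labelings of `points` (i.e., shatters the set)."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold functions on the line: h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in [-0.5, 0.5, 1.5, 2.5]]

print(shatters(thresholds, [1.0]))        # True: a single point is shattered
print(shatters(thresholds, [0.0, 2.0]))   # False: the labeling (1, 0) is never realized
```

This matches the fact that the VC dimension of threshold classifiers is 1.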

CSE 250: Machine Learning Theory
Spring 2016
Homework 3 (Project)
Instructor: Raef Bassily
Due on: Wed June 1
Please read the following instructions carefully.
This homework is a mini-project in which you are required to implement and test the SGD algorithm fo
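The project description is truncated here; for orientation, a generic averaged-SGD loop for a stochastic convex objective might look like the following sketch (the toy objective, noise model, and step-size schedule are assumptions, not the project's actual specification):

```python
import random

def sgd(grad, w0, steps, lr):
    """Averaged SGD: w_{t+1} = w_t - (lr / sqrt(t)) * g_t, return the average iterate."""
    w = list(w0)
    avg = [0.0] * len(w0)
    for t in range(1, steps + 1):
        g = grad(w)                                        # stochastic gradient oracle
        w = [wi - (lr / t ** 0.5) * gi for wi, gi in zip(w, g)]
        avg = [ai + wi / steps for ai, wi in zip(avg, w)]  # running average of iterates
    return avg

# Toy objective f(w) = (w - 3)^2 with a noisy gradient oracle.
random.seed(0)
noisy_grad = lambda w: [2 * (w[0] - 3) + random.gauss(0, 0.1)]
w_hat = sgd(noisy_grad, [0.0], steps=5000, lr=0.5)
print(w_hat)  # close to the minimizer 3.0
```

Averaging the iterates, rather than returning the last one, is the standard way to get the O(1/sqrt(T)) convergence guarantee for convex Lipschitz objectives.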

CSE 250: Machine Learning Theory
Spring 2016
Part 2 - The PAC Model
Instructor: Raef Bassily
2.1
Scribe: Andrew Leverentz
Statistical Learning Framework
The basic setup for statistical learning:
Inputs → Learner A → Outputs
1. Inputs to A
Domain set X (featu

CSE 250: Machine Learning Theory
Spring 2016
Mid-term Exam
Date: May 9, 2016
Instructor: Raef Bassily
Instructions
Maximum Grade = 20 points. Total points = 28 points.
Prove all your claims; you may use any result covered in class, but you must cite the

CSE 250: Machine Learning Theory
Spring 2016
Homework 2
Due on: Wed May 4
Instructor: Raef Bassily
Instructions and Notes
For your proofs, you may use any result covered in class, and its analysis, but please cite the result
that you use.
All problems h

CSE 250: Machine Learning Theory
Spring 2016
Part 5 - Introduction to Convex Learning
Instructor: Raef Bassily
5.1
Scribe: Andrew Leverentz
Convex Learning
Now we will discuss some generalizations to our learning framework. As before, X is the domain
set

CSE 250: Machine Learning Theory
Spring 2016
Part 4
Instructor: Raef Bassily
4.1
Scribe: Andrew Leverentz
Weak versus Strong Learnability
Definition 4.1 (γ-weak learner). Suppose γ is some small number bounded away from 1/2, say
γ ∈ (0, 1/4]. An algorithm A is

Euclidean projection
Definition: (Euclidean Projection)
Let C ⊆ R^d be a closed convex set. The Euclidean projection Π_C : R^d → C is defined as:

    Π_C(v) = argmin_{w ∈ C} ‖v − w‖

That is, Π_C(v) is the closest point in C (w.r.t. the Euclidean distance) to v.
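For simple convex sets this projection has a closed form. A minimal sketch (the Euclidean ball and the box are illustrative choices of C, not sets singled out in the notes):

```python
import math

def project_ball(v, radius=1.0):
    """Pi_C(v) for C = {w : ||w|| <= radius}: scale v back onto the ball if outside."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [radius * x / norm for x in v]

def project_box(v, lo=0.0, hi=1.0):
    """Pi_C(v) for C = [lo, hi]^d: clip each coordinate independently."""
    return [min(max(x, lo), hi) for x in v]

print(project_ball([3.0, 4.0]))       # -> [0.6, 0.8], closest point on the unit ball
print(project_box([-0.5, 0.3, 2.0]))  # -> [0.0, 0.3, 1.0]
```

Both projections are used in projected (stochastic) gradient descent to keep iterates inside the constraint set.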

CSE 250: Machine Learning Theory
Spring 2016
Lecture 8
Instructor: Raef Bassily
8.1
Scribe: Andrew Leverentz
Sauer's Lemma
Lemma 8.1 (Sauer's Lemma). If VC(H) = k, then for any T ⊆ X of size n, we have

    |C_H (T)| ≤ Σ_{i=0}^{k} (n choose i) ≤ (en/k)^k   for n > k,

so |C_H (T)| = O(n^k).
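The two inequalities in the lemma can be sanity-checked numerically. The sketch below evaluates the sum of binomial coefficients against the (en/k)^k bound on a few (n, k) pairs with n > k:

```python
import math

def growth_bound(n, k):
    """Left-hand side of Sauer's lemma: sum_{i=0}^{k} C(n, i)."""
    return sum(math.comb(n, i) for i in range(k + 1))

# Check the (en/k)^k upper bound on a few values with n > k.
for n, k in [(10, 3), (50, 5), (100, 4)]:
    lhs = growth_bound(n, k)
    rhs = (math.e * n / k) ** k
    print(n, k, lhs, round(rhs, 1), lhs <= rhs)  # last column is always True
```

For (n, k) = (10, 3) the sum is 1 + 10 + 45 + 120 = 176, comfortably below (10e/3)^3 ≈ 744.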

CSE 250: Machine Learning Theory
Spring 2016
Lecture 16
Instructor: Raef Bassily
16.1
Scribe: Andrew Leverentz
Stability of Regularized Loss Minimization (RLM)
The stability of RLM relies on a property called strong convexity.
Definition 16.1 (Strongly co

CSE 250: Machine Learning Theory
Spring 2016
Lecture 10
Instructor: Raef Bassily
10.1
Scribe: Andrew Leverentz
Adaboost, continued
Example 10.1. Suppose S ⊆ R^2 and H = {horizontal or vertical half-planes} (a.k.a. decision
stumps).
Sample from D^(1) = uniform
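The example breaks off here; as an illustration of a weak learner over decision stumps (the exhaustive-search strategy below is an assumption, not necessarily the procedure used in lecture), the weighted-error-minimizing stump on a sample can be found by brute force:

```python
def best_stump(points, labels, weights):
    """Exhaustively pick the stump h(x) = sign * (1 if x[axis] >= threshold else -1)
    with minimum weighted error on the sample."""
    best = (float("inf"), None)
    for axis in (0, 1):
        for threshold in sorted({p[axis] for p in points}):
            for sign in (+1, -1):
                preds = [sign * (1 if p[axis] >= threshold else -1) for p in points]
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                best = min(best, (err, (axis, threshold, sign)))
    return best

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [-1, -1, +1, +1]                      # separable by a horizontal half-plane
err, stump = best_stump(pts, ys, [0.25] * 4)
print(err, stump)                          # zero weighted error on this sample
```

In AdaBoost this routine would be called once per round, with the weights re-derived from the current distribution D^(t).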

Convex functions, cont'd
Composition of convex functions:
Let f1 : R^d → R and f2 : R → R be convex functions. Then the composition f2 ∘ f1, defined as
(f2 ∘ f1)(w) = f2(f1(w)), is convex if either one of the following conditions holds:
- f1 is affine (i.e., f1(w) = ⟨a, w⟩ + b for some a ∈ R^d, b ∈ R), or
- f2 is nondecreasing.
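These sufficient conditions matter: when neither holds, convexity of f1 and f2 alone does not make f2 ∘ f1 convex. A small numeric sanity check via the midpoint inequality (an illustrative sketch; the double-well composition is a standard counterexample, not one from the notes):

```python
def is_convex_on_grid(f, xs):
    """Midpoint convexity check on a 1-D grid: f((a+b)/2) <= (f(a)+f(b))/2."""
    return all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-12
               for a in xs for b in xs)

xs = [i / 10 for i in range(-20, 21)]

f1 = lambda w: w * w            # convex, but not affine
f2 = lambda t: (t - 1) ** 2     # convex, but not nondecreasing
comp = lambda w: f2(f1(w))      # (w^2 - 1)^2: a double well, not convex

print(is_convex_on_grid(f1, xs), is_convex_on_grid(f2, xs))  # True True
print(is_convex_on_grid(comp, xs))                           # False
```

The failure shows up at a = -1, b = 1: the composition is 0 at both endpoints but 1 at the midpoint.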

CSE 250: Machine Learning Theory
Spring 2016
Lecture 5
Instructor: Raef Bassily
Scribe: Andrew Leverentz
Example 5.1 (Learning conjunctions; not restricted to monotone). There is a simple reduction:
note that a conjunction over boolean variables x1 , . .
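The reduction is truncated here; for context, the classic elimination algorithm for learning conjunctions (a standard technique, sketched under the assumptions of boolean inputs and realizability; the target conjunction below is an illustrative choice) can be written as:

```python
def learn_conjunction(samples):
    """Elimination algorithm: start from the conjunction of all 2n literals
    (each x_i and its negation) and drop every literal falsified by a
    positive example.  Negative examples are never needed under realizability."""
    n = len(samples[0][0])
    literals = {(i, b) for i in range(n) for b in (0, 1)}  # (index, required value)
    for x, y in samples:
        if y == 1:
            literals -= {(i, 1 - x[i]) for i in range(n)}
    return literals

def predict(literals, x):
    return int(all(x[i] == b for i, b in literals))

# Target: x0 AND NOT x2 (an assumed example target).
data = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
h = learn_conjunction(data)
print(h, predict(h, (1, 0, 0)))  # the literals for x0 AND NOT x2; prediction 1
```

The returned hypothesis is the most specific conjunction consistent with the positives, which is the key to the PAC analysis of this learner.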

CSE 250: Machine Learning Theory
Spring 2016
Lecture 17
Instructor: Raef Bassily
17.1
Scribe: Andrew Leverentz
RLM analysis of excess risk for convex Lipschitz loss (continued)
Last time, we showed that the stability rate of RLM for convex Lipschitz loss

CSE 250: Machine Learning Theory
Spring 2016
Lecture 9
Instructor: Raef Bassily
9.1
Scribe: Andrew Leverentz
Weak versus Strong Learnability
Definition 9.1 (γ-weak learner). Suppose γ is some small number bounded away from 1/2, say
γ ∈ (0, 1/4]. An algorithm A

CSE 250: Machine Learning Theory
Spring 2016
Lecture 2
Instructor: Raef Bassily
Scribe: Andrew Leverentz
Theorem 2.1 (Hoeffding's inequality: general case). Let X1 , . . . , Xm be a sequence of r.v.s
such that for all i, ai ≤ Xi ≤ bi with probability 1 (o
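The theorem statement is truncated; its special case for i.i.d. Uniform[0,1] draws (so ai = 0, bi = 1, and the bound on the deviation of the sample mean is 2·exp(−2mε²)) can be checked empirically with the following sketch:

```python
import math
import random

def empirical_deviation_prob(m, eps, trials=2000, seed=0):
    """Fraction of trials where the mean of m Uniform[0,1] draws deviates
    from its expectation 1/2 by at least eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(m)) / m
        hits += abs(mean - 0.5) >= eps
    return hits / trials

m, eps = 100, 0.1
bound = 2 * math.exp(-2 * m * eps ** 2)  # Hoeffding with a_i = 0, b_i = 1
print(empirical_deviation_prob(m, eps), "<=", bound)
```

For these parameters the bound is about 0.27, while the empirical frequency is far smaller; Hoeffding's inequality is valid but not tight for this distribution.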

CSE 250: Machine Learning Theory
Spring 2016
Lecture 12
Instructor: Raef Bassily
12.1
Scribe: Andrew Leverentz
Convex Learning, continued
Recall the setup for regression:
X ⊆ R^m,  Y ⊆ R,
H = {h_w : X → Y}, where each h_w is parameterized by w ∈ C ⊆ R^d.
The loss fu

Other topics for future consideration
& Concluding remarks
More on Linear Classifiers:
Perceptron & SVMs
Linearly separable case (i.e., realizability holds):
Training examples (x^(1), y^(1)), . . . , (x^(n), y^(n)) ∈ R^m × {−1, +1}; a linear classifier is
parameterized by (w, b) ∈ R^m × R.
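For the linearly separable case above, the classic Perceptron algorithm is the standard starting point. A minimal sketch (the toy data set is an illustrative assumption):

```python
def perceptron(samples, max_epochs=100):
    """Classic Perceptron: on each mistake (y * (<w, x> + b) <= 0),
    update w <- w + y*x and b <- b + y; stop after a mistake-free pass."""
    d = len(samples[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

data = [((2.0, 1.0), +1), ((1.0, 3.0), +1), ((-1.0, -1.0), -1), ((-2.0, 1.0), -1)]
w, b = perceptron(data)
print(w, b)  # a separating hyperplane for the four points
```

On separable data with margin γ and radius R, the Perceptron makes at most (R/γ)² mistakes before converging, which is why the loop above terminates quickly here.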

CSE 250: Machine Learning Theory
Spring 2016
Lecture 6
Instructor: Raef Bassily
Scribe: Andrew Leverentz
There are two challenges we still need to address:
1. Realizability: We've been sweeping the realizability assumption under the rug; we'll need to
retur

CSE 250: Machine Learning Theory
Spring 2016
Lecture 4
Instructor: Raef Bassily
Scribe: Andrew Leverentz
Example 4.1 (Accuracy/confidence analysis for axis-aligned rectangles). First, fix 0 < ε and
0 < δ < 1.
What is the true error? It is based on the region

CSE 250: Machine Learning Theory
Spring 2016
Lecture 15
Instructor: Raef Bassily
Scribe: Andrew Leverentz
Before today, most of our learning paradigms were based on empirical risk minimization (ERM)
or boosting (amplifying performance of weak learners).

CSE 250C/SPRING 2016
MACHINE LEARNING THEORY
QUIZ
Problem 1: Consider a square board with unit surface area. The board has a target
zone, which is a small square at the center of surface area (see Figure 1). Suppose
that darts are thrown independently and

CSE 250: Machine Learning Theory
Spring 2016
Lecture 7
Instructor: Raef Bassily
7.1
Scribe: Andrew Leverentz
Vapnik-Chervonenkis (VC) Dimension, continued
Definition 7.1 (Set Shattering). A hypothesis class H shatters a finite set Tn = {x1 , . . . , xn