GEORGIA INSTITUTE OF TECHNOLOGY
SCHOOL of ELECTRICAL and COMPUTER ENGINEERING
ECE 6254 Spring 2010 Problem Set #1
Assigned: January 18, 2010 Due Date: January 26, 2010
Reading Assignment: Chapter 4, Sections 4.1 - 4.4.1 (pp. 129-154). Quote of the Week
The Bayes classifier

Theorem
The classifier
    f*(x) = argmax_k P(Y = k | X = x)
satisfies
    R(f*) = min_f R(f),
where the min is over all possible classifiers.

To calculate the Bayes classifier/Bayes risk, we need to know the posterior probabilities P(Y = k | X = x).
Alternatively, since P(Y = k | X = x) is proportional to P(Y = k) p(x | Y = k), and this proportionality preserves the maximum over k, it is sufficient to know the priors and the class-conditional densities to find f*.
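The optimality of the Bayes classifier can be checked numerically. The sketch below assumes a simple model not taken from the notes: equal priors and unit-variance Gaussian class-conditionals at ±1, compared against a deliberately shifted threshold rule.

```python
import math
import random

random.seed(0)

# Illustrative model (an assumption, not from the notes):
# P(Y = 1) = 0.5, X | Y = 0 ~ N(-1, 1), X | Y = 1 ~ N(+1, 1).
def density(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def bayes_classifier(x):
    # argmax_k P(Y = k | X = x); with equal priors this reduces to
    # comparing the class-conditional densities.
    return 1 if density(x, 1.0) > density(x, -1.0) else 0

def shifted_classifier(x):
    # A deliberately suboptimal threshold, for comparison.
    return 1 if x > 0.7 else 0

def estimate_risk(classifier, n=200_000):
    # Monte Carlo estimate of R(f) = P(f(X) != Y).
    errors = 0
    for _ in range(n):
        y = random.random() < 0.5
        x = random.gauss(1.0 if y else -1.0, 1.0)
        errors += classifier(x) != y
    return errors / n

r_bayes = estimate_risk(bayes_classifier)
r_shifted = estimate_risk(shifted_classifier)
# For this model the Bayes risk is Phi(-1), roughly 0.159; no classifier
# can have smaller risk, and the shifted threshold is strictly worse.
assert r_bayes < r_shifted
```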
Linear discriminant analysis
Recap

For a single hypothesis h, we have (Hoeffding's inequality)
    P( |R̂(h) − R(h)| ≥ ε ) ≤ 2 e^(−2nε²)

For M hypotheses, and ĥ selected using the data, we have (union bound)
    P( |R̂(ĥ) − R(ĥ)| ≥ ε ) ≤ 2M e^(−2nε²)
or equivalently, that with probability at least 1 − δ,
    R(ĥ) ≤ R̂(ĥ) + sqrt( log(2M/δ) / (2n) )

Bound becomes meaningless when M is too large (or infinite)
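Both bounds can be probed by simulation. In the sketch below (the parameter choices are illustrative assumptions), a "hypothesis" is a fair coin whose true risk is 0.5 and whose empirical risk is the mean of n Bernoulli draws; with M such hypotheses the union bound exceeds 1 and, indeed, some hypothesis almost always deviates.

```python
import math
import random

random.seed(1)

n, eps, trials = 100, 0.1, 500
hoeffding_bound = 2 * math.exp(-2 * n * eps ** 2)  # single-hypothesis bound

def deviates_once():
    # Empirical risk of one fixed hypothesis on a fresh dataset of size n.
    emp = sum(random.random() < 0.5 for _ in range(n)) / n
    return abs(emp - 0.5) >= eps

freq_single = sum(deviates_once() for _ in range(trials)) / trials
assert freq_single <= hoeffding_bound  # Hoeffding holds for one hypothesis

# With M hypotheses, 2*M*exp(-2*n*eps^2) is larger than 1 here, so the
# union bound says nothing -- and some hypothesis usually does deviate.
M = 50
freq_any = sum(
    any(deviates_once() for _ in range(M)) for _ in range(trials)
) / trials
assert freq_any > 0.5
```

(For simplicity the M hypotheses above deviate on independent datasets; the union bound itself requires no independence.)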
Union bound intuition

[Figure: the space of all possible datasets. For a fixed hypothesis h, Hoeffding's inequality bounds the probability of the region of datasets for which |R̂(h) − R(h)| ≥ ε; the union bound adds up the probabilities of these regions across all hypotheses.]
The learning challenge

Goal
There is some underlying function f that captures an input-output relationship which we would like to estimate

Assumption
We do not know f, but we get to observe example input-output pairs which are generated independently at random
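This setting can be made concrete with a small sketch. The target function and noise level below are illustrative assumptions; the point is only that the learner sees independent random samples, never f itself.

```python
import random

random.seed(2)

# Hypothetical unknown target (an assumption for illustration); the
# learner never sees this function directly.
def f(x):
    return 1 if x > 0.3 else -1

def draw_sample():
    x = random.uniform(-1, 1)                      # inputs drawn i.i.d.
    y = f(x) if random.random() < 0.9 else -f(x)   # labels flipped 10% of the time
    return x, y

data = [draw_sample() for _ in range(20)]
```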
Linear methods for supervised learning
LDA
Logistic regression
Naïve Bayes
PLA
Maximum margin hyperplanes
Soft-margin hyperplanes
Least squares regression
Ridge regression
Nonlinear feature maps
Sometimes linear methods (in both regression and classification) are not enough
Constrained optimization

A general constrained optimization problem has the form
    minimize_x f(x)  subject to  g_i(x) ≤ 0,  i = 1, …, m,
where f and the g_i are functions mapping R^d to R.

The Lagrangian function is given by
    L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x),  with λ_i ≥ 0
Primal and dual optimization problems

Primal:  p* = min_x max_{λ ≥ 0} L(x, λ)
Dual:    d* = max_{λ ≥ 0} min_x L(x, λ)
Weak duality:  d* ≤ p* (always)
Strong duality: For convex problems with affine constraints, d* = p*
Saddle point: when strong duality holds, the optimal pair (x*, λ*) satisfies L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)
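Weak and strong duality can be verified numerically on a toy convex problem (the problem below is an assumption chosen for illustration): minimize f(x) = x² subject to g(x) = 1 − x ≤ 0, with Lagrangian L(x, λ) = x² + λ(1 − x).

```python
# Dual function: g(lam) = min_x L(x, lam); the inner minimizer is
# x = lam / 2, so g(lam) = lam - lam^2 / 4 in closed form.
lams = [i / 1000 for i in range(0, 5001)]     # grid over lam >= 0
xs = [i / 1000 for i in range(-3000, 3001)]   # grid over x

dual_value = max(lam - lam ** 2 / 4 for lam in lams)      # attained at lam = 2
primal_value = min(x ** 2 for x in xs if 1 - x <= 0)      # attained at x = 1

assert dual_value <= primal_value + 1e-9          # weak duality (always)
assert abs(dual_value - primal_value) < 1e-6      # strong duality: convex
                                                  # objective, affine constraint
```

Both grid searches land on the exact optimum (x* = 1, λ* = 2, value 1), so the duality gap here is zero.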
A first model of learning

Let's restrict our attention to binary classification: our labels belong to {−1, +1} (or {0, 1})

We observe the data (x_1, y_1), …, (x_n, y_n), where each x_i ∈ R^d

Suppose we are given an ensemble of possible hypotheses / classifiers H

From the training data, we would like to select the best possible classifier from H
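Selecting a classifier by minimizing the empirical risk over a finite H can be sketched in a few lines. The hypothesis class of one-dimensional threshold rules and the training distribution below are illustrative assumptions.

```python
import random

random.seed(3)

# H = { h_t(x) = sign(x - t) : t on a small grid } -- an assumed finite
# hypothesis class for illustration.
thresholds = [t / 10 for t in range(-10, 11)]

def h(t, x):
    return 1 if x >= t else -1

# Training data from a noiseless threshold at 0.3 (also an assumption).
xs_train = [random.uniform(-1, 1) for _ in range(200)]
data = [(x, 1 if x >= 0.3 else -1) for x in xs_train]

def empirical_risk(t):
    # Fraction of training points that h_t misclassifies.
    return sum(h(t, x) != y for x, y in data) / len(data)

best_t = min(thresholds, key=empirical_risk)
# The true threshold 0.3 is in the grid, so empirical risk minimization
# can fit the training data exactly.
assert empirical_risk(best_t) == 0.0
```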
A first look at generalization
In these notes, we will get a first look at the theory of generalization
for the binary supervised classification problem.
We observe data (x_i, y_i) for i = 1, …, n, where the x_i ∈ R^d are the feature vectors and the y_i ∈ {−1, +1} are the labels.
ECE 6254
Statistical Machine Learning
Spring 2017
Mark A. Davenport
Georgia Institute of Technology
School of Electrical and Computer Engineering
Statistical machine learning
How can we
learn effective models from data?
apply these models to practical problems?
Linear classifiers
LDA
Logistic regression
PLA
Maximum margin hyperplanes
SVMs
Linear classifiers?
This data set is not linearly separable
Consider a nonlinear mapping Φ of the features: the dataset is linearly separable after applying this feature map
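A sketch of the idea (the particular map and dataset below are assumptions, since the slide's equation is not reproduced here): points inside and outside a circle are not linearly separable in R², but become separable after appending the squared norm as a third feature.

```python
import random

random.seed(4)

# Labels: +1 inside the unit circle, -1 outside -- not linearly separable
# in the original two dimensions.
points = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(500)]
labels = [1 if x1 ** 2 + x2 ** 2 < 1 else -1 for x1, x2 in points]

def phi(x1, x2):
    # Assumed feature map: append the squared norm as a new coordinate.
    return (x1, x2, x1 ** 2 + x2 ** 2)

# In feature space the hyperplane z3 = 1 (weights (0, 0, -1), bias 1)
# separates the two classes perfectly.
separated = all(
    (1 if 1 - phi(x1, x2)[2] > 0 else -1) == y
    for (x1, x2), y in zip(points, labels)
)
assert separated
```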
Fundamental tradeoff
By mapping the data to a higher-dimensional feature space, we gain the flexibility to separate it linearly, but we also increase the risk of overfitting
The Bayes classifier

Consider a pair (X, Y), where X is a random vector in R^d and Y is a random variable (depending on X) taking values in {−1, +1}.

Let f : R^d → {−1, +1} be a classifier, with probability of error/risk given by
    R(f) = P( f(X) ≠ Y )

The Bayes classifier (denoted f*) is the optimal classifier, i.e., the classifier with the smallest possible risk.
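For a discrete X the risk can be computed exactly and the Bayes classifier found by enumeration. The tiny joint pmf below is an assumption chosen for illustration.

```python
from itertools import product

# Assumed toy model: X in {0, 1}, Y in {-1, +1}, joint pmf p(x, y).
pmf = {(0, -1): 0.35, (0, 1): 0.15, (1, -1): 0.10, (1, 1): 0.40}

def risk(f):
    # R(f) = P(f(X) != Y), summed exactly over the joint pmf.
    return sum(p for (x, y), p in pmf.items() if f[x] != y)

# Enumerate every classifier f: {0, 1} -> {-1, +1} and find the minimizer.
classifiers = [dict(zip((0, 1), vals)) for vals in product((-1, 1), repeat=2)]
bayes = min(classifiers, key=risk)

# argmax_y p(y | x): predict -1 at x = 0 (0.35 > 0.15) and +1 at x = 1
# (0.40 > 0.10); its risk is 0.15 + 0.10 = 0.25.
assert bayes == {0: -1, 1: 1}
assert abs(risk(bayes) - 0.25) < 1e-9
```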