Learning Theory
Slides adapted from Yaser Abu-Mostafa
Machine Learning (so far)
Learning is used when:
A pattern exists
We can't pin it down mathematically
We have data on it
Supervised Learning
UNKNOWN target function y = f(x)
Data set: (x1, y1), ..., (xN, yN)
Learning algorithm
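The setup above can be sketched end to end. Everything here is a made-up minimal example (the simulated target, a hypothetical threshold hypothesis set, and grid search as the learning algorithm), not the slides' own code:

```python
import random

# The "unknown" target function -- known here only so we can simulate data
def f(x):
    return 1 if x > 0.5 else -1

random.seed(0)
# Data set: (x1, y1), ..., (xN, yN) sampled from the target
data = [(x, f(x)) for x in (random.random() for _ in range(100))]

# Hypothesis set: threshold classifiers h_t(x) = sign(x - t)
def h(t, x):
    return 1 if x > t else -1

# Learning algorithm: pick the t on a grid minimizing in-sample error E_in
def E_in(t):
    return sum(h(t, x) != y for x, y in data) / len(data)

g_t = min((i / 100 for i in range(101)), key=E_in)
print(g_t, E_in(g_t))
```

Since the hypothesis set happens to contain the target here, the learned g matches f exactly on the data; in general the target is outside the hypothesis set and E_in stays positive.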

Hypotheses & Decision Trees
Slides adapted from Jeff Storey, Blaž Zupan and Ivan Bratko
Classification problem
[Figure: scatter plot of unlabeled points (?) in the (x1, x2) feature plane]
Hypotheses
Build a model that accounts for all the examples we see
This is called the hypothesis, h
h is drawn from the hypothesis space, H
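A minimal sketch of a consistent hypothesis. The examples and the hypothesis space (axis-aligned single splits, decision stumps) are invented for illustration:

```python
# Hypothetical labeled examples: (x1, x2) -> class
examples = [((0.2, 0.7), 0), ((0.4, 0.1), 0),
            ((0.8, 0.9), 1), ((0.9, 0.3), 1)]

# A hypothesis h drawn from the space of axis-aligned splits (depth-1 stumps)
def h(point, feature=0, threshold=0.6):
    return 1 if point[feature] > threshold else 0

# h is consistent if it accounts for every example we see
consistent = all(h(x) == y for x, y in examples)
print(consistent)
```

Decision-tree learners search this kind of space, growing deeper trees when no single split accounts for all the examples.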

Deep Learning
Slides adapted from Adam Coates
Artificial Neural Networks
MLPs are a network of simple units
Lead to complex learners
Error backpropagation trains an MLP
Gradient descent applied to the entire network
Errors propagate backward one layer at a time
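These ideas can be sketched as a tiny MLP trained by backpropagation. The architecture (2-4-1), the XOR data, and the learning rate are illustrative choices, not the slides' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, which a single perceptron cannot represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer

def loss():
    return float(np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2))

lr, before = 0.5, loss()
for _ in range(2000):
    # Forward pass through the whole network
    H = sigmoid(X @ W1 + b1)
    out = sigmoid(H @ W2 + b2)
    # Backward pass: error signals flow back one layer at a time
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = H.T @ d_out, d_out.sum(0)
    d_H = (d_out @ W2.T) * H * (1 - H)
    dW1, db1 = X.T @ d_H, d_H.sum(0)
    # Gradient descent applied to the entire network at once
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

after = loss()
print(before, after)
```

The backward pass reuses each layer's forward activations, which is what makes backpropagation efficient compared with differentiating each weight independently.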

Multi-Layer Perceptrons
& Backpropagation
Slides adapted from Andrew Rosenberg and Ethem Alpaydin
ANN Recap
Biologically Motivated
Simple units act as function approximators
Simple perceptrons (train w/ gradient descent)
What's left?
Multi-layer Net
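The single-unit recap can be sketched as one linear unit trained with gradient descent (the delta rule). The data and learning rate are made up for illustration:

```python
import random

random.seed(1)

# Hypothetical data from the line y = 2x + 1
data = [(x, 2 * x + 1) for x in [i / 10 for i in range(-10, 11)]]

# Single linear unit: output = w*x + b, trained by gradient descent
w, b, lr = 0.0, 0.0, 0.05
for _ in range(500):
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x   # d/dw of (err^2)/2
        b -= lr * err       # d/db of (err^2)/2
print(round(w, 2), round(b, 2))
```

One unit like this is a function approximator; what's left is stacking them into a multi-layer net so nonlinear targets become learnable.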

Bias-Variance Trade-Off
Slides adapted from Scott Fortmann-Roe
Last Class
We want Eout(g) ≈ 0, but we'll never know Eout.
Instead, we'll go for: Ein(g) ≈ Eout(g) and Ein(g) ≈ 0
This leads to 2 questions:
1. Can we make Ein(g) small enough?
All of these algorithms seem to minimize training error
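A quick illustration of why driving Ein(g) to zero is not the whole story: a memorizing 1-nearest-neighbour learner on synthetic noisy data (all details here are invented) gets Ein = 0 while Eout stays large:

```python
import random

random.seed(2)

def sample(n):
    # True boundary at x = 0.5; labels flipped with probability 0.2 (noise)
    pts = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.2:
            y = 1 - y
        pts.append((x, y))
    return pts

train, test = sample(200), sample(1000)

def g(x):
    # 1-nearest-neighbour: memorizes the training set
    return min(train, key=lambda p: abs(p[0] - x))[1]

E_in = sum(g(x) != y for x, y in train) / len(train)
E_out = sum(g(x) != y for x, y in test) / len(test)
print(E_in, E_out)
```

Ein is zero by construction (each training point is its own nearest neighbour), so minimizing training error alone cannot guarantee Eout(g) ≈ Ein(g).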

Logistic Regression
Slides adapted from Michael T. Brannick, Ethem Alpaydin, and Yaser Abu-Mostafa
Our Third Linear Model
Linear classification: hard decision, h(x) = sign(wᵀx)
Linear regression: no decision, h(x) = wᵀx
Logistic regression: soft decision, h(x) = θ(wᵀx)
Probability Interpretation
The output h(x) = θ(wᵀx) is interpreted as a probability
Example: Prediction of heart attacks
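A minimal sketch of the soft decision with hypothetical, unfitted weights; the feature names and numbers are invented for illustration, not estimated from any real heart data:

```python
import math

# Logistic sigmoid: squashes the signal s = w.x into (0, 1)
def theta(s):
    return 1 / (1 + math.exp(-s))

# Hypothetical weights and bias (for illustration only)
w, b = [0.04, 0.5], -6.0   # features: age, a risk-score-like value

def h(x):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return theta(s)         # read as P(y = 1 | x)

p = h([60, 8.0])            # soft decision in (0, 1)
print(round(p, 3))
```

Unlike the hard decision sign(wᵀx), the output here carries a confidence: values near 0.5 mean the model is unsure, which is exactly the information a risk-prediction setting needs.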