6.5 EXAMPLE: XOR PROBLEM (REVISITED)
To illustrate the procedure for the design of a support vector machine, we revisit the
XOR (Exclusive OR) problem discussed in Chapters 4 and 5. Table 6.2 presents a sum
Lecture 2: The SVM classifier
C19 Machine Learning
Hilary 2015
Review of linear classifiers
Linear separability
Perceptron
Support Vector Machine (SVM) classifier
Wide margin
Cost function
Slack variables
Loss functions revisited
Optimization
A. Zisse
An Experimental Comparison of
Symbolic and Connectionist Learning Algorithms
Jude Shavlik
Computer Sciences
University of Wisconsin
Madison, WI 53706
Raymond Mooney
Computer Sciences
University of Texas
Austin, TX 78712
Abstract
Despite the fact that many
The effect of vigilance parameter on ART performance
Principal Components Analysis
(PCA)
1. Translate the dataset to the center by subtracting the column means: let matrix A be the result.
2. Compute the covariance matrix A^T A.
3. Project the dataset along a subset of the top eigenvectors of A^T A (the principal components).
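The three steps above can be sketched in a few lines of numpy; the toy dataset and the choice t = 1 are illustrative assumptions:

```python
import numpy as np

def pca_project(X, t):
    """Project dataset X (rows = samples) onto its top-t principal components."""
    # Step 1: center the data by subtracting the column means -> matrix A
    A = X - X.mean(axis=0)
    # Step 2: covariance matrix A^T A (up to a constant 1/(n-1) factor)
    cov = A.T @ cov_arg if False else A.T @ A
    # Step 3: eigendecomposition; keep eigenvectors of the t largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:t]]
    return A @ top                                    # projected dataset, shape (n, t)

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = pca_project(X, 1)
print(Z.shape)   # (5, 1)
```

Because the data are centered first, the projected coordinates also have zero mean.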
A Tutorial on Support Vector
Regression
Alex J. Smola, GMD
Bernhard Schölkopf, GMD
NeuroCOLT2 Technical Report Series
NC2-TR-1998-030
October, 1998
Produced as part of the ESPRIT Working Group
in Neural and Computational Learning II,
NeuroCOLT2 27150
Vector Quantization
The figure shows vectors in two-dimensional space.
Associated with each cluster of vectors is a representative codeword.
Each codeword resides in its own Voronoi region.
These regions are separated with imaginary lines in the figure for illustration.
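The codeword/Voronoi idea amounts to a nearest-codeword lookup; the 2-D codebook below is an illustrative assumption:

```python
import numpy as np

# Hypothetical 2-D codebook: one representative codeword per cluster
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

def quantize(v, codebook):
    """Return the index of the nearest codeword, i.e. the Voronoi region v falls in."""
    dists = np.linalg.norm(codebook - v, axis=1)
    return int(np.argmin(dists))

print(quantize(np.array([4.2, 4.9]), codebook))  # 1: closest to codeword (5, 5)
```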
1. Give a decision tree to represent the following Boolean function: A ∨ (B ∧ C)
2. Consider the following set of training examples:
(a) What is the entropy of this collection of training examples with respect to the target
function classification?
(b) What is
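For part (a), the entropy of a collection with respect to the target classification follows directly from the class proportions. A small sketch; the 9-positive / 5-negative split is an illustrative assumption, since the table of training examples is not reproduced here:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection w.r.t. the target classification:
    H = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# e.g. 9 positive and 5 negative examples (only the proportions matter)
print(round(entropy(["+"] * 9 + ["-"] * 5), 3))  # 0.94
```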
1. For an SVM, if we remove one of the support vectors from the training set, does the size
of the maximum margin decrease, stay the same, or increase for that data set?
2. In a linear SVM, the Lagrange multipliers α for the support vectors are given as 1, 0.6 and 0.4. What is th
CS229 Lecture notes
Andrew Ng
Part V
Support Vector Machines
This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) off-the-shelf supervised learning algorithms. To tell
Q-Learning
Let's say we have access to the optimal value function that computes the total future discounted reward V*(s).
What would be the optimal policy π*(s)?
Answer: we choose the action that maximizes:
π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ]
We as
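A minimal sketch of this greedy policy extraction on a hypothetical 3-state chain; the reward function, transition function, and V* values below are illustrative assumptions:

```python
# Greedy policy extraction: pi*(s) = argmax_a [ r(s,a) + gamma * V*(delta(s,a)) ]
gamma = 0.9
states, actions = [0, 1, 2], ["left", "right"]

def delta(s, a):                      # deterministic transition function
    return max(s - 1, 0) if a == "left" else min(s + 1, 2)

def r(s, a):                          # reward: only entering the goal state 2 pays off
    return 1.0 if delta(s, a) == 2 else 0.0

# Hand-computed optimal values for this chain (goal state itself has value 0 here)
V_star = {0: gamma * gamma, 1: gamma, 2: 0.0}

def pi_star(s):
    return max(actions, key=lambda a: r(s, a) + gamma * V_star[delta(s, a)])

print([pi_star(s) for s in states])   # moving right is optimal in every state
```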
ADAPTIVE RESONANCE
THEORY
Grossberg, 1976
Stability-Plasticity Dilemma
How can learning continue into adulthood without
causing catastrophic forgetting?
How can we learn quickly without being forced to forget
just as quickly?
World Poverty Map
pos = hextop(9,13);   % 9-by-13 hexagonal SOM topology
plotsom(pos)
Here the data consisted of World Bank statistics of countries in 1992. Altogether 39 indicators describing various quality-of-life factors, such as state of health, nutrition, educational services, etc., were
Introduction
Unsupervised learning
Training samples contain only input patterns
No desired output is given (teacher-less)
Learn to form classes/clusters of sample
patterns according to similarities among them
Patterns in a cluster would have similar
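A minimal instance of this idea is k-means clustering, which groups patterns by similarity without any desired outputs; the toy data below are an illustrative assumption:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign each pattern to its most similar (nearest) center,
    then move each center to the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated groups of identical patterns
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
labels, _ = kmeans(X, 2)
print(len(set(labels[:5])), len(set(labels[5:])))  # 1 1: each group forms one cluster
```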
Recycling Robot MDP
S = {high, low}
A(high) = {search, wait}
A(low) = {search, wait, recharge}
R^search = expected no. of cans while searching
R^wait = expected no. of cans while waiting
R^search > R^wait
Transition Table
Reinforcement Learning
The learner and decision-maker is called the agent.
The thing it interacts with, comprising everything outside
the agent, is called the environment.
These interact continually, the agent selecting actions and the environment responding to those actions.
Reinforcement Learning: How Does It Work?
We detect a state
We choose an action
We get a reward
Our aim is to learn a policy: what action to choose in what state to get maximum reward.
Maximum reward over the long term, not necessarily the immediate maximum reward.
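The long-term versus immediate distinction can be made concrete with the discounted return G = sum over t of gamma^t * r_t; the two reward sequences below are illustrative assumptions:

```python
# Greedy gets +1 immediately and nothing afterwards; patient waits for a +10.
def discounted_return(rewards, gamma=0.9):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

greedy  = [1, 0, 0, 0]
patient = [0, 0, 0, 10]
print(discounted_return(greedy))                               # 1.0
print(discounted_return(patient))                              # 10 * 0.9**3, about 7.29
print(discounted_return(patient) > discounted_return(greedy))  # True
```

With gamma close to 1 the agent values delayed reward almost as much as immediate reward; with small gamma it becomes short-sighted.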
Reinforcement Learning
Mainly based on Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
http://www.cs.ualberta.ca/~sutton/book/the-book.html
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
Learning from Experience Plays a
Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Imagine model building repeated many times: each time you gather new data and
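The repeated model-building procedure can be sketched numerically. Underfitting the target x^2 with a straight line leaves a nonzero bias at a test point; the true function, noise level, and model choice below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2                  # the correct value we are trying to predict
x0 = 1.0                              # point at which we measure bias

preds = []
for _ in range(2000):                 # each repetition: gather new data, refit
    x = rng.uniform(-2, 2, 30)
    y = f(x) + rng.normal(0, 0.5, 30)
    w = np.polyfit(x, y, deg=1)       # deliberately underfit with a straight line
    preds.append(np.polyval(w, x0))

bias = np.mean(preds) - f(x0)         # expected prediction minus correct value
print(round(float(bias), 2))          # clearly nonzero: the line cannot follow x^2
```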
Kernels
The kernel matrix is symmetric positive semi-definite. Any symmetric positive semi-definite matrix can be regarded as a kernel matrix, that is, as an inner product matrix in some space.
When mapping into a space with too many irrelevant features, the kernel matrix
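A quick numerical check of this property: a Gaussian kernel matrix built on arbitrary points (the data here are an illustrative assumption) is symmetric and has no negative eigenvalues, up to round-off:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                   # 8 arbitrary points in 3-D

sq_dists = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)                   # K[i, j] = exp(-||x_i - x_j||^2 / 2)

print(np.allclose(K, K.T))                    # True: symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True: positive semi-definite
```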
SVM:Nonlinear case
Model selection procedure
- We have to decide which kernel function and C value to use.
- In practice, a Gaussian radial basis function or a low-degree polynomial kernel is a good start. [Andrew Moore]
- We start checking which set of parameters (s
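The parameter search can be sketched as a plain grid over (kernel parameter, regularization strength) scored on a held-out validation set. To stay self-contained this sketch uses kernel ridge regression in numpy as a stand-in for the SVM; the selection loop over the grid is the same, and all data and grid values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 60)
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]

def rbf(A, B, gamma):
    """Gaussian RBF kernel matrix between row sets A and B."""
    return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))

best = None
for gamma in [0.01, 0.1, 1.0, 10.0]:      # kernel-width grid
    for lam in [1e-3, 1e-1, 1.0]:         # regularizer grid (plays the role of 1/C)
        alpha = np.linalg.solve(rbf(Xtr, Xtr, gamma) + lam * np.eye(40), ytr)
        err = np.mean((rbf(Xva, Xtr, gamma) @ alpha - yva) ** 2)
        if best is None or err < best[0]:
            best = (err, gamma, lam)

print(best[0] < 0.1)   # True: the best grid point fits this smooth target well
```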
Support Vector Machines
Which Separating Hyperplane to Use?
[Figure: several candidate separating hyperplanes for the two classes]
Maximizing the Margin
Select the separating hyperplane that maximizes the margin!
[Figure: two candidate hyperplanes with their margin widths, axes Var1 and Var2]
Support Vectors
[Figure: the maximum-margin hyperplane, its margin width, and the support vectors on the margin boundaries, axes Var1 and Var2]
Example
(0,
Non-Linearly Separable Data
The margin of separation is soft if the condition
d_i (w·x_i + b) ≥ 1
is violated; with slack variables ξ_i ≥ 0 it is relaxed to
d_i (w·x_i + b) ≥ 1 − ξ_i
Non-Linearly Separable Data
Introduce slack variables ξ_i ≥ 0 and allow some instances to fall within the margin.
[Figure: soft margin with slack variables ξ_i; the margin boundaries are w·x + b = +1 and w·x + b = −1, axes Var1 and Var2]
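For a fixed hyperplane, the slack of each point can be computed directly as ξ_i = max(0, 1 − d_i (w·x_i + b)); the hyperplane and the three points below are illustrative assumptions:

```python
import numpy as np

# Slack xi_i measures by how much point i violates d_i (w . x_i + b) >= 1.
w, b = np.array([1.0, 0.0]), 0.0      # hypothetical hyperplane x1 = 0

X = np.array([[2.0, 1.0],             # well outside the margin
              [0.5, 0.0],             # inside the margin band
              [-0.5, 1.0]])           # on the wrong side (misclassified)
d = np.array([1.0, 1.0, 1.0])         # all labelled +1

xi = np.maximum(0.0, 1.0 - d * (X @ w + b))
print(xi)   # [0.  0.5 1.5]: zero slack, margin violation, misclassification
```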
Naive Bayes Classifier
Let each instance x of a training set D be described by a conjunction of n attribute values ⟨a1, a2, …, an⟩, and let the target function f(x) take values in a finite set V.
Bayesian Approach:
v_MAP = argmax_{vj ∈ V} P(vj | a1, a2, …, an)
      = argmax_{vj ∈ V} P(a1, a2, …, an | vj) P(vj) / P(a1, a2, …, an)
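Under the additional naive independence assumption P(a1, …, an | v) = product of P(ai | v), the v_MAP rule becomes the naive Bayes classifier. A minimal sketch; the probability tables below are illustrative assumptions, not values estimated from data:

```python
import math

# Hypothetical priors and conditional probability tables
P_v = {"p": 0.6, "n": 0.4}
P_a_given_v = {
    "p": {"rain": 0.2, "hot": 0.3},
    "n": {"rain": 0.5, "hot": 0.6},
}

def v_nb(attrs):
    """v_NB = argmax_v P(v) * prod_i P(a_i | v)."""
    score = lambda v: P_v[v] * math.prod(P_a_given_v[v][a] for a in attrs)
    return max(P_v, key=score)

print(v_nb(["rain", "hot"]))  # n: 0.4*0.5*0.6 = 0.12 beats 0.6*0.2*0.3 = 0.036
```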
The Dual of the SVM Formulation
Original SVM formulation:
- n inequality constraints
- n positivity constraints
- n variables
The dual of this problem:
- n positivity constraints
- one equality constraint
- n variables (the Lagrange multipliers)
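In symbols, the dual just described (one equality constraint, n positivity constraints, with the multipliers α_i as the n variables) is the standard form:

```latex
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{n} \alpha_i
\;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
\alpha_i \alpha_j \, d_i d_j \, \mathbf{x}_i^{\top}\mathbf{x}_j
\quad \text{subject to} \quad
\sum_{i=1}^{n} \alpha_i d_i = 0, \qquad \alpha_i \ge 0, \;\; i = 1,\dots,n .
```

Here the d_i are the class labels, matching the notation of the slack-variable slides.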
Lagrange Multiplier
Method
It's milking time at the farm, and
the milkmaid has been sent to the
field to get the day's milk. She's in
a hurry to get back for a date with
a handsome young goatherd, so
she wants to finish her job as
quickly as possible. How
Standard Problem
The rule is that for constraints of the form g_i ≥ 0, the constraint equations are multiplied by positive Lagrange multipliers and subtracted from the objective function to form the Lagrangian.
Kuhn-Tucker conditions:
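Written out for an objective f(x) with constraints g_i(x) ≥ 0, the rule above gives the Lagrangian

```latex
L(\mathbf{x}, \boldsymbol{\alpha}) \;=\; f(\mathbf{x}) \;-\; \sum_{i} \alpha_i \, g_i(\mathbf{x}),
\qquad \alpha_i \ge 0,
```

and the Kuhn-Tucker conditions at the optimum are

```latex
\nabla_{\mathbf{x}} L = 0, \qquad g_i(\mathbf{x}) \ge 0, \qquad
\alpha_i \ge 0, \qquad \alpha_i \, g_i(\mathbf{x}) = 0 \;\;\text{(complementary slackness)} .
```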
Iris Problem
Type load fisheriris in MATLAB.
Length and width of sepal and petal for three North American species of iris. Fisher's Iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
1. sepal length in
Bayesian classifiers
unseen sample X = ⟨rain, hot, high, weak⟩
Compute the posterior P(vj | a1, a2, …, an) for each class:
P(p | a1, a2, …, an) = a
P(n | a1, a2, …, an) = b
using Bayes' theorem: P(h | D) = P(D | h) P(h) / P(D)
If a > b, then the class of X is p; otherwise n.
Bayes Theorem
A = No. of people with cancer
P(A) = the probability
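Bayes' theorem turns these quantities into the posterior P(A | B) = P(B | A) P(A) / P(B). A numeric sketch; the cancer-test numbers below are illustrative assumptions, not data from the text:

```python
# Posterior probability of cancer (A) given a positive test (B), using
# P(B) = P(B | A) P(A) + P(B | not A) P(not A) for the denominator.
p_A = 0.008                # assumed prior: probability a person has cancer
p_B_given_A = 0.98         # assumed test sensitivity
p_B_given_notA = 0.03      # assumed false-positive rate

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))   # 0.209: a positive test still leaves a low posterior
```

The low prior dominates: most positive tests come from the much larger cancer-free population.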