TEMPLATE MATCHING
The Goal: Given a set of reference patterns known as TEMPLATES, determine which one an unknown pattern matches best. That is, each class is represented by a single typical pattern.
The crucial point is to adopt an appropriate measure to quantify the similarity (or dissimilarity) between the unknown pattern and each template.
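As a minimal sketch of the idea (the templates, the unknown vector, and the class names below are all hypothetical, and Euclidean distance is just one possible measure), a nearest-template classifier could look like:

```python
import math

# Template matching sketch: each class is represented by a single template,
# and the unknown pattern is assigned to the class whose template is
# closest under the chosen measure (Euclidean distance here).
templates = {"A": [0.0, 0.0], "B": [1.0, 1.0]}   # hypothetical templates
unknown = [0.2, 0.1]                              # hypothetical pattern

def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

best = min(templates, key=lambda c: dist(templates[c], unknown))
print(best)   # "A": the unknown lies closer to template A
```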

4/25/16
ECE 759 Review
Features: These are measurable quantities obtained from the patterns, and the classification task is based on their respective values.
Examples: mean value, length, texture.
Feature vectors: A number of features x1, …, xl constitute the feature vector x = [x1, …, xl]^T.
3/28/16
Chapter 10: Clustering Techniques

4/13/16
The Chameleon algorithm
This algorithm is not based on a static modeling of clusters, unlike CURE (where each cluster is represented by the same number of representatives) and ROCK (where constraints are posed through the function f()).
It enjoys b

Optimal Feature Generation
In general, feature generation is a problem-dependent task. However, there are a few general directions common in a number of applications. We focus on three such alternatives.
Optimized features based on scatter matrices (Fisher's discriminant)

LINEAR CLASSIFIERS
The Problem: Consider a two-class task with classes ω1, ω2:
  g(x) = w^T x + w0 = 0
       = w1 x1 + w2 x2 + … + wl xl + w0
Assume x1, x2 on the decision hyperplane:
  0 = w^T x1 + w0 = w^T x2 + w0
  ⇒ w^T (x1 − x2) = 0, for all x1, x2 on the hyperplane
Hence: w is normal to the decision hyperplane g(x) = w^T x + w0 = 0.
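A quick numeric check of this geometry (w, w0, and the two points below are hypothetical values chosen so that both points lie on the hyperplane):

```python
# Verify that w is normal to the hyperplane g(x) = w^T x + w0 = 0:
# for any x1, x2 with g(x1) = g(x2) = 0, we get w^T (x1 - x2) = 0.
w = [2.0, 1.0]     # hypothetical weight vector
w0 = -4.0          # hypothetical threshold

def g(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

x1 = [2.0, 0.0]    # on the hyperplane: 2*2 + 1*0 - 4 = 0
x2 = [0.0, 4.0]    # on the hyperplane: 2*0 + 1*4 - 4 = 0
diff = [a - b for a, b in zip(x1, x2)]
print(g(x1), g(x2))                               # 0.0 0.0
print(sum(wi * di for wi, di in zip(w, diff)))    # 0.0, so w is orthogonal to x1 - x2
```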

CLUSTERING
Basic Concepts
In clustering or unsupervised learning no training data, with class labeling, are available. The goal becomes: Group the data into a number of sensible clusters (groups). This unravels similarities and differences among the available data.

Non-Linear Classifiers
The XOR problem
  x1  x2  XOR  Class
  0   0   0    B
  0   1   1    A
  1   0   1    A
  1   1   0    B
There is no single line (hyperplane) that separates class A from class B. On the contrary, the AND and OR operations are linearly separable problems.
The Two-Layer Perceptron
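A hand-wired two-layer perceptron with step activations realizes XOR where a single hyperplane cannot (the particular weights below are one standard hypothetical choice, not taken from the text):

```python
def step(v):
    """Hard-threshold activation."""
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two linear threshold units (hypothetical weights).
    y1 = step(x1 + x2 - 0.5)    # fires for x1 OR x2
    y2 = step(x1 + x2 - 1.5)    # fires for x1 AND x2
    # Output layer: fire when OR is true but AND is false -> XOR.
    return step(y1 - y2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # reproduces the XOR truth table above
```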

CLUSTERING ALGORITHMS
Number of possible clusterings
Let X = {x1, x2, …, xN}.
Question: In how many ways can the N points be assigned into m groups?
Answer (Stirling numbers of the second kind):
  S(N, m) = (1/m!) Σ(i=0..m) (−1)^(m−i) (m choose i) i^N
Examples:
  S(15, 3) = 2 375 101
  S(20, 4) = 45 232 115 901
  S(100, 5)
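The formula and the quoted example values can be checked directly (a sketch; the function name is ours):

```python
from math import comb, factorial

def stirling2(N, m):
    """S(N, m): number of ways to partition N points into m non-empty groups
    (Stirling number of the second kind)."""
    return sum((-1) ** (m - i) * comb(m, i) * i ** N
               for i in range(m + 1)) // factorial(m)

print(stirling2(15, 3))   # 2375101
print(stirling2(20, 4))   # 45232115901
```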

Sergios Theodoridis
Konstantinos Koutroumbas
Version 2
PATTERN RECOGNITION
Typical application areas:
  Machine vision
  Character recognition (OCR)
  Computer-aided diagnosis
  Speech recognition
  Face recognition
  Biometrics
  Image database retrieval
  Data mining
  Bioinformatics

CONTEXT-DEPENDENT CLASSIFICATION
Remember the Bayes rule: decide ωi if
  P(ωi | x) > P(ωj | x), for all j ≠ i
Here: The class to which a feature vector belongs depends on:
  its own value,
  the values of the other features,
  an existing relation among the various classes.
This interrelation
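As a one-line sketch of the Bayes rule as stated (the posterior values are hypothetical), classification simply picks the class with the largest posterior:

```python
# Bayes rule sketch: decide omega_i if P(omega_i | x) > P(omega_j | x)
# for all j != i. The posterior values below are hypothetical.
posteriors = {"omega_1": 0.2, "omega_2": 0.7, "omega_3": 0.1}
decision = max(posteriors, key=posteriors.get)
print(decision)   # omega_2
```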

FEATURE SELECTION
The goals:
  Select the optimum number l of features
  Select the best l features
Large l has a three-fold disadvantage:
  High computational demands
  Low generalization performance
  Poor error estimates
Given N, l must be large enough

SYSTEM EVALUATION
The goal is to estimate the error probability of the designed classification system.
Error Counting Technique:
Let M be the number of classes, and let Ni be the number of data points in class ωi used for testing, with Σ(i=1..M) Ni = N the total number of test points.
Let Pi be the probability of error for class ωi
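A minimal sketch of the error-counting estimate (the counts below are hypothetical): if ki of the Ni test points of class ωi are misclassified, weighting the per-class error rates ki/Ni by the empirical priors Ni/N reduces to the overall error fraction:

```python
# Error counting sketch with hypothetical per-class counts.
Ni = [50, 30, 20]   # test points per class
ki = [4, 3, 1]      # misclassified points per class
N = sum(Ni)

# P_hat = sum_i (Ni/N) * (ki/Ni), which simplifies to (sum_i ki) / N.
P_hat = sum((n / N) * (k / n) for k, n in zip(ki, Ni))
print(sum(ki) / N)   # 0.08 -- same value as P_hat
```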


4/10/16
Remarks:
SL (single link) imposes the weakest possible graph condition (connectivity) for the formation of a cluster, while CL (complete link) imposes the strongest possible graph condition (completeness) for the formation of a cluster.
For various choices of h(k), a variety of agglomerative algorithms results.
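The two graph conditions correspond to two cluster-to-cluster distances: SL merges at the minimum pairwise distance, CL at the maximum. A tiny sketch with hypothetical 1-D points:

```python
# Single link (SL) vs complete link (CL) inter-cluster distance.
# Points are hypothetical 1-D values.
def dist(a, b):
    return abs(a - b)

c1, c2 = [0.0, 1.0], [3.0, 7.0]
d_sl = min(dist(a, b) for a in c1 for b in c2)   # closest pair: |1 - 3| = 2.0
d_cl = max(dist(a, b) for a in c1 for b in c2)   # farthest pair: |0 - 7| = 7.0
print(d_sl, d_cl)
```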

2/8/16
Lecture 9: Non-Linear Classifiers

1/31/16
SMALL in the sum-of-error-squares sense means:
  J(w) = Σ(i=1..N) (yi − w^T xi)^2
  (yi, xi): training pairs, that is, the input xi and its corresponding class label yi (±1).
Minimizing J(w) by setting its gradient to zero:
  ∂J(w)/∂w = Σ(i=1..N) ∂/∂w (yi − w^T xi)^2 = 0
  ⇒ (Σ(i=1..N) xi xi^T) w = Σ(i=1..N) xi yi
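A self-contained sketch of solving the normal equations (Σ xi xi^T) w = Σ xi yi, using one feature plus a bias term so the system is 2×2 and can be solved by hand (the data are hypothetical):

```python
# Least-squares classifier sketch: one feature x plus a bias, labels +/-1.
# Hypothetical data: class -1 near x = 0, class +1 near x = 2.
xs = [0.0, 0.2, 0.4, 1.8, 2.0, 2.2]
ys = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

# Augment each sample to (x, 1) so w = (w1, w0).
# Normal equations A w = b with A = sum_i x_i x_i^T, b = sum_i x_i y_i:
a11 = sum(x * x for x in xs); a12 = sum(xs); a22 = len(xs)
b1 = sum(x * y for x, y in zip(xs, ys)); b2 = sum(ys)

# Solve the 2x2 system by Cramer's rule.
det = a11 * a22 - a12 * a12
w1 = (a22 * b1 - a12 * b2) / det
w0 = (a11 * b2 - a12 * b1) / det

pred = [1.0 if w1 * x + w0 > 0 else -1.0 for x in xs]
print(pred == ys)   # True: the fitted line separates the two groups
```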

1/27/16
Linear Classifiers (Chapter 3)
LINEAR CLASSIFIERS
The Problem: Consider a two-class task with classes ω1, ω2:
  g(x) = w^T x + w0 = 0
       = w1 x1 + w2 x2 + … + wl xl + w0
Assume x1, x2 on the decision hyperplane:
  0 = w^T x1 + w0 = w^T x2 + w0
  ⇒ w^T (x1 − x2) = 0

1/12/16
Nondiagonal case (Σ ≠ σ² I):
  g_ij(x) = w^T (x − x0) = 0
  w = Σ⁻¹(μi − μj)
  x0 = (1/2)(μi + μj) − ln( P(ωi) / P(ωj) ) (μi − μj) / ‖μi − μj‖²_{Σ⁻¹}
where ‖x‖_{Σ⁻¹} = (x^T Σ⁻¹ x)^{1/2}.
The decision hyperplane is not normal to μi − μj; it is normal to Σ⁻¹(μi − μj).
Minimum Distance Classifiers
  P(ωi) = 1/M (equiprobable classes)
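A small numeric sketch (2-D, with hypothetical μi, μj, and Σ) showing that w = Σ⁻¹(μi − μj) is in general not parallel to μi − μj, which is why the hyperplane is not normal to μi − μj:

```python
# Compute w = Sigma^{-1} (mu_i - mu_j) for a hypothetical 2x2 covariance,
# inverting Sigma explicitly.
mu_i = [1.0, 1.0]
mu_j = [-1.0, 0.0]
S = [[2.0, 0.5], [0.5, 1.0]]                 # hypothetical covariance Sigma

det = S[0][0] * S[1][1] - S[0][1] * S[1][0]  # 2*1 - 0.5*0.5 = 1.75
Sinv = [[ S[1][1] / det, -S[0][1] / det],
        [-S[1][0] / det,  S[0][0] / det]]

d = [mu_i[0] - mu_j[0], mu_i[1] - mu_j[1]]   # mu_i - mu_j = (2, 1)
w = [Sinv[0][0] * d[0] + Sinv[0][1] * d[1],
     Sinv[1][0] * d[0] + Sinv[1][1] * d[1]]

# Nonzero 2-D cross product means w is NOT parallel to mu_i - mu_j.
print(w, w[0] * d[1] - w[1] * d[0])
```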

1/13/16
ECE 759


1/20/16
Variance: the smaller the h, the higher the variance.
[Figures: Parzen estimates for h=0.1, N=1000; h=0.8, N=1000; h=0.1, N=10000. The higher the N, the better the accuracy.]
If h → 0 as N → ∞, with Nh → ∞, the estimator is asymptotically unbiased.
The method. Remember the likelihood ratio:
  l12 = p(x|ω1) / p(x|ω2), with each density replaced by its Parzen estimate, e.g. p(x|ω1) ≈ (1/(N1 h
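A minimal 1-D Parzen sketch with a box (hypercube) kernel, p̂(x) = (1/(N·h)) Σ φ((x − xi)/h) with φ(u) = 1 for |u| ≤ 1/2 (the data points and h below are hypothetical):

```python
# 1-D Parzen window estimate with a box kernel. Data are hypothetical.
data = [0.0, 0.1, 0.2, 1.0]

def parzen(x, h):
    """p_hat(x) = (1/(N*h)) * number of samples within h/2 of x."""
    N = len(data)
    return sum(1.0 for xi in data if abs((x - xi) / h) <= 0.5) / (N * h)

print(parzen(0.1, 0.4))   # 3 of 4 points fall in the window: 3/(4*0.4) ≈ 1.875
```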

Introduction
Outline:
  Machine Perception
  An Example
  Pattern Recognition Systems
  Algorithm Design
  Learning and Adaptation
  Conclusion
Machine Perception
Build a machine that can recognize patterns:
  Speech recognition
  Fingerprint identification
  OCR (Optical Character Recognition)