Pattern Recognition
ECE8443 Chapter 5:
Linear Discriminant Functions
(Sections 5.1–5.3)
Electrical and Computer Engineering Department,
Mississippi State University.

• Introduction
• Linear Discriminant Functions and Decision Surfaces
• Generalized Linear Discriminant Functions

Introduction
• In chapter 3, the underlying probability
densities were known (or given)
• The training sample was used to estimate the
parameters of these probability densities (ML,
MAP estimations)
• In this chapter, we only know the proper
forms for the discriminant functions: similar to
nonparametric techniques
• They may not be optimal, but they are very
simple to use
• They provide us with linear classifiers
Linear Discriminant Functions and Decision Surfaces
• Definition
It is a function that is a linear combination of the components of x:
$g(x) = w^t x + w_0$   (1)
where w is the weight vector and w0 is the bias
• A two-category classifier with a discriminant function of the
form (1) uses the following rule:
Decide ω1 if g(x) > 0 and ω2 if g(x) < 0
⇔ Decide ω1 if $w^t x > -w_0$ and ω2 otherwise
If g(x) = 0 ⇒ x is assigned to either class
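A minimal sketch of this two-category rule (added for illustration; the weights below are hypothetical, not from the slides):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant: g(x) = w^t x + w0."""
    return w @ x + w0

def decide(x, w, w0):
    """Decide omega_1 if g(x) > 0, omega_2 if g(x) < 0; g(x) = 0 is a tie."""
    value = g(x, w, w0)
    return "omega_1" if value > 0 else "omega_2" if value < 0 else "either"

w = np.array([1.0, -2.0])  # hypothetical weight vector
w0 = 0.5                   # hypothetical bias
print(decide(np.array([3.0, 1.0]), w, w0))  # g = 3 - 2 + 0.5 = 1.5 > 0 -> omega_1
```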
– The equation g(x) = 0 defines the decision surface
that separates points assigned to the category ω1
from points assigned to the category ω2
– When g(x) is linear, the decision surface is a
hyperplane
– Algebraic measure of the distance from x to the
hyperplane (an interesting result!):
$x = x_p + r\,\frac{w}{\|w\|}$
(since $w$ is collinear with $x - x_p$ and $\left\|\frac{w}{\|w\|}\right\| = 1$)
Since $g(x_p) = 0$ and $w^t w = \|w\|^2$, therefore
$r = \frac{g(x)}{\|w\|}$
In particular, the distance from the origin to H is
$d(0, H) = \frac{w_0}{\|w\|}$
– In conclusion, a linear discriminant function divides
the feature space by a hyperplane decision surface
– The orientation of the surface is determined by the
normal vector w and the location of the surface is
determined by the bias
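A small numeric sketch of the result r = g(x)/||w|| (illustrative values of my own choosing):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane (hypothetical)
w0 = -5.0                  # bias (hypothetical)

def signed_distance(x):
    """r = g(x) / ||w||, the signed distance from x to H: g(x) = 0."""
    return (w @ x + w0) / np.linalg.norm(w)

x = np.array([2.0, 1.0])
r = signed_distance(x)               # (6 + 4 - 5) / 5 = 1.0
x_p = x - r * w / np.linalg.norm(w)  # projection of x onto H
print(r, w @ x_p + w0)               # 1.0 and 0.0: x_p lies on H
```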
– The multi-category case
• We define c linear discriminant functions
$g_i(x) = w_i^t x + w_{i0}, \qquad i = 1, \ldots, c$
and assign x to ωi if gi(x) > gj(x) ∀ j ≠ i; in case of ties, the classification is undefined
• In this case, the classifier is a “linear machine”
• A linear machine divides the feature space into c decision
regions, with gi(x) being the largest discriminant if x is in the
region Ri
• For two contiguous regions Ri and Rj, the boundary that separates them is a portion of the hyperplane Hij defined by:
$g_i(x) = g_j(x) \;\Leftrightarrow\; (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0$
• wi − wj is normal to Hij and $d(x, H_{ij}) = \frac{g_i(x) - g_j(x)}{\|w_i - w_j\|}$
– It is easy to show that the decision regions for a linear machine are convex; this restriction limits the flexibility and accuracy of the classifier
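A minimal linear-machine sketch (illustrative weights of my own; argmax implements the "largest discriminant wins" rule):

```python
import numpy as np

# Row i holds w_i; entry i of w0 holds w_i0, so that g_i(x) = W[i] @ x + w0[i].
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])      # hypothetical weights for c = 3 categories
w0 = np.array([0.0, 0.1, -0.2])   # hypothetical biases

def linear_machine(x):
    """Assign x to omega_i with the largest g_i(x) (ties resolved arbitrarily)."""
    return int(np.argmax(W @ x + w0))

print(linear_machine(np.array([2.0, -1.0])))  # scores [2.0, -0.9, -1.2] -> class 0
```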
Generalized Linear Discriminant Functions
• Decision boundaries that separate classes may not always be linear
• The complexity of the boundaries may sometimes require the use of highly nonlinear surfaces
• A popular approach to generalize the concept of linear
decision functions is to consider a generalized decision
function as:
$g(x) = w_1 f_1(x) + w_2 f_2(x) + \ldots + w_N f_N(x) + w_{N+1}$   (1)
where fi(x), 1 ≤ i ≤ N are scalar functions of the pattern x,
x ∈ Rn (Euclidean Space)
• Introducing $f_{N+1}(x) = 1$, we get:
$g(x) = \sum_{i=1}^{N+1} w_i f_i(x) = \mathbf{w}^T \mathbf{x}$
where $\mathbf{w} = (w_1, w_2, \ldots, w_N, w_{N+1})^T$ and $\mathbf{x} = (f_1(x), f_2(x), \ldots, f_N(x), f_{N+1}(x))^T$
• This latter representation of g(x) implies that
any decision function defined by equation (1)
can be treated as linear in the (N + 1)-dimensional space (N + 1 > n)
• g(x) maintains its nonlinear characteristics in Rn
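A sketch of this lifting for one hypothetical choice of the f_i (a 1-D pattern mapped into a 3-dimensional space; the functions and weights are my own, for illustration):

```python
import numpy as np

def lift(x):
    """f_1(x) = x, f_2(x) = x**2, f_3(x) = 1 (the constant f_{N+1})."""
    return np.array([x, x**2, 1.0])

w = np.array([-1.0, 2.0, 0.5])  # hypothetical weights w_1, ..., w_{N+1}

def g(x):
    """Linear in the lifted space, but quadratic in the original x."""
    return w @ lift(x)

print(g(2.0))  # -1*2 + 2*4 + 0.5*1 = 6.5
```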
• The most commonly used generalized decision function is g(x) for which fi(x) (1 ≤ i ≤ N) are polynomials:
$g(x) = \mathbf{w}^T \mathbf{x}$
where $\mathbf{w}$ is a new weight vector, which can be calculated from the original w and the original linear fi(x), 1 ≤ i ≤ N
• Quadratic decision functions for a 2-dimensional feature space:
$g(x) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + w_4 x_1 + w_5 x_2 + w_6$
here $\mathbf{w} = (w_1, w_2, \ldots, w_6)^T$ and $\mathbf{x} = (x_1^2, x_1 x_2, x_2^2, x_1, x_2, 1)^T$
• For patterns x ∈ Rn, the most general quadratic decision function is
given by:
$g(x) = \sum_{i=1}^{n} w_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j + \sum_{i=1}^{n} w_i x_i + w_{n+1}$   (2)
The number of terms on the right-hand side is
$l = N + 1 = n + \frac{n(n-1)}{2} + n + 1 = \frac{(n+1)(n+2)}{2}$
This is the total number of weights, which are the free parameters of the problem
– If, for example, n = 3, the vector x is 10-dimensional
– If, for example, n = 10, the vector x is 66-dimensional
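A quick sanity check of the count l = (n+1)(n+2)/2 by explicit enumeration (my own sketch):

```python
from itertools import combinations_with_replacement

def quadratic_term_count(n):
    """x_i^2 and x_i x_j terms (degree-2 combinations with replacement),
    plus n linear terms, plus the constant w_{n+1}."""
    quadratic = sum(1 for _ in combinations_with_replacement(range(n), 2))
    return quadratic + n + 1

for n in (2, 3, 10):
    assert quadratic_term_count(n) == (n + 1) * (n + 2) // 2
    print(n, quadratic_term_count(n))  # 2 -> 6, 3 -> 10, 10 -> 66
```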
• In the case of polynomial decision functions of order
m, a typical fi(x) is given by:
$f_i(x) = x_{i_1}^{e_1} x_{i_2}^{e_2} \cdots x_{i_m}^{e_m}$
where $1 \le i_1, i_2, \ldots, i_m \le n$ and $e_i$, $1 \le i \le m$, is 0 or 1.
– It is a polynomial with a degree between 0 and m. To avoid repetitions, we require $i_1 \le i_2 \le \ldots \le i_m$
$g^m(x) = \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} \cdots \sum_{i_m=i_{m-1}}^{n} w_{i_1 i_2 \ldots i_m} x_{i_1} x_{i_2} \cdots x_{i_m} + g^{m-1}(x)$
(where $g^0(x) = w_{n+1}$) is the most general polynomial decision function of order m
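The constraint i1 ≤ i2 ≤ … ≤ im is exactly a combination with replacement, so the degree-m index tuples can be enumerated directly (my own sketch):

```python
from itertools import combinations_with_replacement

def degree_m_indices(n, m):
    """Index tuples (i_1 <= i_2 <= ... <= i_m) of the degree-m monomials
    x_{i_1} x_{i_2} ... x_{i_m} appearing in g^m(x)."""
    return list(combinations_with_replacement(range(1, n + 1), m))

print(degree_m_indices(2, 3))
# [(1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
# i.e. x1^3, x1^2 x2, x1 x2^2, x2^3 -- the four cubic terms of Example 2 below
```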
Example 1: Let n = 3 and m = 2; then:
$g^2(x) = \sum_{i_1=1}^{3} \sum_{i_2=i_1}^{3} w_{i_1 i_2} x_{i_1} x_{i_2} + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4$
$= w_{11} x_1^2 + w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{22} x_2^2 + w_{23} x_2 x_3 + w_{33} x_3^2 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4$
Example 2: Let n = 2 and m = 3; then:
$g^3(x) = \sum_{i_1=1}^{2} \sum_{i_2=i_1}^{2} \sum_{i_3=i_2}^{2} w_{i_1 i_2 i_3} x_{i_1} x_{i_2} x_{i_3} + g^2(x)$
$= w_{111} x_1^3 + w_{112} x_1^2 x_2 + w_{122} x_1 x_2^2 + w_{222} x_2^3 + g^2(x)$
where $g^2(x) = \sum_{i_1=1}^{2} \sum_{i_2=i_1}^{2} w_{i_1 i_2} x_{i_1} x_{i_2} + g^1(x)$
$= w_{11} x_1^2 + w_{12} x_1 x_2 + w_{22} x_2^2 + w_1 x_1 + w_2 x_2 + w_3$
– The commonly used quadratic decision function can be
represented as the general n-dimensional quadratic surface:
$g(x) = x^T A x + x^T b + c$
where the matrix $A = (a_{ij})$, the vector $b = (b_1, b_2, \ldots, b_n)^T$, and the scalar c depend on the weights wii, wij, and wi of equation (2)
– If A is positive definite then the decision function is a
hyperellipsoid with axes in the directions of the
eigenvectors of A
• In particular: if A = In (identity), the decision function is simply the n-dimensional hypersphere
• If A is negative definite, the decision function describes a hyperhyperboloid
• In conclusion: it is only the matrix A that determines the shape and characteristics of the decision function
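For symmetric A, definiteness can be read off the signs of the eigenvalues; a minimal sketch of that check (my own code, equivalent to the principal-minor tests given later in these notes):

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix A by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)          # real eigenvalues, ascending order
    if np.all(eig > tol):   return "positive definite"
    if np.all(eig < -tol):  return "negative definite"
    if np.all(eig >= -tol): return "positive semidefinite"
    if np.all(eig <= tol):  return "negative semidefinite"
    return "indefinite"

print(definiteness(np.eye(3)))                       # positive definite (hypersphere case)
print(definiteness(np.array([[2., 1.], [1., 4.]])))  # positive definite (hyperellipsoid)
```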
Problem: Consider a 3-dimensional space and cubic polynomial decision functions:
1. How many terms are needed to represent a decision function if only cubic and linear functions are assumed?
2. Present the general 4th-order polynomial decision function for a 2-dimensional pattern space.
3. Let R3 be the original pattern space and let the decision function associated with the pattern classes ω1 and ω2 be:
$g(x) = 2x_1^2 + x_3^2 + x_2 x_3 + 4x_1 - 2x_2 + 1$
for which g(x) > 0 if x ∈ ω1 and g(x) < 0 if x ∈ ω2
a) Rewrite g(x) as $g(x) = x^T A x + x^T b + c$
b) Determine the class of each of the following pattern vectors: (1,1,1), (1,10,0), (0,1/2,0)
• Positive Definite Matrices
1. A square matrix A is positive definite if xTAx>0 for
all nonzero column vectors x.
2. It is negative definite if xTAx < 0 for all nonzero x.
3. It is positive semidefinite if xTAx ≥ 0 for all x.
4. And negative semidefinite if xTAx ≤ 0 for all x.
These definitions are hard to check directly and you
might as well forget them for all practical
purposes.

More useful in practice are the following properties, which
hold when the matrix A is symmetric and which are easier
to check.
The ith principal minor of A is the matrix Ai formed by the first i rows and columns of A. So, the first principal minor of A is the matrix $A_1 = (a_{11})$, the second principal minor is the matrix
$A_2 = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$,
and so on.

– The matrix A is positive definite if all its principal
minors A1, A2, …, An have strictly positive
determinants
– If these determinants are nonzero and alternate in sign, starting with det(A1) < 0, then the matrix A is
negative definite
– If the determinants are all nonnegative, then the
matrix is positive semidefinite
– If the determinants alternate in sign, starting with det(A1) ≤ 0, then the matrix is negative semidefinite
24 Chapter 5 Saurabh Prasad Pattern Recognition Electrical and Computer Engineering Department To fix ideas, consider a 2x2 symmetric matrix: a11 a12 .
A=
a a 21 22 It is positive definite if:
a)
b) It is negative definite if:
a)
b) Chapter 5 det(A1) = a11 ≥ 0
det(A2) = a11a22 – a12a12 ≥ 0 And it is negative semidefinite if:
a)
b) 25 det(A1) = a11 < 0
det(A2) = a11a22 – a12a12 > 0 It is positive semidefinite if:
a)
b) det(A1) = a11 > 0
det(A2) = a11a22 – a12a12 > 0 det(A1) = a11 ≤ 0
det(A2) = a11a22 – a12a12 ≥ 0. Saurabh Prasad Pattern Recognition Electrical and Computer Engineering Department Exercise 1: Check whether the following matrices are positive
definite, negative definite, positive semidefinite, negative semidefinite, or none of the above:
(a) $A = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$
(b) $A = \begin{pmatrix} -2 & 4 \\ 4 & -8 \end{pmatrix}$
(c) $A = \begin{pmatrix} -2 & 2 \\ 2 & -4 \end{pmatrix}$
(d) $A = \begin{pmatrix} 2 & 4 \\ 4 & 3 \end{pmatrix}$
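Before the worked solutions, here is a minimal sketch (my own) that applies the principal-minor tests above to the four matrices:

```python
import numpy as np

def leading_minors(A):
    """Determinants of the leading principal minors A_1, ..., A_n."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def classify(A, tol=1e-9):
    d = leading_minors(A)
    if all(x > tol for x in d):
        return "positive definite"
    if all((x < -tol) if k % 2 == 0 else (x > tol) for k, x in enumerate(d)):
        return "negative definite"        # alternating signs, det(A1) < 0
    if all(x >= -tol for x in d):
        return "positive semidefinite"
    if all((x <= tol) if k % 2 == 0 else (x >= -tol) for k, x in enumerate(d)):
        return "negative semidefinite"    # alternating signs, det(A1) <= 0
    return "none of the above"

for name, A in [("a", [[2, 1], [1, 4]]),   ("b", [[-2, 4], [4, -8]]),
                ("c", [[-2, 2], [2, -4]]), ("d", [[2, 4], [4, 3]])]:
    print(name, classify(np.array(A, dtype=float)))
# a: positive definite, b: negative semidefinite, c: negative definite, d: none of the above
```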
Solutions of Exercise 1:
• (a) A1 = 2 > 0, A2 = 8 − 1 = 7 > 0 ⇒ A is positive definite
• (b) A1 = −2, A2 = (−2 × −8) − 16 = 0 ⇒ A is negative semidefinite
• (c) A1 = −2, A2 = 8 − 4 = 4 > 0 ⇒ A is negative definite
• (d) A1 = 2 > 0, A2 = 6 − 16 = −10 < 0 ⇒ A is none of the above

Exercise 2:
Let $A = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$
1. Compute the decision boundary assigned to the matrix A, $g(x) = x^T A x + x^T b + c$, in the case where $b^T = (1, 2)$ and $c = -3$
2. Solve $\det(A - \lambda I) = 0$ and find the shape and the characteristics of the decision boundary separating the two classes ω1 and ω2
3. Classify the following points: $x^T = (0, -1)$, $x^T = (1, 1)$

Solution of Exercise 2:
1. $g(x) = (x_1, x_2) \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + (x_1, x_2) \begin{pmatrix} 1 \\ 2 \end{pmatrix} - 3 = (2x_1 + x_2, \; x_1 + 4x_2) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + x_1 + 2x_2 - 3$
$= 2x_1^2 + x_1 x_2 + x_1 x_2 + 4x_2^2 + x_1 + 2x_2 - 3$
$= 2x_1^2 + 4x_2^2 + 2x_1 x_2 + x_1 + 2x_2 - 3$
2. Solving $\det(A - \lambda I) = 0$ gives the eigenvalues $\lambda_{1,2} = 3 \pm \sqrt{2}$.
For $\lambda_1 = 3 + \sqrt{2}$, using $\begin{pmatrix} 2-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, we obtain:
$(-1 - \sqrt{2})\,x_1 + x_2 = 0, \qquad x_1 + (1 - \sqrt{2})\,x_2 = 0$
This latter equation is a straight line collinear to the vector $V_1 = (1,\, 1 + \sqrt{2})^T$
For $\lambda_2 = 3 - \sqrt{2}$, using $\begin{pmatrix} 2-\lambda & 1 \\ 1 & 4-\lambda \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$, we obtain:
$(\sqrt{2} - 1)\,x_1 + x_2 = 0, \qquad x_1 + (1 + \sqrt{2})\,x_2 = 0$
This latter equation is a straight line collinear to the vector $V_2 = (1,\, 1 - \sqrt{2})^T$
The elliptical decision boundary has two axes, which are respectively collinear to the vectors V1 and V2
3. $x = (0, -1)^T \Rightarrow g(0, -1) = -1 < 0 \Rightarrow x \in \omega_2$
$x = (1, 1)^T \Rightarrow g(1, 1) = 8 > 0 \Rightarrow x \in \omega_1$
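A short numerical cross-check of this exercise (my own sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 4.0]])
b = np.array([1.0, 2.0])
c = -3.0

def g(x):
    return x @ A @ x + x @ b + c

eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)   # [1.586..., 4.414...] = 3 - sqrt(2) and 3 + sqrt(2)
print(eigenvectors)  # columns proportional to V2 = (1, 1-sqrt(2)) and V1 = (1, 1+sqrt(2))

for x in (np.array([0.0, -1.0]), np.array([1.0, 1.0])):
    label = "omega_1" if g(x) > 0 else "omega_2"
    print(x, g(x), label)  # (0,-1): g = -1 -> omega_2 ; (1,1): g = 8 -> omega_1
```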