SPR_LectureHandouts_Chapter_05

Pattern Recognition, ECE-8443
Chapter 5: Linear Discriminant Functions (Sections 5.1-5.3)
Saurabh Prasad
Electrical and Computer Engineering Department, Mississippi State University

Outline
• Introduction
• Linear Discriminant Functions and Decision Surfaces
• Generalized Linear Discriminant Functions

Introduction
• In Chapter 3, the underlying probability densities were known (or given).
• The training sample was used to estimate the parameters of these probability densities (ML and MAP estimation).
• In this chapter, we only know the proper forms of the discriminant functions; in this respect the approach is similar to non-parametric techniques.
• The resulting classifiers may not be optimal, but they are very simple to use.
• They provide us with linear classifiers.

Linear discriminant functions and decision surfaces
• Definition: a linear discriminant function is a linear combination of the components of x,
      g(x) = w^T x + w_0        (1)
  where w is the weight vector and w_0 is the bias.
• A two-category classifier with a discriminant function of the form (1) uses the following rule:
  decide ω1 if g(x) > 0 and ω2 if g(x) < 0, i.e. decide ω1 if w^T x > -w_0 and ω2 otherwise.
  If g(x) = 0, x can be assigned to either class (a short code sketch of this rule follows below).
– The equation g(x) = 0 defines the decision surface that separates points assigned to category ω1 from points assigned to category ω2.
– When g(x) is linear, this decision surface is a hyperplane.
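To make the two-category rule above concrete, here is a minimal Python sketch (not part of the original handout); the weight vector, bias, and sample points are made-up values used only for illustration.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return np.dot(w, x) + w0

def classify(x, w, w0):
    """Decide omega_1 if g(x) > 0, omega_2 if g(x) < 0; g(x) = 0 may go to either class."""
    value = g(x, w, w0)
    if value > 0:
        return "omega_1"
    if value < 0:
        return "omega_2"
    return "either class"

# Hypothetical 2-D example (these numbers are not from the handout).
w = np.array([1.0, -2.0])
w0 = 0.5
print(classify(np.array([2.0, 0.0]), w, w0))   # g = 2.5  -> omega_1
print(classify(np.array([-1.0, 1.0]), w, w0))  # g = -2.5 -> omega_2
```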
– Algebraic measure of the distance from x to the hyperplane (an interesting result!):
  Write x = x_p + r (w / ||w||), where x_p is the normal projection of x onto the hyperplane H (so g(x_p) = 0) and w / ||w|| is the unit vector collinear with w.
  Since g(x_p) = 0 and w^T w = ||w||^2, we obtain g(x) = r ||w||, and therefore
      r = g(x) / ||w||
  In particular, the distance from the origin to H is d(0, H) = w_0 / ||w||.
– In conclusion, a linear discriminant function divides the feature space by a hyperplane decision surface.
– The orientation of the surface is determined by the normal vector w, and its location is determined by the bias w_0.

– The multi-category case
• We define c linear discriminant functions
      g_i(x) = w_i^T x + w_i0,   i = 1, ..., c
  and assign x to ω_i if g_i(x) > g_j(x) for all j ≠ i; in case of ties, the classification is left undefined.
• In this case, the classifier is a "linear machine".
• A linear machine divides the feature space into c decision regions, with g_i(x) being the largest discriminant when x is in region R_i.
• For two contiguous regions R_i and R_j, the boundary that separates them is a portion of the hyperplane H_ij defined by
      g_i(x) = g_j(x)  ⇔  (w_i - w_j)^T x + (w_i0 - w_j0) = 0
• w_i - w_j is normal to H_ij, and d(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||.
– It is easy to show that the decision regions of a linear machine are convex; this restriction limits the flexibility and accuracy of the classifier.

Generalized Linear Discriminant Functions
• The decision boundaries that separate classes are not always linear.
• The complexity of the boundaries may sometimes require the use of highly non-linear surfaces.
• A popular approach to generalizing the concept of linear decision functions is to consider a generalized decision function of the form
      g(x) = w_1 f_1(x) + w_2 f_2(x) + ... + w_N f_N(x) + w_{N+1}        (2)
  where the f_i(x), 1 ≤ i ≤ N, are scalar functions of the pattern x, with x ∈ R^n (Euclidean space).
• Introducing f_{N+1}(x) = 1, we get
      g(x) = Σ_{i=1}^{N+1} w_i f_i(x) = w'^T x'
  where w' = (w_1, w_2, ..., w_N, w_{N+1})^T and x' = (f_1(x), f_2(x), ..., f_N(x), f_{N+1}(x))^T.
• This representation implies that any decision function defined by equation (2) can be treated as linear in the (N + 1)-dimensional space of x' (with N + 1 > n).
• g(x) keeps its non-linear character in the original space R^n.
• The most commonly used generalized decision functions are those for which the f_i(x), 1 ≤ i ≤ N, are polynomials; g(x) = w'^T x', where w' is a new weight vector that can be calculated from the original w and the chosen f_i(x), 1 ≤ i ≤ N.
• Quadratic decision functions for a 2-dimensional feature space:
      g(x) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + w_4 x_1 + w_5 x_2 + w_6
  here w' = (w_1, w_2, ..., w_6)^T and x' = (x_1^2, x_1 x_2, x_2^2, x_1, x_2, 1)^T; a code sketch of this mapping is given below.
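As referenced above, the following sketch (not from the handout) implements the 2-dimensional quadratic feature mapping x -> x' = (x_1^2, x_1 x_2, x_2^2, x_1, x_2, 1)^T; the weight values below are assumptions chosen only to show that a boundary which is non-linear in R^2 is an ordinary linear discriminant in the lifted 6-dimensional space.

```python
import numpy as np

def quadratic_features(x):
    """Map a 2-D pattern x = (x1, x2) to x' = (x1^2, x1*x2, x2^2, x1, x2, 1)."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2, x1, x2, 1.0])

# Hypothetical weight vector w' = (w1, ..., w6); with these values
# g(x) = x1^2 + x2^2 - 4, i.e. a circular boundary of radius 2 in R^2,
# while w'^T x' is linear in the lifted space.
w_prime = np.array([1.0, 0.0, 1.0, 0.0, 0.0, -4.0])

def g(x):
    return np.dot(w_prime, quadratic_features(x))

print(g(np.array([1.0, 1.0])))   # -2 < 0 : inside the circle
print(g(np.array([3.0, 0.0])))   #  5 > 0 : outside the circle
```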
• For patterns x ∈ R^n, the most general quadratic decision function is given by
      g(x) = Σ_{i=1}^{n} w_ii x_i^2 + Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} w_ij x_i x_j + Σ_{i=1}^{n} w_i x_i + w_{n+1}        (3)
  The number of terms on the right-hand side is
      l = N + 1 = n + n(n - 1)/2 + n + 1 = (n + 1)(n + 2)/2
  This is the total number of weights, i.e. the free parameters of the problem.
  – If, for example, n = 3, the lifted vector x' is 10-dimensional.
  – If, for example, n = 10, the lifted vector x' is 66-dimensional.
• In the case of polynomial decision functions of order m, a typical f_i(x) is given by
      f_i(x) = x_{i1}^{e1} x_{i2}^{e2} ... x_{im}^{em}
  where 1 ≤ i_1, i_2, ..., i_m ≤ n and each e_i, 1 ≤ i ≤ m, is 0 or 1.
  – Each such term is a monomial of degree between 0 and m. To avoid repetitions, we require i_1 ≤ i_2 ≤ ... ≤ i_m.
      g^m(x) = Σ_{i1=1}^{n} Σ_{i2=i1}^{n} ... Σ_{im=i(m-1)}^{n} w_{i1 i2 ... im} x_{i1} x_{i2} ... x_{im} + g^{m-1}(x)
  (where g^0(x) = w_{n+1}) is the most general polynomial decision function of order m.

Example 1: Let n = 3 and m = 2. Then
      g^2(x) = Σ_{i1=1}^{3} Σ_{i2=i1}^{3} w_{i1 i2} x_{i1} x_{i2} + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4
             = w_11 x_1^2 + w_12 x_1 x_2 + w_13 x_1 x_3 + w_22 x_2^2 + w_23 x_2 x_3 + w_33 x_3^2 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4

Example 2: Let n = 2 and m = 3. Then
      g^3(x) = Σ_{i1=1}^{2} Σ_{i2=i1}^{2} Σ_{i3=i2}^{2} w_{i1 i2 i3} x_{i1} x_{i2} x_{i3} + g^2(x)
             = w_111 x_1^3 + w_112 x_1^2 x_2 + w_122 x_1 x_2^2 + w_222 x_2^3 + g^2(x)
      where g^2(x) = Σ_{i1=1}^{2} Σ_{i2=i1}^{2} w_{i1 i2} x_{i1} x_{i2} + g^1(x)
                   = w_11 x_1^2 + w_12 x_1 x_2 + w_22 x_2^2 + w_1 x_1 + w_2 x_2 + w_3

– The commonly used quadratic decision function can be represented as the general n-dimensional quadratic surface
      g(x) = x^T A x + x^T b + c
  where the matrix A = (a_ij), the vector b = (b_1, b_2, ..., b_n)^T and the scalar c depend on the weights w_ii, w_ij, w_i of equation (3).
– If A is positive definite, the decision boundary is a hyperellipsoid with axes in the directions of the eigenvectors of A.
  • In particular, if A = I_n (the identity matrix), the decision boundary is simply an n-dimensional hypersphere.
  • If A has eigenvalues of both signs (A indefinite), the decision boundary is a hyperhyperboloid.
– In conclusion, it is the matrix A alone that determines the shape and characteristics of the decision boundary.

Problem: Consider a 3-dimensional pattern space and cubic polynomial decision functions.
1. How many terms are needed to represent a decision function if only cubic and linear terms are assumed? (See the counting sketch below.)
2. Write down the general 4th-order polynomial decision function for a 2-dimensional pattern space.
3. Let R^3 be the original pattern space and let the decision function associated with the pattern classes ω1 and ω2 be
      g(x) = 2 x_1^2 + x_3^2 + x_2 x_3 + 4 x_1 - 2 x_2 + 1,
   for which g(x) > 0 if x ∈ ω1 and g(x) < 0 if x ∈ ω2.
   a) Rewrite g(x) as g(x) = x^T A x + x^T b + c.
   b) Determine the class of each of the following pattern vectors: (1, 1, 1), (1, 10, 0), (0, 1/2, 0). (A numerical check follows below.)
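A small counting helper (not part of the handout) that reproduces the term count l = (n + 1)(n + 2)/2 for the quadratic case; the last line counts terms for one possible reading of part 1 of the problem ("cubic and linear terms only, plus a bias"), which is an assumption about what the question intends.

```python
from math import comb

def num_quadratic_terms(n):
    """Free weights in the most general quadratic decision function over R^n:
    n squares + n(n-1)/2 cross terms + n linear terms + 1 bias = (n+1)(n+2)/2."""
    return comb(n, 2) + n + n + 1

def num_monomials(n, m):
    """Distinct monomials x_{i1}...x_{im} with 1 <= i1 <= ... <= im <= n,
    i.e. combinations with repetition: C(n + m - 1, m)."""
    return comb(n + m - 1, m)

print(num_quadratic_terms(3))    # 10 -> x' is 10-dimensional for n = 3
print(num_quadratic_terms(10))   # 66 -> x' is 66-dimensional for n = 10
# One reading of part 1 (n = 3): cubic terms + linear terms + bias.
print(num_monomials(3, 3) + 3 + 1)   # 10 + 3 + 1 = 14
```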
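And a numerical check for part 3(b), assuming the symmetric rewriting of A, b, c shown in the comments (this particular choice is my assumption for part 3(a) and should be verified by expanding x^T A x + x^T b + c).

```python
import numpy as np

def quadratic_g(x, A, b, c):
    """Evaluate g(x) = x^T A x + x^T b + c."""
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x + x @ b + c)

# One symmetric rewriting of g(x) = 2*x1^2 + x3^2 + x2*x3 + 4*x1 - 2*x2 + 1:
# the x2*x3 term is split evenly between the two off-diagonal entries of A.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.5, 1.0]])
b = np.array([4.0, -2.0, 0.0])
c = 1.0

for x in [(1, 1, 1), (1, 10, 0), (0, 0.5, 0)]:
    value = quadratic_g(x, A, b, c)
    label = "omega_1" if value > 0 else "omega_2" if value < 0 else "on the boundary"
    print(x, value, label)   # 7 -> omega_1, -13 -> omega_2, 0 -> on the boundary
```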
• Positive Definite Matrices
  1. A square matrix A is positive definite if x^T A x > 0 for all nonzero column vectors x.
  2. It is negative definite if x^T A x < 0 for all nonzero x.
  3. It is positive semi-definite if x^T A x ≥ 0 for all x.
  4. It is negative semi-definite if x^T A x ≤ 0 for all x.
  These definitions are hard to check directly from the quadratic form itself.

• More useful in practice are the following properties, which hold when the matrix A is symmetric and which are easier to check. The i-th principal minor of A is the matrix A_i formed by the first i rows and columns of A. So the first principal minor is A_1 = (a_11), the second principal minor is
      A_2 = [ a_11  a_12 ; a_21  a_22 ],
  and so on.
– The matrix A is positive definite if all its principal minors A_1, A_2, ..., A_n have strictly positive determinants.
– If these determinants are non-zero and alternate in sign, starting with det(A_1) < 0, then the matrix A is negative definite.
– If the determinants are all non-negative, then the matrix is positive semi-definite (strictly, semi-definiteness requires the sign conditions to hold for all principal minors, not only the leading ones).
– If the determinants alternate in sign, starting with det(A_1) ≤ 0, then the matrix is negative semi-definite (with the same caveat).

• To fix ideas, consider a 2x2 symmetric matrix A = [ a_11  a_12 ; a_21  a_22 ].
  It is positive definite if:
    a) det(A_1) = a_11 > 0
    b) det(A_2) = a_11 a_22 - a_12 a_12 > 0
  It is negative definite if:
    a) det(A_1) = a_11 < 0
    b) det(A_2) = a_11 a_22 - a_12 a_12 > 0
  It is positive semi-definite if:
    a) det(A_1) = a_11 ≥ 0
    b) det(A_2) = a_11 a_22 - a_12 a_12 ≥ 0
  And it is negative semi-definite if:
    a) det(A_1) = a_11 ≤ 0
    b) det(A_2) = a_11 a_22 - a_12 a_12 ≥ 0

Exercise 1: Check whether the following matrices are positive definite, negative definite, positive semi-definite, negative semi-definite, or none of the above.
      (a) A = [ 2  1 ; 1  4 ]
      (b) A = [ -2  4 ; 4  -8 ]
      (c) A = [ -2  2 ; 2  -4 ]
      (d) A = [ 2  4 ; 4  3 ]

Solutions of Exercise 1 (a numerical check follows below):
• (a) A_1 = 2 > 0, A_2 = 8 - 1 = 7 > 0  ⇒  A is positive definite
• (b) A_1 = -2, A_2 = (-2)(-8) - 16 = 0  ⇒  A is negative semi-definite
• (c) A_1 = -2, A_2 = 8 - 4 = 4 > 0  ⇒  A is negative definite
• (d) A_1 = 2 > 0, A_2 = 6 - 16 = -10 < 0  ⇒  A is none of the above
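The following sketch (not part of the handout) checks Exercise 1 numerically: it prints the leading principal minor determinants discussed above, but classifies definiteness from the eigenvalues, which avoids the caveat about semi-definite cases noted earlier.

```python
import numpy as np

def leading_minor_dets(A):
    """Determinants of the leading principal minors A_1, ..., A_n."""
    return [float(np.linalg.det(A[:k, :k])) for k in range(1, A.shape[0] + 1)]

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix from its eigenvalues."""
    ev = np.linalg.eigvalsh(A)
    if np.all(ev > tol):
        return "positive definite"
    if np.all(ev < -tol):
        return "negative definite"
    if np.all(ev >= -tol):
        return "positive semi-definite"
    if np.all(ev <= tol):
        return "negative semi-definite"
    return "none of the above (indefinite)"

# The four matrices of Exercise 1.
for M in (np.array([[2.0, 1.0], [1.0, 4.0]]),
          np.array([[-2.0, 4.0], [4.0, -8.0]]),
          np.array([[-2.0, 2.0], [2.0, -4.0]]),
          np.array([[2.0, 4.0], [4.0, 3.0]])):
    print(leading_minor_dets(M), definiteness(M))
```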
Exercise 2: Let A = [ 2  1 ; 1  4 ].
1. Compute the decision boundary associated with the matrix A, g(x) = x^T A x + x^T b + c, in the case where b^T = (1, 2) and c = -3.
2. Solve det(A - λI) = 0 and find the shape and the characteristics of the decision boundary separating the two classes ω1 and ω2.
3. Classify the following points: x^T = (0, -1) and x^T = (1, 1).

Solution of Exercise 2 (a numerical verification follows below):
1.    g(x) = (x_1, x_2) [ 2  1 ; 1  4 ] (x_1, x_2)^T + (x_1, x_2) (1, 2)^T - 3
           = (2 x_1 + x_2) x_1 + (x_1 + 4 x_2) x_2 + x_1 + 2 x_2 - 3
           = 2 x_1^2 + x_1 x_2 + x_1 x_2 + 4 x_2^2 + x_1 + 2 x_2 - 3
           = 2 x_1^2 + 4 x_2^2 + 2 x_1 x_2 + x_1 + 2 x_2 - 3
2.    det(A - λI) = (2 - λ)(4 - λ) - 1 = λ^2 - 6λ + 7 = 0, giving λ_1 = 3 + √2 and λ_2 = 3 - √2.
      For λ_1 = 3 + √2, solving (A - λ_1 I) x = 0 gives
          (-1 - √2) x_1 + x_2 = 0   and   x_1 + (1 - √2) x_2 = 0,
      which describe the same straight line, collinear with the vector V_1 = (1, 1 + √2)^T.
      For λ_2 = 3 - √2, solving (A - λ_2 I) x = 0 gives
          (√2 - 1) x_1 + x_2 = 0   and   x_1 + (1 + √2) x_2 = 0,
      a straight line collinear with the vector V_2 = (1, 1 - √2)^T.
      Since both eigenvalues are positive, the decision boundary is an ellipse whose two axes are collinear with V_1 and V_2, respectively.
3.    x = (0, -1)^T  ⇒  g(0, -1) = -1 < 0  ⇒  x ∈ ω2
      x = (1, 1)^T   ⇒  g(1, 1) = 8 > 0   ⇒  x ∈ ω1
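A brief NumPy verification of Exercise 2 (not part of the handout): it recovers the eigenvalues 3 +/- sqrt(2), the eigenvector directions, and the class labels of the two test points.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 4.0]])
b = np.array([1.0, 2.0])
c = -3.0

# Both eigenvalues are positive (3 -/+ sqrt(2)), so g(x) = 0 is an ellipse
# whose axes lie along the eigenvectors of A.
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)    # approximately [1.586, 4.414]
print(eigvecs)    # columns: normalised versions of V_2 = (1, 1-sqrt(2)) and V_1 = (1, 1+sqrt(2))

def g(x):
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x + x @ b + c)

print(g([0, -1]))   # -1 < 0 -> omega_2
print(g([1, 1]))    #  8 > 0 -> omega_1
```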