
Pattern Classification and Quadratic Problems (Robert M. Freund), March 30, 2004. © 2004 Massachusetts Institute of Technology.

1 Overview

• Pattern Classification, Linear Classifiers, and Quadratic Optimization
• Constructing the Dual of CQP
• The Karush-Kuhn-Tucker Conditions for CQP
• Insights from Duality and the KKT Conditions
• Pattern Classification without strict Linear Separation

2 Pattern Classification, Linear Classifiers, and Quadratic Optimization

2.1 The Pattern Classification Problem

We are given:

• points $a_1, \ldots, a_k \in \mathbb{R}^n$ that have property "P"
• points $b_1, \ldots, b_m \in \mathbb{R}^n$ that do not have property "P"

We would like to use these $k + m$ points to develop a linear rule that can be used to predict whether or not other points $x$ have property P. In particular, we seek a vector $v$ and a scalar $\beta$ for which:

$$v^T a_i > \beta \quad \text{for all } i = 1, \ldots, k$$
$$v^T b_i < \beta \quad \text{for all } i = 1, \ldots, m$$

We will then use $v, \beta$ to predict whether or not other points $c$ have property P, using the rule:

• If $v^T c > \beta$, then we declare that $c$ has property P.
• If $v^T c < \beta$, then we declare that $c$ does not have property P.
We therefore seek $v, \beta$ that define the hyperplane $H_{v,\beta} := \{ x \mid v^T x = \beta \}$ for which:

$$v^T a_i > \beta \quad \text{for all } i = 1, \ldots, k$$
$$v^T b_i < \beta \quad \text{for all } i = 1, \ldots, m$$

This is illustrated in Figure 1.

Figure 1: Illustration of the pattern classification problem.
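The decision rule above can be sketched numerically. This is a minimal illustration, not part of the notes; the separator $v = (1,1)$, $\beta = 3$ and the test points are hypothetical:

```python
import numpy as np

def classify(c, v, beta):
    """Apply the linear rule: declare that c has property P when
    v^T c > beta, and that it does not when v^T c < beta."""
    score = float(np.dot(v, c))
    if score > beta:
        return "has P"
    if score < beta:
        return "does not have P"
    return "on the hyperplane (undetermined)"

# Hypothetical separator in R^2: v = (1, 1), beta = 3.
v = np.array([1.0, 1.0])
beta = 3.0
print(classify(np.array([2.0, 2.0]), v, beta))   # 2 + 2 = 4 > 3
print(classify(np.array([1.0, 0.5]), v, beta))   # 1 + 0.5 = 1.5 < 3
```

Points with $v^T c = \beta$ lie on the hyperplane itself, so the rule makes no declaration for them.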

2.2 The Maximal Separation Model

We seek $v, \beta$ that define the hyperplane $H_{v,\beta} := \{ x \mid v^T x = \beta \}$ for which:

$$v^T a_i > \beta \quad \text{for all } i = 1, \ldots, k$$
$$v^T b_i < \beta \quad \text{for all } i = 1, \ldots, m$$

We would like the hyperplane $H_{v,\beta}$ not only to separate the points with the two different properties, but to be as far away from the points $a_1, \ldots, a_k, b_1, \ldots, b_m$ as possible. It is easy to derive via elementary analysis that the distance from the hyperplane $H_{v,\beta}$ to any point $a_i$ is equal to

$$\frac{v^T a_i - \beta}{\|v\|}.$$

Similarly, the distance from the hyperplane $H_{v,\beta}$ to any point $b_i$ is equal to

$$\frac{\beta - v^T b_i}{\|v\|}.$$

If we normalize the vector $v$ so that $\|v\| = 1$, then the minimum distance from the hyperplane $H_{v,\beta}$ to any of the points $a_1, \ldots, a_k, b_1, \ldots, b_m$ is then:

$$\min \left\{ v^T a_1 - \beta, \ldots, v^T a_k - \beta, \; \beta - v^T b_1, \ldots, \beta - v^T b_m \right\}.$$

We therefore would like $v$ and $\beta$ to satisfy:

• $\|v\| = 1$, and
• $\min \left\{ v^T a_1 - \beta, \ldots, v^T a_k - \beta, \; \beta - v^T b_1, \ldots, \beta - v^T b_m \right\}$ is maximized.
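The minimum-distance quantity above is easy to compute directly. A small sketch on hypothetical 2-D data (the points and separator below are illustrative, not from the notes):

```python
import numpy as np

def min_margin(A, B, v, beta):
    """Minimum distance from the hyperplane H_{v,beta} to the given points:
    min over v^T a_i - beta (rows of A) and beta - v^T b_i (rows of B),
    after rescaling so that ||v|| = 1."""
    scale = np.linalg.norm(v)
    v, beta = v / scale, beta / scale   # enforce ||v|| = 1
    da = A @ v - beta                   # distances to the a-points
    db = beta - B @ v                   # distances to the b-points
    return float(min(da.min(), db.min()))

# Hypothetical data in R^2, separated by the line x1 + x2 = 3.
A = np.array([[2.0, 2.0], [3.0, 1.5]])   # points with property P
B = np.array([[0.5, 0.5], [1.0, 1.0]])   # points without property P
print(min_margin(A, B, np.array([1.0, 1.0]), 3.0))   # 1/sqrt(2) ~ 0.7071
```

Note that the separating inequalities only make sense if every returned distance is positive; a nonpositive value means $H_{v,\beta}$ fails to strictly separate the two point sets.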
This yields the following optimization model:

$$\begin{array}{llll} \text{PCP:} & \underset{v,\beta,\delta}{\text{maximize}} & \delta & \\ & \text{s.t.} & v^T a_i - \beta \ge \delta, & i = 1, \ldots, k \\ & & \beta - v^T b_i \ge \delta, & i = 1, \ldots, m \\ & & \|v\| = 1, \quad v \in \mathbb{R}^n & \end{array}$$

Now notice that PCP is not a convex optimization problem, due to the presence of the constraint "$\|v\| = 1$".

2.3 Convex Reformulation of PCP

To obtain a convex optimization problem equivalent to PCP, we perform the following transformation of variables:

$$x = \frac{v}{\delta}, \qquad \alpha = \frac{\beta}{\delta}.$$

Then notice that $\frac{1}{\delta} = \frac{\|v\|}{\delta} = \|x\|$, and so maximizing $\delta$ is equivalent to maximizing $\frac{1}{\|x\|}$, which is equivalent to minimizing $\|x\|$. Dividing each constraint of PCP by $\delta > 0$ then yields the following reformulation of PCP:

$$\begin{array}{lll} \underset{x,\alpha}{\text{minimize}} & \|x\| & \\ \text{s.t.} & x^T a_i - \alpha \ge 1, & i = 1, \ldots, k \\ & \alpha - x^T b_i \ge 1, & i = 1, \ldots, m \end{array}$$

This note was uploaded on 12/04/2011 for the course ESD 15.094 taught by Professor Jiesun during the Spring '04 term at MIT.
