Linear

# 6 Linear Models


A hyperplane in a space $\mathcal{H}$ endowed with a dot product $\langle \cdot, \cdot \rangle$ is described by the set

$$\{ x \in \mathcal{H} \mid \langle w, x \rangle + b = 0 \} \tag{6.1}$$

where $w \in \mathcal{H}$ and $b \in \mathbb{R}$. Such a hyperplane naturally divides $\mathcal{H}$ into two half-spaces, $\{ x \in \mathcal{H} \mid \langle w, x \rangle + b \ge 0 \}$ and $\{ x \in \mathcal{H} \mid \langle w, x \rangle + b < 0 \}$, and hence can be used as the decision boundary of a binary classifier. In this chapter we will study a number of algorithms which employ such linear decision boundaries. Although such models look restrictive at first glance, when combined with kernels (Chapter 5) they yield a large class of useful algorithms.

All the algorithms we will study in this chapter maximize the margin. Given a set $X = \{ x_1, \ldots, x_m \}$, the margin is the distance of the closest point in $X$ to the hyperplane (6.1). Elementary geometric arguments (Problem 6.1) show that the distance of a point $x_i$ to a hyperplane is given by $|\langle w, x_i \rangle + b| / \|w\|$, and hence the margin is simply

$$\min_{i = 1, \ldots, m} \frac{|\langle w, x_i \rangle + b|}{\|w\|}. \tag{6.2}$$

Note that the parameterization of the hyperplane (6.1) is not unique; if we multiply both $w$ and $b$ by the same non-zero constant, then we obtain the same hyperplane. One way to resolve this ambiguity is to set $\min_{i = 1, \ldots, m} |\langle w, x_i \rangle + b| = 1$. In this case, the margin simply becomes $1 / \|w\|$. We postpone justification of margin maximization for later and jump straight ahead to the description of various algorithms.

## 6.1 Support Vector Classification

Consider a binary classification task, where we are given a training set $\{ (x_1, y_1), \ldots, (x_m, y_m) \}$ with $x_i \in \mathcal{H}$ and $y_i \in \{ \pm 1 \}$. Our aim is to find a linear decision boundary parameterized by $(w, b)$ such that $\langle w, x_i \rangle + b \ge 0$ whenever $y_i = +1$ and $\langle w, x_i \rangle + b < 0$ whenever $y_i = -1$. Furthermore, as discussed above, we fix the scaling of $w$ by requiring $\min_{i = 1, \ldots, m} |\langle w, x_i \rangle + b| = 1$. A compact way to write our desiderata is to require $y_i (\langle w, x_i \rangle + b) \ge 1$ for all $i$ (also see Figure 6.1). The problem of maximizing the margin therefore reduces to

$$
\begin{aligned}
\max_{w, b} \quad & \frac{1}{\|w\|} && \text{(6.3a)} \\
\text{s.t.} \quad & y_i (\langle w, x_i \rangle + b) \ge 1 \ \text{ for all } i && \text{(6.3b)}
\end{aligned}
$$

or equivalently

$$
\begin{aligned}
\min_{w, b} \quad & \frac{1}{2} \|w\|^2 && \text{(6.4a)} \\
\text{s.t.} \quad & y_i (\langle w, x_i \rangle + b) \ge 1 \ \text{ for all } i. && \text{(6.4b)}
\end{aligned}
$$

[Figure 6.1: the classes $y_i = -1$ and $y_i = +1$ separated by the hyperplane $\{ x \mid \langle w, x \rangle + b = 0 \}$, with the parallel hyperplanes $\{ x \mid \langle w, x \rangle + b = -1 \}$ and $\{ x \mid \langle w, x \rangle + b = 1 \}$ passing through the closest points $x_2$ and $x_1$. Inset: $\langle w, x_1 \rangle + b = +1$ and $\langle w, x_2 \rangle + b = -1$, so $\langle w, x_1 - x_2 \rangle = 2$ and $\left\langle \frac{w}{\|w\|}, x_1 - x_2 \right\rangle = \frac{2}{\|w\|}$.]

Fig. 6.1. A linearly separable toy binary classification problem of separating the diamonds from the circles. We normalize $(w, b)$ to ensure that $\min_{i = 1, \ldots, m} |\langle w, x_i \rangle + b| = 1$. In this case, the margin is given by $1 / \|w\|$ as the calculation in the inset shows.

This is a constrained convex optimization problem with a quadratic objective function and linear constraints (see Section 3.3). In deriving (6.4) we implicitly assumed that the data is linearly separable, that is, there is a hyperplane which correctly classifies the training data. Such a classifier is called a *hard margin classifier*. If the data is not linearly separable, then (6.4) does not have a solution. To deal with this situation we introduce ...
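As a rough numerical illustration of the margin formula (6.2) and the normalization $\min_i |\langle w, x_i \rangle + b| = 1$, the following is a minimal sketch in plain Python. The toy data set, the chosen hyperplane, and the helper names `dot`, `norm`, `margin`, `normalize`, and `satisfies_constraints` are all assumptions made for this example and do not come from the text.

```python
import math


def dot(u, v):
    """Euclidean dot product <u, v> of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(dot(w, w))


def margin(w, b, points):
    """Margin as in (6.2): min_i |<w, x_i> + b| / ||w||."""
    return min(abs(dot(w, x) + b) for x in points) / norm(w)


def normalize(w, b, points):
    """Rescale (w, b) so that min_i |<w, x_i> + b| = 1."""
    scale = min(abs(dot(w, x) + b) for x in points)
    if scale == 0:
        raise ValueError("a point lies on the hyperplane; cannot normalize")
    return [wi / scale for wi in w], b / scale


def satisfies_constraints(w, b, points, labels, tol=1e-9):
    """Check the constraints y_i (<w, x_i> + b) >= 1 for all i."""
    return all(y * (dot(w, x) + b) >= 1 - tol for x, y in zip(points, labels))


# Hypothetical, linearly separable toy data in R^2 (not from the text).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]

# An arbitrarily scaled parameterization of the hyperplane x1 + x2 = 2.
w, b = [3.0, 3.0], -6.0

w_n, b_n = normalize(w, b, points)

print("margin before rescaling:", margin(w, b, points))
print("margin after rescaling: ", margin(w_n, b_n, points))
print("1 / ||w|| after rescaling:", 1.0 / norm(w_n))
print("constraints hold:", satisfies_constraints(w_n, b_n, points, labels))
```

Rescaling $(w, b)$ leaves the margin unchanged because the hyperplane itself is unchanged. After normalization the margin equals $1 / \|w\|$, which is the quantity maximized in (6.3).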