12-knn_perceptron



Winnow update rule (examples x ∈ {0,1}ⁿ, threshold θ):
– If f(x) = 1 but w·x ≤ θ: w_i ← 2w_i (if x_i = 1) (promotion)
– If f(x) = 0 but w·x ≥ θ: w_i ← w_i / 2 (if x_i = 1) (demotion)
Winnow learns linear threshold functions.

• The basic algorithm learns monotone functions. For the general case:
– Duplicate variables: to negate variable x_i, introduce a new variable x_i' = −x_i and learn monotone functions over 2n variables.
– Balanced version: keep two weights for each variable; the effective weight is their difference. Update rule:
  If f(x) = 1 but (w⁺ − w⁻)·x ≤ θ: w_i⁺ ← 2w_i⁺ and w_i⁻ ← w_i⁻ / 2, where x_i = 1 (promotion)
  If f(x) = 0 but (w⁺ − w⁻)·x ≥ θ: w_i⁺ ← w_i⁺ / 2 and w_i⁻ ← 2w_i⁻, where x_i = 1 (demotion)

• Thick separator (aka Perceptron with margin); applies to both Perceptron and Winnow:
– Promote a positive example unless w·x > θ + γ
– Demote a negative example unless w·x < θ − γ
[Figure: positive and negative points on either side of the lines w·x = θ and w·x = 0, separated by a thick margin.]
– Note: γ is a functional margin, so its effect can disappear as w grows. Nevertheless, this has been shown to be a very effective algorithmic addition.

• Summary of the two update rules. Examples x ∈ {0,1}ⁿ; hypothesis w ∈ Rⁿ; prediction is 1 iff w·x ≥ θ.
– Additive weight update algorithm [Perceptron, Rosenblatt, 1958]: w ← w + η y_j x_j
  If Class = 1 but w·x ≤ θ: w_i ← w_i + 1 (if x_i = 1) (promotion)
  If Class = 0 but w·x ≥ θ: w_i ← w_i − 1 (if x_i = 1) (demotion)
– Multiplicative weight update algorithm [Winnow, Littlestone, 1988]: w ← w · exp{η y_j x_j} (componentwise)
  If Class = 1 but w·x ≤ θ: w_i ← 2w_i (if x_i = 1) (promotion)
  If Class = 0 but w·x ≥ θ: w_i ← w_i / 2 (if x_i = 1) (demotion)

• Perceptron
– Online: can adjust to a changing target over time
– Advantages: simple; guaranteed to learn a linearly separable problem
– Limitations: only linear separations; only converges for linearly separable data; not really "efficient with many features"
• Winnow
– Online: can adjust to a changing target over time
– Advantages: simple; guaranteed to learn a linearly separable problem; suitable for problems with many irrelevant attributes
– Limitations: only linear separations; only converges for linearly separable data; not really "efficient with many features"

(Jure Leskovec, Stanford CS246: Mining Massive Datasets, 2/14/2011, slides 38–42)
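The mistake-driven updates above translate directly into code. Below is a minimal Python sketch, not taken from the slides, of the additive Perceptron and multiplicative Winnow rules on Boolean vectors x ∈ {0,1}ⁿ, predicting 1 iff w·x ≥ θ; the optional `margin` argument is my way of adding the thick-separator γ, and all names (predict, perceptron_update, winnow_update) are illustrative choices rather than anything defined in the lecture.

```python
# Sketch only: mistake-driven Perceptron (additive) and Winnow (multiplicative)
# updates for x in {0,1}^n, predicting 1 iff w.x >= theta.
# margin=0.0 recovers the plain rules; margin>0 gives the thick-separator variant.

def predict(w, x, theta):
    """Linear threshold prediction: 1 iff w . x >= theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def perceptron_update(w, x, y, theta, margin=0.0):
    """Additive rule: add/subtract 1 on active features after a mistake."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y == 1 and score <= theta + margin:          # promotion
        return [wi + xi for wi, xi in zip(w, x)]    # x_i is 0 or 1
    if y == 0 and score >= theta - margin:          # demotion
        return [wi - xi for wi, xi in zip(w, x)]
    return w                                        # no mistake: no change

def winnow_update(w, x, y, theta, margin=0.0):
    """Multiplicative rule: double/halve weights on active features after a mistake."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y == 1 and score <= theta + margin:          # promotion
        return [wi * 2.0 if xi else wi for wi, xi in zip(w, x)]
    if y == 0 and score >= theta - margin:          # demotion
        return [wi / 2.0 if xi else wi for wi, xi in zip(w, x)]
    return w

# Toy usage: Winnow learning the monotone disjunction f(x) = x1 OR x2 over
# n = 4 features, with the common initialization w = (1, ..., 1) and theta = n.
if __name__ == "__main__":
    n, theta = 4, 4.0
    data = [([1, 0, 1, 0], 1), ([0, 0, 1, 1], 0),
            ([0, 1, 0, 0], 1), ([0, 0, 0, 1], 0)]
    w = [1.0] * n
    for _ in range(10):                             # a few online passes
        for x, y in data:
            w = winnow_update(w, x, y, theta)
    print(w, [predict(w, x, theta) for x, _ in data])   # weights on x1, x2 grow
```

A similarly hedged sketch of the balanced version, which keeps two non-negative weight vectors and uses their difference as the effective weight; the function name and variable names are again my own, and the promotion/demotion pattern follows the reconstructed rule above.

```python
def balanced_winnow_update(w_plus, w_minus, x, y, theta):
    """Balanced Winnow sketch: promotion doubles w+_i and halves w-_i on active
    features; demotion does the opposite. Effective weight is w+ - w-."""
    score = sum((p - m) * xi for p, m, xi in zip(w_plus, w_minus, x))
    if y == 1 and score <= theta:                   # promotion
        w_plus  = [p * 2.0 if xi else p for p, xi in zip(w_plus, x)]
        w_minus = [m / 2.0 if xi else m for m, xi in zip(w_minus, x)]
    elif y == 0 and score >= theta:                 # demotion
        w_plus  = [p / 2.0 if xi else p for p, xi in zip(w_plus, x)]
        w_minus = [m * 2.0 if xi else m for m, xi in zip(w_minus, x)]
    return w_plus, w_minus
```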
