12-knn_perceptron

With massive datasets, a natural question is how to speed things up.


Features: vector x of word occurrences; d features (words + other things), d ~ 100,000
Class Y: y = Spam (+1) or Ham (-1)

Binary classification:
f(x) = 1 if w1·x1 + w2·x2 + ... + wn·xn ≥ θ, and 0 otherwise
Input: vectors xi and labels yi
Goal: find the vector w = (w1, w2, ..., wn), where each wi is a real number
Decision boundary: w·x = θ (equivalently w·x = 0 after the change of variables in the note below)
Note: x ⇔ (x, 1) for all x, and w ⇔ (w, −θ), so the threshold θ can be folded into w
(Figure: positive and negative training points on either side of the separating hyperplane w·x = θ.)

(Very) loose motivation: a neuron
Inputs are feature values x1, x2, x3, x4, ...
Each feature has a weight wi
Activation is the sum: f(x) = Σi wi·xi = w·x − θ
If f(x) is positive: predict +1; if negative: predict −1
(Figure: inputs x1, x2, ... — e.g., occurrences of the word "nigeria" — feed a weighted sum Σ; if the sum is > 0, predict Spam = +1, otherwise Ham = −1.)

Perceptron: y' = sign(w·x)
How do we find the parameters w? (A minimal code sketch follows these notes.)
Start with w0 = 0
Pick training examples x one by one (from disk)
Predict the class of x using the current weights: y' = sign(wt·x)
If y' is correct (i.e., y = y'): no change, wt+1 = wt
If y' is wrong: adjust w with wt+1 = wt + η·y·x, where:
η is the learning-rate parameter
x is the training example
y is the true class label (in {+1, −1})
(Figure: geometric view of the update — adding η·y·x moves wt toward the misclassified example x, giving wt+1.)

Perceptron Convergence Theorem: if there exists a set of weights consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge
How long would it take to converge?
Perceptron Cycling Theorem: if the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and therefore enter an infinite loop
How can we provide robustness and more expressivity?

Separability: some setting of the parameters classifies the training set perfectly
Convergence: if the training set is separable, the perceptron will converge (binary case)
Mistake bound: the number of mistakes is < 1/γ², where γ is the margin of separation

If there are more than 2 classes (see the multiclass sketch below):
Keep a weight vector wc for each class c
Calculate the activation for each class: f(x, c) = Σi wc,i·xi = wc·x
The highest activation wins: c = arg maxc f(x, c)
(Figure: the input space is split into regions according to which of w1·x, w2·x, w3·x is biggest.)

Issues with the perceptron:
Overfitting
Regularization: if the data is not separable, the weights dance around
Mediocre generalization: it finds a "barely" separating solution

Winnow algorithm (see the Winnow sketch below)
Similar to the perceptron, just with different updates
Initialize: θ = n; wi = 1
Prediction is 1 iff w·x ≥ θ
If no mistake: do nothing
If f(x) = 1 but w·x < θ, ...
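The perceptron training loop described above maps directly to a few lines of code. Below is a minimal Python/NumPy sketch (the slides contain no code, so the function name train_perceptron, the fixed number of passes, and the default η = 1.0 are assumptions for illustration); labels are in {+1, −1} and the bias is folded into w via the x ⇔ (x, 1), w ⇔ (w, −θ) trick from the notes.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=10):
    """Binary perceptron. X is (n_examples, d); y holds labels in {+1, -1}.

    Assumes the bias is handled by appending a constant-1 feature to X,
    as in the x <-> (x, 1), w <-> (w, -theta) note above.
    """
    n, d = X.shape
    w = np.zeros(d)                                   # start with w0 = 0
    for _ in range(epochs):                           # several passes over the examples
        mistakes = 0
        for i in range(n):                            # pick training examples one by one
            y_pred = 1 if np.dot(w, X[i]) >= 0 else -1   # y' = sign(w . x)
            if y_pred != y[i]:                        # wrong prediction: adjust w
                w = w + eta * y[i] * X[i]             # w_{t+1} = w_t + eta * y * x
                mistakes += 1
        if mistakes == 0:                             # consistent weights found: stop
            break
    return w
```

For instance, on a tiny made-up separable set (the last column is the constant-1 bias feature):

```python
X = np.array([[1.0, 2.0, 1.0], [2.0, 0.0, 1.0], [-1.0, -1.0, 1.0]])
y = np.array([1, 1, -1])
w = train_perceptron(X, y)
print(np.sign(X @ w))   # reproduces y on this toy set
```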
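For the multiclass rule (one weight vector per class, highest activation wins), a small sketch follows. The prediction matches the arg max rule in the notes; the mistake-driven update shown is the standard multiclass perceptron update, which the slides do not spell out, so treat it as an assumed completion.

```python
import numpy as np

def predict_multiclass(W, x):
    """W is (n_classes, d): one weight vector w_c per class.
    Returns c = argmax_c f(x, c), where f(x, c) = w_c . x."""
    activations = W @ x              # compute f(x, c) for every class at once
    return int(np.argmax(activations))

def update_multiclass(W, x, y_true, eta=1.0):
    """Standard multiclass perceptron update (assumed, not from the slides):
    on a mistake, move w_{y_true} toward x and w_{y_pred} away from x.
    W should be a float array."""
    y_pred = predict_multiclass(W, x)
    if y_pred != y_true:
        W[y_true] += eta * x
        W[y_pred] -= eta * x
    return W
```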
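The Winnow slide is cut off in this preview, so the sketch below fills in the multiplicative updates from the standard textbook statement of Winnow (promotion and demotion by a factor of 2 on active features); the exact factors and the 0/1 feature and label convention are assumptions, not the slide's own text.

```python
import numpy as np

def winnow_step(w, theta, x, y):
    """One Winnow update. x is a 0/1 feature vector, y is the true 0/1 label.

    Initialization (as in the notes): theta = n, w_i = 1.
    Prediction: 1 iff w . x >= theta.
    Standard updates (assumed; the slide text is truncated):
      - false negative (y = 1 but w . x < theta): double w_i for active features
      - false positive (y = 0 but w . x >= theta): halve w_i for active features
    """
    y_pred = 1 if np.dot(w, x) >= theta else 0
    if y_pred == y:
        return w                              # no mistake: do nothing
    if y == 1:                                # predicted 0, should be 1: promote
        w = np.where(x == 1, 2.0 * w, w)
    else:                                     # predicted 1, should be 0: demote
        w = np.where(x == 1, w / 2.0, w)
    return w
```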