### Lazy

Course: STATS 315a, Spring 2012
School: Stanford

#### Locally Weighted Learning

Machine Learning, Dr. Barbara Hammer

Outline:

- Instance-based learning ("lazy learning")
- Local models
  - k-nearest neighbor
  - Weighted average
  - Locally weighted regression
- Case-based reasoning

#### When to consider the nearest neighbor algorithm?

- Instances map to points in $\mathbb{R}^n$
- Fewer than 20 attributes per instance
- Lots of training data

Advantages:

- Training is very fast
- Can learn complex target functions
- Does not lose information

Disadvantages:

- Slow at query time
- Easily fooled by irrelevant attributes

#### k-nearest neighbor algorithm (classification)

Let an arbitrary instance $x$ be described by its attribute values

$$x = \langle a_1(x), a_2(x), \ldots, a_n(x) \rangle.$$

The distance between two instances $x_i$ and $x_j$ is defined as

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \big(a_r(x_i) - a_r(x_j)\big)^2}.$$

Training algorithm: store all training examples $\langle x, f(x) \rangle$.

Classification algorithm: given a query instance $x_q$ to be classified, let $x_1, \ldots, x_k$ denote the $k$ instances from the list of training examples that are nearest to $x_q$, and return

$$\hat f(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta\big(v, f(x_i)\big)$$

for a discrete-valued target function with finite value set $V$, where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise.

*[Figure: k-nearest neighbor examples for a discrete-valued target function, panels k = 1 and k = 5]*

*[Figure: k-nearest neighbor examples for a real-valued target function]*

#### Distance-weighted nearest neighbor algorithm

Idea: it may be useful to weight nearer neighbors more heavily.

Rationale: instances closer to $x_q$ tend to have target values closer to $f(x_q)$.

Distance-weighted function:

$$\hat f(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta\big(v, f(x_i)\big), \qquad w_i = \frac{1}{d(x_q, x_i)^2},$$

so the weights are inversely proportional to the squared distance; $d(x_q, x_i)$ is the Euclidean distance.
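A minimal Python sketch of the basic k-NN classifier described above, assuming each instance is a tuple of numeric attribute values and the training set is a list of `(x, f(x))` pairs. The names `euclidean` and `knn_classify` and the toy data are hypothetical, chosen only for illustration. "Training" is just keeping the list; every distance is computed at query time, which is why the method is fast to train and slow to query.

```python
import math
from collections import Counter


def euclidean(x_i, x_j):
    """d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def knn_classify(training, x_q, k):
    """Majority vote among the k stored examples nearest to x_q.

    training: list of (x, f(x)) pairs, x being a tuple of attribute values.
    """
    # All work happens here, at query time: sort stored examples by distance.
    nearest = sorted(training, key=lambda ex: euclidean(ex[0], x_q))[:k]
    # argmax_v sum_i delta(v, f(x_i)) is the most common label among the
    # neighbors; ties go to the label first seen in distance order.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
training = [((0.0, 0.0), "-"), ((0.0, 1.0), "-"), ((1.0, 0.0), "-"),
            ((5.0, 5.0), "+"), ((5.0, 6.0), "+"), ((6.0, 5.0), "+")]
print(knn_classify(training, (0.5, 0.5), k=3))  # -
print(knn_classify(training, (5.5, 5.5), k=3))  # +
```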
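The distance-weighted vote can be sketched the same way, again as a hypothetical helper (`weighted_knn_classify`) on invented toy data. Each neighbor adds $w_i = 1/d(x_q, x_i)^2$ to the score of its label. A zero distance is handled by returning the stored label directly, since the weight is undefined there. Passing `k=None` lets all stored examples vote.

```python
import math
from collections import defaultdict


def euclidean(x_i, x_j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def weighted_knn_classify(training, x_q, k=None):
    """Distance-weighted vote with w_i = 1 / d(x_q, x_i)^2.

    training: list of (x, f(x)) pairs; k=None uses every training example.
    """
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], x_q))
    if k is not None:
        neighbors = neighbors[:k]
    scores = defaultdict(float)
    for x_i, label in neighbors:
        d = euclidean(x_i, x_q)
        if d == 0.0:
            # Special case x_q = x_i: 1/d^2 is undefined, so return f(x_i).
            return label
        scores[label] += 1.0 / d ** 2
    # argmax_v sum_i w_i * delta(v, f(x_i))
    return max(scores, key=scores.get)


# Hypothetical toy data: one close "+" example, two distant "-" examples.
training = [((1.0, 0.0), "+"), ((3.0, 0.0), "-"), ((0.0, 3.0), "-")]
print(weighted_knn_classify(training, (0.0, 0.0), k=3))  # +
print(weighted_knn_classify(training, (3.0, 0.0)))       # - (exact match)
```

On this data a plain majority vote over all three points would return `"-"`, whereas with distance weighting the single close `"+"` example outweighs the two distant `"-"` examples ($1$ versus $1/9 + 1/9$).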
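Special case: if $x_q = x_i$ for some training instance, the weight $w_i$ is undefined, so set $\hat f(x_q) := f(x_i)$.

Note: with distance weighting it now makes sense to use all training examples instead of just the $k$ nearest.

Distance-weighted nearest neighbor algorithm for a real-valued target function:

$$\hat f(x_q) = \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{d(x_q, x_i)^2}.$$

More generally, the weights can be given by a weighting (kernel) function of the distance, $w_i = K\big(d(x_q, x_i)\big)$, for example the Gaussian kernel

$$K(d) = e^{-d^2}.$$

*[Figure: distance-weighted nearest neighbor examples]*

#### Locally weighted linear regression

Idea:

- k-NN forms a local approximation to $f$ for each query point $x_q$.
- Why not form an explicit approximation $\hat f(x)$ for the region surrounding $x_q$?
  - Fit a linear function to the $k$ nearest neighbors
  - Fit a quadratic, ...
  - This produces a "piecewise approximation" to $f$.

Local linear function:

$$\hat f(x) = \beta_0 + \beta_1 a_1(x) + \cdots + \beta_n a_n(x).$$

Error criteria, where $k\mathrm{NN}(x_q)$ denotes the $k$ nearest neighbors of $x_q$ and $D$ the set of training examples:

1. Minimize the squared error over the $k$ nearest neighbors of $x_q$:
$$E_1(x_q) = \frac{1}{2} \sum_{x \in k\mathrm{NN}(x_q)} \big(f(x) - \hat f(x)\big)^2$$
2. Minimize the squared error over the entire set of examples, weighting each error by the distance from $x_q$:
$$E_2(x_q) = \frac{1}{2} \sum_{x \in D} \big(f(x) - \hat f(x)\big)^2 \, K\big(d(x_q, x)\big)$$
3. Combine the two above:
$$E_3(x_q) = \frac{1}{2} \sum_{x \in k\mathrm{NN}(x_q)} \big(f(x) - \hat f(x)\big)^2 \, K\big(d(x_q, x)\big)$$

How it works (criterion $E_3$): write the neighbors as $x_1, \ldots, x_k$, each with a leading 1 prepended to its attribute vector so that $\hat f(x_i) = \beta^\top x_i$, and solve

$$\min_{\beta} \sum_{i=1}^{k} w_i^2 \big(f(x_i) - \beta^\top x_i\big)^2, \qquad w_i = \sqrt{K\big(d(x_q, x_i)\big)},$$

which equals $\min_\beta 2E_3(x_q)$.

- For each neighbor $(x_i, f(x_i))$ compute $w_i$.
- Let $WX = \operatorname{Diag}(w_1, \ldots, w_k)\, X$, where the rows of $X$ are the $x_i^\top$.
- Let $WY = \operatorname{Diag}(w_1, \ldots, w_k)\, Y$, where $Y = \big(f(x_1), \ldots, f(x_k)\big)^\top$.
- Then $\beta = \big((WX)^\top WX\big)^{-1} (WX)^\top WY$, and the prediction is $\hat f(x_q) = \beta^\top x_q$, with $x_q$ augmented by the same leading 1.

*[Figure: LWR example. Panels: "f1 (simple regression)", "Locally-weighted regression (f2)", "Locally-weighted regression (f3)", "Locally-weighted regression (f4)". Legend: training data; predicted value using simple regression; predicted value using locally weighted (piecewise) regression. Source: Yike Guo, Advanced Knowledge Management, 2000]*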
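For a real-valued target, the distance-weighted average $\hat f(x_q) = \sum_i w_i f(x_i) / \sum_i w_i$ can be sketched with the weighting function passed in as a parameter, so the inverse-square weights and the Gaussian kernel $K(d) = e^{-d^2}$ are interchangeable. This is an illustrative sketch only: the function names and data are hypothetical, and the Gaussian has no width parameter, exactly as written on the slide, so the scale of the attributes determines how local the average is.

```python
import math


def euclidean(x_i, x_j):
    # d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def inverse_square(d):
    # w = 1 / d^2; infinite at d = 0, which triggers the special case x_q = x_i
    return math.inf if d == 0.0 else 1.0 / d ** 2


def gaussian(d):
    # Gaussian kernel K(d) = exp(-d^2), with no width parameter
    return math.exp(-d ** 2)


def weighted_knn_regress(training, x_q, k=None, kernel=inverse_square):
    """Return sum_i w_i f(x_i) / sum_i w_i with w_i = kernel(d(x_q, x_i)).

    training: list of (x, f(x)) pairs; k=None uses every training example.
    Assumes at least one weight is nonzero (exp(-d^2) underflows to 0.0
    for d above roughly 27).
    """
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], x_q))
    if k is not None:
        neighbors = neighbors[:k]
    num = den = 0.0
    for x_i, y_i in neighbors:
        w = kernel(euclidean(x_i, x_q))
        if math.isinf(w):
            return y_i  # exact match: f_hat(x_q) := f(x_i)
        num += w * y_i
        den += w
    return num / den


# Hypothetical 1-D toy data.
training = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0)]
print(weighted_knn_regress(training, (1.0,)))            # 1.0 (exact match)
print(round(weighted_knn_regress(training, (0.5,)), 4))  # 0.6842
print(round(weighted_knn_regress(training, (0.5,), kernel=gaussian), 4))
```

At $x_q = 0.5$ the inverse-square weights are $4, 4, 4/9$, giving $(4 \cdot 1 + \tfrac{4}{9} \cdot 4) / (8 + \tfrac{4}{9}) = 13/19$.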
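A sketch of locally weighted linear regression at a single query point, following the weighted least-squares recipe of the slides: build $WX$ and $WY$ with $w_i = \sqrt{K(d(x_q, x_i))}$, then solve $(WX)^\top WX \, \beta = (WX)^\top WY$. Instead of forming the inverse explicitly, it solves these normal equations with a small Gaussian-elimination helper so the example needs only the standard library; a numerical library's least-squares solver would be a more robust substitute. All names (`solve`, `lwr_predict`) and data are hypothetical.

```python
import math


def euclidean(x_i, x_j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def gaussian(d):
    # K(d) = exp(-d^2), the slide's example kernel
    return math.exp(-d ** 2)


def solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate neighbors")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta


def lwr_predict(training, x_q, k=None, kernel=gaussian):
    """Locally weighted linear regression prediction at x_q.

    training: list of (x, f(x)) pairs; k=None fits over all examples.
    """
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], x_q))
    if k is not None:
        neighbors = neighbors[:k]
    # Rows of WX are w_i * (1, a_1(x_i), ..., a_n(x_i)); entries of WY are
    # w_i * f(x_i), with w_i = sqrt(K(d)) so squared residuals are weighted by K.
    WX, WY = [], []
    for x_i, y_i in neighbors:
        w = math.sqrt(kernel(euclidean(x_i, x_q)))
        WX.append([w * a for a in (1.0, *x_i)])
        WY.append(w * y_i)
    p = len(WX[0])
    # Normal equations: (WX^T WX) beta = WX^T WY
    A = [[sum(row[r] * row[c] for row in WX) for c in range(p)] for r in range(p)]
    b = [sum(row[r] * y for row, y in zip(WX, WY)) for r in range(p)]
    beta = solve(A, b)
    # f_hat(x_q) = beta^T (1, a_1(x_q), ..., a_n(x_q))
    return sum(b_j * a for b_j, a in zip(beta, (1.0, *x_q)))


# Hypothetical data on the line y = 2x + 1, and on the parabola y = x^2.
line = [((float(x),), 2.0 * x + 1.0) for x in range(6)]
quad = [((float(x),), float(x * x)) for x in range(11)]
print(round(lwr_predict(line, (2.5,)), 6))                               # 6.0
print(round(lwr_predict(quad, (5.0,), k=3, kernel=lambda d: 1.0), 6))    # 25.666667
```

Because `kernel` and `k` are parameters, the same function covers all three criteria: a finite `k` with `kernel=lambda d: 1.0` minimizes $E_1$, `k=None` with a kernel minimizes $E_2$, and a finite `k` with a kernel minimizes $E_3$. The parabola example fits a line to the neighbors $x = 4, 5, 6$, giving $77/3$ at $x_q = 5$.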
