lecture4-annotated: Machine Learning 10-701/15-781, Fall 2008

Machine Learning 10-701/15-781, Fall 2008
Introduction to Regression
Eric Xing
Lecture 4, September 17, 2008
Reading: Chap. 3, CB
© Eric Xing @ CMU, 2006-2008

Functional Approximation, cont.

Machine learning for apartment hunting

- Now you've moved to Pittsburgh! And you want to find the most reasonably priced apartment satisfying your needs: square footage, # of bedrooms, distance to campus, ...

  Living area (ft^2) | # bedrooms | Rent ($)
  -------------------|------------|---------
  230                | 1          | 600
  506                | 2          | 1000
  433                | 2          | 1100
  109                | 1          | 500
  150                | 1          | ?
  270                | 1.5        | ?

The learning problem

- Features: living area, distance to campus, # bedrooms, ...
  Denote a feature vector as x = [x_1, x_2, ..., x_k].
- Target: rent. Denoted as y.
- Training set: stack the n feature vectors as the rows of a design matrix, and collect the targets into a vector:

  X = [ x_1^T ; x_2^T ; ... ; x_n^T ]  (an n x k matrix whose i-th row is x_i^T = [x_i^1, ..., x_i^k]),
  Y = [ y_1, y_2, ..., y_n ]^T
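As a concrete illustration of stacking the training set into X and Y, here is a minimal NumPy sketch using the four complete rows of the rent table (the two apartments with unknown rent are left out, since they have no target):

```python
import numpy as np

# The four complete rows of the rent table, stacked as the design matrix X:
# each row is one training input x_i^T = [living area, # bedrooms].
X = np.array([[230.0, 1.0],
              [506.0, 2.0],
              [433.0, 2.0],
              [109.0, 1.0]])

# Targets y_i = rent in dollars, collected into the vector Y.
Y = np.array([600.0, 1000.0, 1100.0, 500.0])

print(X.shape, Y.shape)   # (4, 2) (4,)
```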
Linear Regression

- Assume that Y (target) is a linear function of X (features), e.g.:

  ŷ = θ_0 + θ_1 x_1 + θ_2 x_2

- Let's assume a vacuous "feature" x_0 = 1 (this is the intercept term; why?), and define the feature vector to be x = [x_0, x_1, ..., x_k]; then we have the following general representation of the linear function:

  ŷ = θ^T x

- Our goal is to pick the optimal θ. How? We seek the θ that minimizes the following cost function:

  J(θ) = (1/2) Σ_{i=1}^n ( ŷ(x_i) - y_i )^2

The Least-Mean-Square (LMS) method

- The cost function:

  J(θ) = (1/2) Σ_{i=1}^n ( x_i^T θ - y_i )^2

- Consider a gradient descent algorithm:

  θ_j^{t+1} = θ_j^t - α ∂J(θ)/∂θ_j, with the derivative evaluated at θ = θ^t
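To make the cost function concrete, here is a minimal sketch that evaluates J(θ) = (1/2) Σ_i (x_i^T θ - y_i)^2 with an explicit intercept column x_0 = 1. The data are hypothetical toy values generated by y = 1 + 2x, not the rent table:

```python
import numpy as np

# Hypothetical toy data: intercept column x_0 = 1, one real feature,
# targets generated exactly by y = 1 + 2*x.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

def cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (x_i^T theta - y_i)^2."""
    r = X @ theta - y          # residual vector
    return 0.5 * (r @ r)

print(cost(np.array([1.0, 2.0]), X, y))   # 0.0 at the generating parameters
print(cost(np.zeros(2), X, y))            # 77.5
```

At the parameters that generated the data the residuals vanish, so the cost is exactly zero; any other θ scores strictly higher.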

The Least-Mean-Square (LMS) method

- Now we have the following descent rule:

  θ_j^{t+1} = θ_j^t + α Σ_{i=1}^n ( y_i - x_i^T θ^t ) x_i^j

- For a single training point i, we have:

  θ_j^{t+1} = θ_j^t + α ( y_i - x_i^T θ^t ) x_i^j

- This is known as the LMS update rule, or the Widrow-Hoff learning rule.
- This is actually a "stochastic", "coordinate" descent algorithm.
- It can be used as an on-line algorithm.

Geometry and Convergence of LMS

[Figure omitted in preview: solution contours for N = 1, 2, 3 training points.]

- In vector form, the single-point update is:

  θ^{t+1} = θ^t + α ( y_i - x_i^T θ^t ) x_i

- Claim: when the step size α satisfies certain conditions, and when certain other technical conditions are satisfied, LMS will converge to an "optimal region".
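A minimal sketch of the single-point (Widrow-Hoff) update run as an on-line pass over the data. The data are synthetic (generated from y = 1 + 2x plus tiny noise), and the step size and epoch count are illustrative choices, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 1 + 2*x plus tiny
# noise, with an explicit intercept feature x_0 = 1 in each row.
X = np.column_stack([np.ones(50), rng.uniform(0.0, 5.0, 50)])
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(50)

theta = np.zeros(2)
alpha = 0.01           # step size; must be small enough for convergence
for epoch in range(200):
    for x_i, y_i in zip(X, y):
        # LMS / Widrow-Hoff rule, one point at a time (stochastic, on-line):
        # theta <- theta + alpha * (y_i - x_i^T theta) * x_i
        theta = theta + alpha * (y_i - x_i @ theta) * x_i

print(theta)           # approximately [1, 2]
```

Because each update uses only one training point, the iterates hover in a small neighborhood of the minimizer rather than settling exactly, which matches the "optimal region" claim above.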
Solution path

[Figure omitted in preview: trajectory of the iterates θ^t toward the minimizer.]

Steepest Descent and LMS

- Steepest descent. Note that:

  ∇_θ J = [ ∂J/∂θ_1, ..., ∂J/∂θ_k ]^T = - Σ_{i=1}^n ( y_i - x_i^T θ ) x_i

- This gives the update:

  θ^{t+1} = θ^t + α Σ_{i=1}^n ( y_i - x_i^T θ^t ) x_i

- This is a batch gradient descent algorithm.
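The batch rule sums over all training points before each parameter update; a sketch under the same kind of synthetic-data assumption (y = 1 + 2x, noiseless, with illustrative step size and iteration count):

```python
import numpy as np

# Synthetic illustration data, not from the lecture: y = 1 + 2*x exactly.
X = np.column_stack([np.ones(20), np.linspace(0.0, 4.0, 20)])
y = X @ np.array([1.0, 2.0])

theta = np.zeros(2)
alpha = 0.01
for t in range(5000):
    # Batch gradient: accumulate over ALL n points before each update.
    # grad_j = -sum_i (y_i - x_i^T theta) x_i^j, i.e. grad = X^T (X theta - y).
    grad = X.T @ (X @ theta - y)
    theta = theta - alpha * grad

print(theta)   # converges to [1. 2.]
```

With noiseless data and a small enough step size, batch descent converges to the exact minimizer, unlike the stochastic single-point rule.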

The normal equations

- Write the cost function in matrix form:

  J(θ) = (1/2) Σ_{i=1}^n ( x_i^T θ - y_i )^2
       = (1/2) ( Xθ - y )^T ( Xθ - y )
       = (1/2) ( θ^T X^T X θ - 2 θ^T X^T y + y^T y )

- To minimize J(θ), take the derivative with respect to θ and set it to zero.
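The preview cuts off before the slide states the result, but setting the gradient ∇_θ J = X^T X θ - X^T y to zero yields the standard normal equations X^T X θ = X^T y, which can be solved directly. A NumPy sketch with hypothetical exact-fit data (y = 1 + 2x):

```python
import numpy as np

# Hypothetical exact-fit data: y = 1 + 2*x with an intercept column.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = X @ np.array([1.0, 2.0])

# Normal equations: solve X^T X theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # [1. 2.]

# Numerically, a least-squares solver is usually preferred over
# explicitly forming X^T X, which squares the condition number:
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```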