lec1-linear models: Machine Learning Lecture 2, Yang Yang

Machine Learning Lecture 2, Yang Yang, Department of Computer Science & Engineering, Shanghai Jiao Tong University. Reading list: Andrew's Lecture Note 1; Machine Learning (《机器学习》), Chapter 3; PRML 3.1
Linear Regression
Machine Learning for house hunting

Suppose we have a dataset giving the living areas and prices of some houses:

    Living area (feet^2)    Price (1000$s)
    2014                    400
    1600                    330
    2400                    369
    1416                    232
    3000                    540
    2005                    ?
    3200                    ?
    1280                    ?

We can plot this data set (price vs. living area). How can we learn to predict the prices of other houses, as a function of the size of their living areas?
The learning problem

x^(i) denotes the "input" variables/features; y^(i) denotes the "output" or target variable that we are trying to predict (price). A pair (x^(i), y^(i)) is called a training example. A list of m training examples {(x^(i), y^(i)); i = 1, ..., m} is called a training set. Let X denote the space of input values, and Y the space of output values. In this example, X = Y = R.

Our goal: given a training set, learn a function h : X -> Y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
A slightly richer dataset

If you want to find the most reasonably priced house satisfying your needs: square-ft, # of bedrooms, distance to work place...

    Living area (feet^2)    # bedrooms    Price (1000$s)
    2014                    3             400
    1600                    3             330
    2400                    3             369
    1416                    2             232
    3000                    4             540
    2005                    3             ?
    3200                    4             ?
    1280                    2             ?
The learning problem

Features: living area, # bedrooms, distance to work place; denoted as the vector x = (x_1, x_2, x_3).
Target: price; denoted as y.
Training set: {(x^(i), y^(i)); i = 1, ..., m}, where m = # examples/samples and n = # features.
Linear Regression

Assume that y (target) is a linear function of x (features):

    h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2

Here, the θ_i's are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y. When there is no risk of confusion, we will drop the θ subscript in h_θ(x) and write it more simply as h(x). To simplify our notation, we also introduce the convention of letting x_0 = 1 (this is the intercept term), so that

    h(x) = sum_{i=0}^{n} θ_i x_i = θ^T x

Choosing the features x_i is a matter of pre-processing of features, or feature extraction.
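The x_0 = 1 convention and the hypothesis h(x) = θ^T x can be sketched in NumPy as follows (the data and weight values are made up for illustration):

```python
import numpy as np

def add_intercept(X):
    """Prepend the x_0 = 1 intercept column to an (m, n) feature matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def h(theta, X):
    """Linear hypothesis h(x) = theta^T x, applied to every row: returns (m,) predictions."""
    return X @ theta

# Two houses with features (living area, # bedrooms); theta is an assumed example
X = add_intercept(np.array([[2014.0, 3.0], [1600.0, 3.0]]))
theta = np.array([50.0, 0.1, 20.0])   # intercept, per-sq-ft weight, per-bedroom weight
preds = h(theta, X)                   # 50 + 0.1*2014 + 20*3 = 311.4 for the first house
```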
Linear Basis Function Models (1)

More generally, a linear model can be written as a linear combination of fixed basis functions:

    y(x, w) = sum_{j=0}^{M-1} w_j φ_j(x) = w^T φ(x),    where φ_0(x) = 1

Example: Polynomial Curve Fitting
Linear Basis Function Models (2)

Polynomial basis functions:

    φ_j(x) = x^j

These are global: a small change in x affects all basis functions.
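A minimal sketch of the polynomial basis φ_j(x) = x^j (the input values and M are arbitrary examples):

```python
import numpy as np

def poly_basis(x, M):
    """Polynomial basis phi_j(x) = x**j for j = 0..M-1; x is a 1-D array of inputs.
    Returns the design matrix of shape (len(x), M); column 0 is the constant phi_0 = 1."""
    return np.vstack([x**j for j in range(M)]).T

phi = poly_basis(np.array([0.0, 0.5, 1.0]), M=4)
# Each row is (1, x, x^2, x^3); e.g. x = 0.5 gives (1, 0.5, 0.25, 0.125).
# Note the "global" behaviour: changing x changes every column at once.
```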
Linear Basis Function Models (4)

Sigmoidal basis functions:

    φ_j(x) = σ((x − µ_j) / s),    where σ(a) = 1 / (1 + exp(−a))

These are also local: a small change in x only affects nearby basis functions. µ_j and s control location and scale (slope).
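A sketch of the sigmoidal basis; the centres µ_j and scale s below are assumed example values:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_basis(x, mus, s):
    """Sigmoidal basis phi_j(x) = sigma((x - mu_j) / s).
    x: (m,) inputs; mus: (M,) centres; s: scale. Returns an (m, M) design matrix."""
    return sigmoid((x[:, None] - mus[None, :]) / s)

x = np.array([-1.0, 0.0, 1.0])
mus = np.array([0.0, 0.5])          # assumed basis-function centres
phi = sigmoid_basis(x, mus, s=0.5)
# phi[1, 0] = sigma(0) = 0.5: x = 0 sits exactly at the first centre.
```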
The Least Mean Square (LMS) method

The cost function:

    J(θ) = (1/2) sum_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2

Consider a gradient descent algorithm, which repeatedly performs the update (simultaneously for all j), with learning rate α:

    θ_j := θ_j − α ∂J(θ)/∂θ_j
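The cost J(θ) can be sketched directly from its definition (toy data below is an assumption):

```python
import numpy as np

def cost(theta, X, y):
    """LMS cost J(theta) = 1/2 * sum_i (h(x_i) - y_i)^2,
    where X already includes the x_0 = 1 intercept column."""
    residuals = X @ theta - y
    return 0.5 * np.dot(residuals, residuals)

X = np.array([[1.0, 1.0], [1.0, 2.0]])   # toy data with intercept column
y = np.array([1.0, 2.0])
# theta = (0, 1) fits this data exactly, so J = 0; theta = (0, 0) leaves J = 2.5
```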
The Least Mean Square (LMS) method

For a single training example, this gives the update rule:

    θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)

This is known as the LMS update rule, or the Widrow-Hoff learning rule. If the training set has more than one example, we get batch gradient descent:

    Repeat until convergence:
        θ_j := θ_j + α sum_{i=1}^{m} (y^(i) − h_θ(x^(i))) x_j^(i)    (for every j)
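Batch gradient descent can be sketched in vectorised form; the toy problem y = 1 + 2x, the learning rate, and the iteration count are assumptions for illustration:

```python
import numpy as np

def batch_gd(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent for LMS: each step scans the whole training set.
    Vectorised form of theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        errors = y - X @ theta          # (y^(i) - h_theta(x^(i))) for all i at once
        theta += alpha * X.T @ errors   # simultaneous update of every theta_j
    return theta

# Toy problem y = 1 + 2x, with the x_0 = 1 intercept column prepended by hand
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = batch_gd(X, y)   # converges near (1, 2) on this noise-free data
```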
Stochastic gradient descent

The above results were obtained with batch gradient descent. There is an alternative that also works very well: stochastic (or incremental) gradient descent, which updates the parameters after each individual training example rather than after a full pass through the training set:

    Loop:
        for i = 1 to m:
            θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)    (for every j)
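A sketch of the stochastic version, reusing the same assumed toy problem; note the update inside the inner loop uses only one example at a time:

```python
import numpy as np

def sgd(X, y, alpha=0.05, epochs=500):
    """Stochastic gradient descent: update theta after *each* training example,
    using only that example's gradient: theta_j += alpha * (y_i - h(x_i)) * x_ij."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            theta += alpha * (y_i - x_i @ theta) * x_i
    return theta

# Same toy problem y = 1 + 2x as before (intercept column included)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = sgd(X, y)   # on this noise-free data the per-example updates settle near (1, 2)
```

With noisy data and a fixed α, SGD oscillates around the minimum instead of settling exactly, but each update is far cheaper than a full batch pass, which is why it scales to large m.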