ml-lecture02

Lecture 2: Classification. Perceptron. Sigmoid classifiers.
COMP-652, Lecture 2 (September 10, 2007)

• Classification problems. Error functions
• Perceptron
• Sigmoid classifiers

Classification

• Given a data set D ⊂ X × Y, where Y is a discrete set (usually with a smallish number of values), find a hypothesis h ∈ H which predicts the existing data "well"
• If Y has two possible values, e.g. Y = {-1, 1} or Y = {0, 1}, this is called binary classification.
• Can we develop methods for classification as we did for regression?
• What does it take to develop a learning algorithm?

Recall: Three decisions

• What should be the error function?
• What should be the hypothesis class?
• How are we going to find the best hypothesis in the class (the one that minimizes the error function)?

Error functions for binary classification

• One worthy goal is to minimize the number of misclassified examples
• Suppose Y = {-1, 1} and the hypotheses h_w ∈ H also output +1 or -1
• An example (x, y) is misclassified if y h_w(x) is negative.
• So a reasonable error function simply counts the misclassified examples:

  J(w) = - Σ_{i ∈ Misclassified} y_i h_w(x_i)

  Since each misclassified example contributes y_i h_w(x_i) = -1, J(w) is exactly the number of misclassified examples. This is called the 0-1 loss.
• This function is not differentiable, so often we will still use the mean-squared error.

Choosing the hypothesis class

• For regression, we used linear hypotheses (simple, nice)
• Is there an analogue for classification?
• What about linear hypotheses?

Example: Wisconsin data

[Figure: scatter plot of the Wisconsin data, with tumor size (mm?) on the horizontal axis and the class label, non-recurring (0) / recurring (1), on the vertical axis]

What is the meaning of the output in this case?

Output of a classifier

• Useful predictions could be:
  – The predicted class
  – The probability that the example belongs to a given class
• Just applying linear regression as is gives us neither

Perceptron

[Figure: perceptron diagram; inputs x_1, ..., x_n plus a constant input x_0 = 1 are multiplied by weights w_1, ..., w_n and w_0, summed into Σ_{i=0}^{n} w_i x_i, and thresholded to output 1 if the sum is positive, -1 otherwise]

• We can take a linear combination and threshold it:

  h_w(x) = sgn(w^T x) = { +1 if w^T x > 0
                        { -1 otherwise

  This is called a perceptron. ...
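The slides contain no code, but the thresholded linear combination above is easy to make concrete. Below is a minimal NumPy sketch of the perceptron hypothesis h_w(x) = sgn(w^T x), assuming the slide's convention that x carries a leading x_0 = 1 component so that w_0 acts as a bias; the function name perceptron_predict and the example weights are invented for illustration, not taken from the lecture.

```python
import numpy as np

def perceptron_predict(w, x):
    """Perceptron hypothesis h_w(x) = sgn(w^T x).

    Follows the slide's convention: x already includes a leading
    x_0 = 1 component so that w_0 acts as a bias, and the threshold
    maps a positive activation to +1 and anything else to -1.
    """
    return 1 if np.dot(w, x) > 0 else -1

# Hypothetical weights and input, for illustration only.
w = np.array([-0.5, 1.0, 2.0])   # w_0 (bias), w_1, w_2
x = np.array([1.0, 0.3, 0.4])    # x_0 = 1, then the two features
print(perceptron_predict(w, x))  # prints 1, since w.x = 0.6 > 0
```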
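In the same spirit, here is a sketch of the 0-1 loss J(w) from the error-function slide, evaluated with the perceptron hypothesis over a whole data set at once; the toy data and the name zero_one_loss are made up for the example.

```python
import numpy as np

def zero_one_loss(w, X, y):
    """J(w) = - sum over misclassified i of y_i * h_w(x_i).

    With labels in {-1, +1}, each misclassified example contributes
    y_i * h_w(x_i) = -1, so the returned value is exactly the number
    of misclassified examples.
    """
    predictions = np.where(X @ w > 0, 1, -1)  # h_w(x_i) for every row
    misclassified = predictions != y
    return -np.sum(y[misclassified] * predictions[misclassified])

# Toy data set, invented for illustration: each row includes x_0 = 1.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, -1, -1])
w = np.array([0.0, 1.0])
print(zero_one_loss(w, X, y))  # 1: only the third example is misclassified
```

The np.where call applies the sign threshold to every example in one vectorized step, and ties at w^T x = 0 map to -1, matching the "+1 if w^T x > 0, -1 otherwise" convention of the slide.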