# class02 - The Learning Problem and Regularization 9.520...

**The Learning Problem and Regularization**
9.520 Class 02, 13 February 2006
Tomaso Poggio

## Plan

- Learning as function approximation
- Empirical Risk Minimization
- Generalization and well-posedness
- Regularization
- Appendix: sample and approximation error

## About This Class

**Theme.** We introduce the learning problem as the problem of function approximation from sparse data. We define the key ideas of loss functions, empirical error and generalization error. We then introduce the Empirical Risk Minimization approach and the two key requirements on algorithms using it: well-posedness and consistency. We then describe a key algorithm, Tikhonov regularization, that satisfies these requirements.

**Math required.** Familiarity with basic ideas in probability theory.

## Data Generated by a Probability Distribution

We assume that $X$ and $Y$ are two sets of random variables. We are given a training set $S$ consisting of $n$ samples drawn i.i.d. from the probability distribution $\mu(z)$ on $Z = X \times Y$:

$$(x_1, y_1), \ldots, (x_n, y_n), \quad \text{that is,} \quad z_1, \ldots, z_n$$

We will make frequent use of the conditional probability of $y$ given $x$, written $p(y \mid x)$:

$$\mu(z) = p(x, y) = p(y \mid x)\, p(x)$$

It is crucial to note that we view $p(x, y)$ as fixed but **unknown**.

[Figure: the probabilistic setting — $x$ drawn from $p(x)$ on $X$, $y$ drawn from $p(y \mid x)$ on $Y$.]

## Hypothesis Space

The hypothesis space $\mathcal{H}$ is the space of functions that we allow our algorithm to provide. For many algorithms (such as optimization algorithms) it is the space the algorithm is allowed to search. As we will see, it is often important to choose the hypothesis space as a function of the amount of data available.

## Learning as Function Approximation from Samples: Regression and Classification

The basic goal of supervised learning is to use the training set $S$ to learn a function $f_S$ that looks at a new $x$ value $x_{\text{new}}$ and predicts the associated value of $y$:

$$y_{\text{pred}} = f_S(x_{\text{new}})$$

If $y$ is a real-valued random variable, we have *regression*.
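The setup above — draw $n$ i.i.d. samples $(x_i, y_i)$ from an unknown $p(x, y) = p(y \mid x)\,p(x)$, then learn $f_S$ from them — can be sketched in a few lines. The specific distribution (a noisy line) and the choice of least-squares line fitting as the learning algorithm are illustrative assumptions, not part of the lecture:

```python
import random

random.seed(0)

# Draw n i.i.d. samples z_i = (x_i, y_i). In practice p(x, y) is unknown;
# here we *choose* an illustrative one: x ~ Uniform(-1, 1) plays p(x),
# and y = 2x + Gaussian noise plays p(y|x).
n = 200
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]

# A tiny hypothesis space H: lines f(x) = a*x + b, fit by least squares.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def f_S(x_new):
    """The learned function: predicts y for a new x."""
    return a * x_new + b

y_pred = f_S(0.5)
```

With enough samples the fitted slope should land near the true value 2, illustrating how $f_S$ depends on the particular training set $S$ that was drawn.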
If $y$ takes values from an unordered finite set, we have *pattern classification*. In two-class pattern classification problems, we assign one class a $y$ value of $1$, and the other class a $y$ value of $-1$.

## Loss Functions

In order to measure the goodness of our function, we need a loss function $V$. In general, we let $V(f, z) = V(f(x), y)$ denote the price we pay when we see $x$ and guess that the associated $y$ value is $f(x)$ when it is actually $y$.

## Common Loss Functions for Regression

For regression, the most common loss function is the square loss, or $L_2$ loss:

$$V(f(x), y) = (f(x) - y)^2$$

We could also use the absolute value, or $L_1$ loss:

$$V(f(x), y) = |f(x) - y|$$

Vapnik's more general $\epsilon$-insensitive loss function is:

$$V(f(x), y) = (|f(x) - y| - \epsilon)_+$$

## Common Loss Functions for Classification

For binary classification, the most intuitive loss is the 0-1 loss:

$$V(f(x), y) = \Theta(-y f(x))$$

where $\Theta(\cdot)$ is the step function. For tractability and other reasons, ...
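The four losses above can be written directly as functions of the prediction $f(x)$ and the true $y$. This is a minimal sketch (the function names and the default $\epsilon = 0.1$ are our choices, not from the lecture):

```python
def square_loss(fx, y):
    # L2 loss: (f(x) - y)^2
    return (fx - y) ** 2

def l1_loss(fx, y):
    # L1 loss: |f(x) - y|
    return abs(fx - y)

def eps_insensitive_loss(fx, y, eps=0.1):
    # Vapnik's epsilon-insensitive loss: (|f(x) - y| - eps)_+
    # Errors smaller than eps cost nothing.
    return max(abs(fx - y) - eps, 0.0)

def zero_one_loss(fx, y):
    # 0-1 loss: Theta(-y * f(x)), with y in {-1, +1}.
    # Pays 1 when sign(f(x)) disagrees with y, else 0.
    return 1.0 if y * fx <= 0 else 0.0
```

For example, `square_loss(3.0, 1.0)` is `4.0` while `l1_loss(3.0, 1.0)` is `2.0`, showing how the square loss penalizes large errors more heavily; `eps_insensitive_loss(1.05, 1.0)` is `0.0` because the error falls inside the insensitive band.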

## This note was uploaded on 11/11/2011 for the course BIO 9.07 taught by Professor Ruth Rosenholtz during the Spring '04 term at MIT.
