CSE 6740 Lecture 15
What Error Guarantees Can We Make? (Learning Theory and Generalization)
Alexander Gray
agray@cc.gatech.edu
Georgia Institute of Technology

Today

1. Statistical inequalities (How can we bound values that can appear in the future?)
2. Confidence intervals (How good is the estimation/learning?)

Statistical Inequalities

How can we bound values that can appear in the future?

Markov's Inequality

Theorem (Markov's inequality): Suppose X is a nonnegative random variable and E(X) exists. Then for any t > 0,

P(X > t) \le \frac{E(X)}{t}.   (1)

Markov's Inequality: Proof

Since X \ge 0,

E(X) = \int_0^\infty x f(x)\,dx   (2)
     = \int_0^t x f(x)\,dx + \int_t^\infty x f(x)\,dx   (3)
     \ge \int_t^\infty x f(x)\,dx   (4)
     \ge t \int_t^\infty f(x)\,dx   (5)
     = t\,P(X > t).   (6)

Chebyshev's Inequality

Theorem (Chebyshev's inequality): If \mu = E(X) and \sigma^2 = V(X), then

P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}   (7)

and

P\left( \left| \frac{X - \mu}{\sigma} \right| \ge u \right) \le \frac{1}{u^2}   (8)

(or P(|Z| \ge u) \le 1/u^2 if Z = (X - \mu)/\sigma). For example, P(|Z| > 2) \le 1/4 and P(|Z| > 3) \le 1/9.

Chebyshev's Inequality: Proof

Using Markov's inequality,

P(|X - \mu| \ge t) = P(|X - \mu|^2 \ge t^2)   (9)
                   \le \frac{E(X - \mu)^2}{t^2}   (10)
                   = \frac{\sigma^2}{t^2}.   (11)

The second part follows by setting t = u\sigma.

Chebyshev's Inequality: Example

Suppose we test a classifier on a set of N new examples. Let X_i = 1 if the prediction is wrong and X_i = 0 if it is right; then \bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i is the observed error rate. Each X_i may be regarded as a Bernoulli with unknown mean p; we would like to estimate this.
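The two inequalities above can be checked numerically. The sketch below, a quick sanity check not from the lecture, draws samples from an exponential distribution (chosen here as an illustrative nonnegative distribution) and verifies that the empirical tail probabilities never exceed the Markov and Chebyshev bounds:

```python
import random

random.seed(0)

# Draw from an exponential distribution with rate 1 (nonnegative, mean 1).
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

# Markov: P(X > t) <= E(X) / t for any t > 0.
t = 3.0
p_markov = sum(x > t for x in samples) / n

# Chebyshev: P(|X - mu| >= u) <= sigma^2 / u^2.
u = 2.0
p_cheb = sum(abs(x - mean) >= u for x in samples) / n

print(p_markov, mean / t)   # empirical tail probability vs. Markov bound
print(p_cheb, var / u**2)   # empirical deviation probability vs. Chebyshev bound
```

For the exponential the true tail P(X > 3) = e^{-3} ≈ 0.05, well under the Markov bound of 1/3; the bounds are loose in general but hold for any distribution with the required moments.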
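The classifier example can be carried through concretely. Since each X_i is Bernoulli(p), V(X_i) = p(1-p) \le 1/4, so V(\bar{X}_N) \le 1/(4N), and Chebyshev gives P(|\bar{X}_N - p| \ge t) \le 1/(4Nt^2). The sketch below simulates this setup; the true error rate p_true and N are hypothetical values chosen for illustration:

```python
import math
import random

random.seed(0)

# Simulate testing a classifier with (unknown in practice) true error rate p.
p_true = 0.15   # hypothetical value for the simulation
N = 1000
errors = [1 if random.random() < p_true else 0 for _ in range(N)]
error_rate = sum(errors) / N   # observed error rate, X-bar_N

# Chebyshev with Var(X-bar_N) <= 1/(4N):
#   P(|X-bar_N - p| >= t) <= 1 / (4 N t^2).
# Setting the right-hand side to delta and solving for t gives a
# (1 - delta)-confidence half-width for p.
delta = 0.05
half_width = math.sqrt(1.0 / (4 * N * delta))

print(f"observed error rate: {error_rate:.3f}")
print(f"95% Chebyshev interval: "
      f"[{error_rate - half_width:.3f}, {error_rate + half_width:.3f}]")
```

With N = 1000 and delta = 0.05 the half-width is about 0.071, so Chebyshev guarantees the true error rate lies within ±0.071 of the observed rate with probability at least 95%, regardless of the distribution beyond its Bernoulli form.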
This note was uploaded on 04/03/2010 for the course CSE 6740 taught by Professor Staff during the Fall '08 term at Georgia Institute of Technology.