The Estimation of Density Functions $p(x \mid \omega_i)\,P(\omega_i)$

In pattern recognition applications we rarely have complete knowledge of the density functions. How can they be estimated from samples (training data)?

Three problems:

1) How do we estimate the density functions and the prior probabilities from a data set?
2) What are the properties of the resulting estimates?
3) How do we estimate the error rate of a classifier from a data set?

Three Kinds of Problems

Supervised parameter estimation. Known: the samples, the class label of each sample, and the form of the density functions.

Maximum-Likelihood Estimation

Assumption: $\theta$ is a deterministic but unknown parameter.

The likelihood function of the i.i.d. samples $X = \{x_1, x_2, \ldots, x_N\}$ is

$$l(\theta) = p(X \mid \theta) = p(x_1, x_2, \ldots, x_N \mid \theta) = \prod_{k=1}^{N} p(x_k \mid \theta),$$

and the log-likelihood is

$$H(\theta) = \ln l(\theta) = \ln \prod_{k=1}^{N} p(x_k \mid \theta) = \sum_{k=1}^{N} \ln p(x_k \mid \theta).$$

The maximum-likelihood estimate solves $\dfrac{dl(\theta)}{d\theta} = 0$ or, equivalently, $\dfrac{dH(\theta)}{d\theta} = 0$.

Example: uniform density.

$$p(x \mid \theta) = \begin{cases} \dfrac{1}{\theta_2 - \theta_1}, & \theta_1 < x < \theta_2, \\ 0, & \text{otherwise}. \end{cases}$$

Here $l(\theta) = (\theta_2 - \theta_1)^{-N}$ and $H(\theta) = -N \ln(\theta_2 - \theta_1)$, so

$$\frac{\partial H}{\partial \theta_1} = \frac{N}{\theta_2 - \theta_1} > 0, \qquad \frac{\partial H}{\partial \theta_2} = -\frac{N}{\theta_2 - \theta_1} < 0.$$

The derivatives never vanish, so the maximum lies on the boundary of the feasible region: with $x' = \min\{x_1, \ldots, x_N\}$ and $x'' = \max\{x_1, \ldots, x_N\}$,

$$\hat\theta_1 = x', \qquad \hat\theta_2 = x''.$$

Maximum-Likelihood Estimation in the Gaussian Case (normal distribution)

Let $p(x) \sim N(\mu, \sigma^2)$ with unknown parameter $\theta = [\theta_1, \theta_2] = [\mu, \sigma^2]$:

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

Given $X = \{x_1, \ldots, x_N\}$, estimate $\hat\mu$ and $\hat\sigma^2$ from $\nabla_\theta H(\theta) = \sum_{k=1}^{N} \nabla_\theta \ln p(x_k \mid \theta) = 0$. Since

$$\ln p(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{(x_k - \theta_1)^2}{2\theta_2},
\qquad
\nabla_\theta \ln p(x_k \mid \theta) =
\begin{bmatrix}
\dfrac{x_k - \theta_1}{\theta_2} \\[2ex]
-\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2}
\end{bmatrix},$$

the likelihood equations are

$$\sum_{k=1}^{N} \frac{x_k - \hat\theta_1}{\hat\theta_2} = 0, \qquad
-\sum_{k=1}^{N} \frac{1}{2\hat\theta_2} + \sum_{k=1}^{N} \frac{(x_k - \hat\theta_1)^2}{2\hat\theta_2^2} = 0,$$

with solutions

$$\hat\mu = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} (x_k - \hat\mu)^2.$$

In the multivariate case, analogously,

$$\hat\mu = \frac{1}{N}\sum_{k=1}^{N} x_k \quad \text{(unbiased)}, \qquad
\hat\Sigma = \frac{1}{N}\sum_{k=1}^{N} (x_k - \hat\mu)(x_k - \hat\mu)^T \quad \text{(biased; asymptotically unbiased)},$$

since $E[\hat\Sigma] = \dfrac{N-1}{N}\Sigma$ and $\lim_{N \to \infty} E[\hat\Sigma] = \Sigma$.

The Identifiability Problem

For two different parameters $\theta \ne \theta'$, if there exists at least one $x$ such that $p(x \mid \theta) \ne p(x \mid \theta')$, we call the parameter identifiable.

An example of unidentifiability: let $x \in \{0, 1\}$ be a binary variable with

$$p(x \mid \theta) = \frac{1}{2}\theta_1^x (1 - \theta_1)^{1-x} + \frac{1}{2}\theta_2^x (1 - \theta_2)^{1-x}
= \begin{cases} \frac{1}{2}(\theta_1 + \theta_2), & x = 1, \\ 1 - \frac{1}{2}(\theta_1 + \theta_2), & x = 0. \end{cases}$$

Only the sum $\theta_1 + \theta_2$ affects the distribution, so $\theta_1$ and $\theta_2$ cannot be recovered individually.

Bayesian Estimation

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta}.$$

Bayesian learning. Given the sample set $X$, the density of a new observation is

$$p(x \mid X) = \int p(x, \theta \mid X)\, d\theta = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta,
\qquad
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta)\, p(\theta)\, d\theta}.$$

If the posterior $p(\theta \mid X)$ peaks sharply at some $\hat\theta$, then $p(x \mid X) \approx p(x \mid \hat\theta)$.

Recursive Bayes Learning

Write $X^N = \{x_1, \ldots, x_N\}$. For i.i.d. samples, $p(X^N \mid \theta) = p(x_N \mid \theta)\, p(X^{N-1} \mid \theta)$, hence

$$p(\theta \mid X^N) = \frac{p(x_N \mid \theta)\, p(\theta \mid X^{N-1})}{\int p(x_N \mid \theta)\, p(\theta \mid X^{N-1})\, d\theta}, \qquad p(\theta \mid X^0) = p(\theta).$$

The sequence of posteriors $p(\theta),\ p(\theta \mid x_1),\ p(\theta \mid x_1, x_2),\ \ldots$ sharpens toward a delta function $\delta(\theta - \theta_0)$ centered at the true parameter, and

$$\lim_{N \to \infty} p(x \mid X^N) = p(x \mid \hat\theta = \theta) = p(x).$$

This holds for many density functions.
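As a numerical check of the Gaussian ML estimates above, here is a minimal sketch in Python (NumPy assumed; the test distribution, sample size, and all variable names are illustrative, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N i.i.d. samples from a 2-D Gaussian with known parameters.
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
N = 5000
X = rng.multivariate_normal(mu_true, Sigma_true, size=N)

# ML estimates from the section above:
#   mu_hat    = (1/N) * sum_k x_k                                (unbiased)
#   Sigma_hat = (1/N) * sum_k (x_k - mu_hat)(x_k - mu_hat)^T     (biased)
mu_hat = X.mean(axis=0)
D = X - mu_hat
Sigma_hat = (D.T @ D) / N             # biased ML estimate, E[.] = (N-1)/N * Sigma
Sigma_unbiased = (D.T @ D) / (N - 1)  # the usual bias-corrected sample covariance

print("mu_hat           :", mu_hat)
print("Sigma_hat (ML)   :\n", Sigma_hat)
print("Sigma (unbiased) :\n", Sigma_unbiased)
```

For large N the two covariance estimates differ only by the factor $(N-1)/N$, matching the asymptotic-unbiasedness remark above.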
Bayesian Estimation for the Normal Distribution (Incremental Learning)

Let $p(x \mid \mu) \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, and take a normal prior $p(\mu) \sim N(\mu_0, \sigma_0^2)$. Then

$$p(\mu \mid X) = \frac{p(X \mid \mu)\, p(\mu)}{\int p(X \mid \mu)\, p(\mu)\, d\mu} = \alpha \prod_{k=1}^{N} p(x_k \mid \mu)\, p(\mu)$$

$$= \alpha \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x_k - \mu}{\sigma}\right)^2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{1}{2}\left(\frac{\mu - \mu_0}{\sigma_0}\right)^2}$$

$$= \alpha' \exp\left\{-\frac{1}{2}\left[\sum_{k=1}^{N}\left(\frac{\mu - x_k}{\sigma}\right)^2 + \left(\frac{\mu - \mu_0}{\sigma_0}\right)^2\right]\right\}$$

$$= \alpha'' \exp\left\{-\frac{1}{2}\left[\left(\frac{N}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{N} x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right]\right\}.$$

Completing the square shows $p(\mu \mid X) \sim N(\mu_N, \sigma_N^2)$ with

$$\mu_N = \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\, m_N + \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\, \mu_0, \qquad
\sigma_N^2 = \frac{\sigma_0^2\, \sigma^2}{N\sigma_0^2 + \sigma^2}, \qquad
m_N = \frac{1}{N}\sum_{k=1}^{N} x_k.$$

The predictive density is again normal:

$$p(x \mid X) = \int p(x \mid \mu)\, p(\mu \mid X)\, d\mu
= \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right] \frac{1}{\sqrt{2\pi}\,\sigma_N} \exp\left[-\frac{1}{2}\left(\frac{\mu - \mu_N}{\sigma_N}\right)^2\right] d\mu
= \exp\left[-\frac{(x - \mu_N)^2}{2(\sigma^2 + \sigma_N^2)}\right] f(\sigma, \sigma_N),$$

i.e. $p(x \mid X) \sim N(\mu_N,\ \sigma^2 + \sigma_N^2)$.

Nonparametric Estimation

Basic method (the form of the density function is not known). N samples $x_1, x_2, \ldots, x_N$ are drawn i.i.d. from $p(x)$.

Let P be the probability that one sample falls in a region R. The probability that exactly k of the N samples fall in R is binomial:

$$P(k) = C_N^k\, P^k (1 - P)^{N-k}, \qquad C_N^k = \frac{N!}{k!\,(N-k)!}, \qquad E[k] = NP.$$

The mode (the value of k whose probability $P_k$ is highest) is $m = \lfloor (N+1)P \rfloor \approx NP$, so the observed fraction estimates P:

$$\hat{P} = \frac{k}{N}.$$

P is an average of the density over the small region: if R has volume V and p is nearly constant on it,

$$P = \int_R p(x)\, dx \approx p(x)\, V \quad \Rightarrow \quad \hat{p}(x) = \frac{k}{NV}.$$

Now form a sequence of regions $R_1, R_2, \ldots$ containing x, with volumes $V_1, V_2, \ldots$ and sample counts $k_1, k_2, \ldots$. The sequence of estimates

$$\hat{p}_N(x) = \frac{k_N / N}{V_N}$$

converges to $p(x)$ if

1) $\lim_{N \to \infty} V_N = 0$
2) $\lim_{N \to \infty} k_N = \infty$
3) $\lim_{N \to \infty} k_N / N = 0$

Parzen Windows

Take $R_N$ to be a d-dimensional hypercube with edge length $h_N$, so $V_N = h_N^d$, and define the window function

$$\varphi(u) = \begin{cases} 1, & |u_j| \le \frac{1}{2},\ j = 1, 2, \ldots, d, \\ 0, & \text{otherwise}. \end{cases}$$

The number of samples in the hypercube $V_N$ centered at x is

$$k_N = \sum_{i=1}^{N} \varphi\!\left(\frac{x - x_i}{h_N}\right),$$

giving the estimate

$$\hat{p}_N(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{V_N}\, \varphi\!\left(\frac{x - x_i}{h_N}\right).$$

For $\hat{p}_N$ to be a legitimate density it suffices that 1) $\varphi(u) \ge 0$ and 2) $\int \varphi(u)\, du = 1$, since then

$$\int \hat{p}_N(x)\, dx = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{V_N} \int \varphi\!\left(\frac{x - x_i}{h_N}\right) dx = \frac{1}{N} \sum_{i=1}^{N} \int \varphi(u)\, du = \frac{1}{N} \cdot N = 1.$$

Other window functions (e.g. smooth kernels) may be used as well.

The properties of the estimation. $\hat{p}_N(x)$ converges to $p(x)$ provided that

1) $p$ is continuous at $x$;
2) the window satisfies $\varphi(u) \ge 0$, $\sup_u \varphi(u) < \infty$, $\int \varphi(u)\, du = 1$, and $\lim_{\|u\| \to \infty} \varphi(u) \prod_{i=1}^{d} u_i = 0$;
3) the window widths satisfy $\lim_{N \to \infty} V_N = 0$ and $\lim_{N \to \infty} N V_N = \infty$.

$k_N$ Nearest Neighbor Estimation

Instead of fixing the region and counting the samples that fall in it, fix $k_N$ and grow the region centered at x until it contains exactly $k_N$ samples; then

$$\hat{p}_N(x) = \frac{k_N / N}{V_N}.$$

Convergence again requires

1) $\lim_{N \to \infty} V_N = 0$
2) $\lim_{N \to \infty} k_N = \infty$
3) $\lim_{N \to \infty} k_N / N = 0$

(for example, $k_N = \sqrt{N}$). Compared with Parzen windows, the volume $V_N$ now adapts to the local density of the data rather than being fixed in advance.

The curse of dimensionality. If each axis is quantized into 100 cells, a density over one variable needs 100 cells, over two variables $100^2$, and over d variables $100^d$: the amount of data required grows exponentially with the dimension. Independence assumptions such as $p(x, y) = p(x)\, p(y)$ reduce a joint density to a product of low-dimensional ones.

Overfitting and model selection. For a parametric model $y(x, w)$, such as a polynomial of order M, how should M be chosen? A high-order fit (e.g., M = 9 through ten points) can match the training data exactly yet generalize poorly (overfitting); more training data makes the problem less severe. Bayesian model selection and information criteria such as AIC (Akaike) and BIC (Bayesian) trade data fit against model complexity; in its standard form, $\mathrm{AIC} = -2 \ln \hat{L} + 2M$ for a model with M free parameters and maximized likelihood $\hat{L}$.

Estimation of Error Rate

Two classes $\omega_1$, $\omega_2$.

1. $P(\omega_1)$, $P(\omega_2)$ are not known. Draw N samples; k of them are misclassified. With $\varepsilon$ the true error rate,

$$P(k) = C_N^k\, \varepsilon^k (1 - \varepsilon)^{N-k}.$$

The MLE follows from

$$\frac{\partial \ln P(k)}{\partial \varepsilon}
= \frac{\partial}{\partial \varepsilon}\left[\ln C_N^k + k \ln \varepsilon + (N - k)\ln(1 - \varepsilon)\right]
= \frac{k}{\varepsilon} - \frac{N - k}{1 - \varepsilon} = 0
\quad \Rightarrow \quad \hat\varepsilon = \frac{k}{N}.$$

Since $E(k) = N\varepsilon$ and $\mathrm{Var}(k) = N\varepsilon(1 - \varepsilon)$,

$$E(\hat\varepsilon) = \frac{E[k]}{N} = \varepsilon, \qquad
\mathrm{Var}[\hat\varepsilon] = \frac{\mathrm{Var}[k]}{N^2} = \frac{\varepsilon(1 - \varepsilon)}{N}. \tag{1}$$

2. $P(\omega_1)$, $P(\omega_2)$ are known. Draw $N_1 = N P(\omega_1)$ samples from $\omega_1$ and $N_2 = N P(\omega_2)$ from $\omega_2$, with $N = N_1 + N_2$. Let $k_1$ and $k_2$ be the numbers of misclassified samples from classes 1 and 2; then

$$P(k_1, k_2) = P(k_1)\, P(k_2) = \prod_{i=1}^{2} C_{N_i}^{k_i}\, \varepsilon_i^{k_i} (1 - \varepsilon_i)^{N_i - k_i},$$

where $\varepsilon_1, \varepsilon_2$ are the true class-conditional error rates. The estimates

$$\hat\varepsilon_i = \frac{k_i}{N_i},\ i = 1, 2, \qquad
\hat\varepsilon' = P(\omega_1)\hat\varepsilon_1 + P(\omega_2)\hat\varepsilon_2 = \sum_{i=1}^{2} P(\omega_i)\, \hat\varepsilon_i$$

satisfy

$$E[\hat\varepsilon'] = P(\omega_1)\varepsilon_1 + P(\omega_2)\varepsilon_2 = \varepsilon, \qquad
\mathrm{Var}[\hat\varepsilon'] = \frac{1}{N} \sum_{i=1}^{2} P(\omega_i)\, \varepsilon_i (1 - \varepsilon_i). \tag{2}$$

Comparing the two variances,

$$(1) - (2) = \frac{1}{N}\left[\varepsilon(1 - \varepsilon) - P(\omega_1)\varepsilon_1(1 - \varepsilon_1) - P(\omega_2)\varepsilon_2(1 - \varepsilon_2)\right]
= \frac{1}{N}\, P(\omega_1)\, P(\omega_2)\, (\varepsilon_1 - \varepsilon_2)^2 \ge 0,$$

so sampling each class in proportion to its known prior never increases the variance. Both estimators are easy to understand: 1) they are ML estimates; 2) they are unbiased.

3. There are N samples available for both training and testing. The error rate depends on both the training set and the testing set, so how should the data set be split? Label the samples 1, 2, ..., N. Leave-one-out: train on N − 1 samples, test on the one held out, and repeat for every sample; if K of the N held-out samples are misclassified,

$$\hat\varepsilon = \frac{K}{N}.$$

Exercises: 3: P65, P69; 4: P85, P89, P101.
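The remaining sketches are worked examples for the sections above. First, the Bayesian estimate of a normal mean: a minimal sketch of the closed-form $\mu_N$, $\sigma_N^2$ update derived in "Bayesian Estimation for the Normal Distribution" (NumPy assumed; the prior, noise level, and sample size are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# True data-generating process: x ~ N(mu_true, sigma^2), sigma known.
mu_true, sigma = 2.0, 1.0
# Prior over the unknown mean: mu ~ N(mu0, sigma0^2).
mu0, sigma0 = 0.0, 3.0

X = rng.normal(mu_true, sigma, size=50)

def posterior_params(X, sigma, mu0, sigma0):
    """Closed-form posterior N(mu_N, sigma_N^2) for the Gaussian mean."""
    N = len(X)
    m_N = X.mean()                                  # sample mean m_N
    denom = N * sigma0**2 + sigma**2
    mu_N = (N * sigma0**2 / denom) * m_N + (sigma**2 / denom) * mu0
    var_N = (sigma0**2 * sigma**2) / denom          # shrinks like 1/N
    return mu_N, var_N

mu_N, var_N = posterior_params(X, sigma, mu0, sigma0)
print(f"posterior : N({mu_N:.3f}, {var_N:.4f})")
# Predictive density p(x | X) = N(mu_N, sigma^2 + sigma_N^2):
print(f"predictive: N({mu_N:.3f}, {sigma**2 + var_N:.4f})")
```

As N grows, $\mu_N \to m_N$ and $\sigma_N^2 \to 0$: the data overwhelm the prior, matching the recursive Bayes learning picture.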
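Next, a one-dimensional Parzen-window estimate with the hypercube window $\varphi$ defined above. The window-width schedule $h_N = 1/\sqrt{N}$ is a common textbook choice satisfying $V_N \to 0$ and $N V_N \to \infty$, not a prescription from the notes; the standard-normal test density is likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=1000)   # samples from an "unknown" p(x)

def parzen_estimate(x, X, h_N):
    """1-D Parzen estimate with the hypercube window phi(u) = 1{|u| <= 1/2}."""
    u = (x - X) / h_N
    k_N = np.sum(np.abs(u) <= 0.5)    # samples inside the window centered at x
    V_N = h_N                         # d = 1, so V_N = h_N^d = h_N
    return k_N / (len(X) * V_N)

h_N = 1.0 / np.sqrt(len(X))           # shrinking window width
for x in (-1.0, 0.0, 1.0):
    true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(f"p_hat({x:+.1f}) = {parzen_estimate(x, X, h_N):.3f}   true = {true:.3f}")
```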

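A matching sketch of the $k_N$-nearest-neighbor estimate, using the illustrative choice $k_N = \sqrt{N}$ so that $k_N \to \infty$ while $k_N / N \to 0$ (again a conventional choice, not one stated in the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=1000)

def knn_estimate(x, X, k_N):
    """1-D k_N-nearest-neighbor estimate: grow the region around x until it
    holds k_N samples, then p_hat = (k_N / N) / V_N."""
    N = len(X)
    r = np.sort(np.abs(X - x))[k_N - 1]   # distance to the k_N-th nearest sample
    V_N = 2 * r                           # the interval [x - r, x + r] in 1-D
    return (k_N / N) / V_N

k_N = int(np.sqrt(len(X)))
for x in (-1.0, 0.0, 1.0):
    print(f"p_hat({x:+.1f}) = {knn_estimate(x, X, k_N):.3f}")
```

Here the volume adapts to the data: $V_N$ is small where samples are dense and large where they are sparse, which is exactly the contrast with Parzen windows drawn above.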
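Finally, a sketch of the leave-one-out error estimate $\hat\varepsilon = K/N$ from the last section. The notes leave the classifier unspecified, so the nearest-mean classifier below is a hypothetical stand-in chosen only to make the example run:

```python
import numpy as np

def leave_one_out_error(X, y, train_and_classify):
    """Hold out each sample in turn, train on the other N-1, and count
    misclassifications K; the estimate is eps_hat = K / N."""
    N = len(X)
    K = 0
    for i in range(N):
        mask = np.arange(N) != i
        label = train_and_classify(X[mask], y[mask], X[i])
        K += int(label != y[i])
    return K / N

def nearest_mean(X_train, y_train, x):
    """Hypothetical classifier: assign x to the class with the nearer mean."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x - m1) < np.linalg.norm(x - m0))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print("leave-one-out error:", leave_one_out_error(X, y, nearest_mean))
```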