Parameter Estimation

PR, ANN, & ML 2: Notational Convention
- Probabilities
  - Mass (discrete) function: capital letters, e.g. P(x)
  - Density (continuous) function: small letters, e.g. p(x)
- Vector vs. scalar
  - Scalar: plain type; vector: bold type
  - 2D: small letters; higher dimensions: capital letters
- These notes are in a continuous state of fluctuation until a topic is finished (expect many updates)

PR, ANN, & ML 3: Parameter Estimation
- The optimal classifier maximizes the posterior p(ω_i | x) = p(x | ω_i) P(ω_i) / p(x), i.e. the product of
  - the a priori probability P(ω_i)
  - the class-conditional density p(x | ω_i)
- Assumptions
  - no correlation between samples
  - time-independent statistics

PR, ANN, & ML 4: Popular Approaches
- Parametric: assume a certain parametric form for p(x | ω_i) and estimate its parameters
- Nonparametric: do not assume a parametric form for p(x | ω_i); estimate the density profile directly
- Boundary: estimate the separating hyperplane (hypersurface) between p(x | ω_i) and p(x | ω_j)

PR, ANN, & ML 5: A Priori Probability
- Given the numbers of occurrence (n_1, ω_1), (n_2, ω_2), ..., (n_k, ω_k), estimate
  P(ω_i) = n_i / M,  where M = Σ_{i=1}^{k} n_i,  i = 1, ..., k
- This works if
  - the number of samples is large enough
  - the selection process is not biased
- Caveat: sampling may be biased

PR, ANN, & ML 6: Class-Conditional Density
- More complicated than the prior (not a single number, but a distribution)
  - assume a certain form
  - estimate its parameters
- What form should we assume? Many are possible, but in this course we use almost exclusively the Gaussian.

PR, ANN, & ML 7: Gaussian (Normal) Distribution
- Scalar case:
  p(x | ω_i) = N(μ_i, σ_i²) = 1 / (√(2π) σ_i) · exp( −(x − μ_i)² / (2σ_i²) )
- Vector case:
  p(x | ω_i) = N(μ_i, Σ_i) = 1 / ( (2π)^(d/2) |Σ_i|^(1/2) ) · exp( −(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i) )
- Unknowns: the class mean and variance (covariance)

PR, ANN, & ML 8
- [Figure: plot of the scalar Gaussian density 1/(√(2π)σ) · exp(−(x − μ)²/(2σ²)) over the feature axis for a population]

PR, ANN, & ML 9: Why Gaussian (Normal)?
- The central limit theorem predicts a normal distribution for the sum of IID experiments
- In reality
  - there are only two numbers to estimate in the scalar case (mean and variance), or d + d(d+1)/2 in d dimensions
  - nice mathematical properties (e.g., the Fourier transform of a Gaussian is a Gaussian; products and sums of Gaussians remain Gaussian; any linear transform of a Gaussian is a Gaussian)

PR, ANN, & ML 10: Projection Transformation
- In particular, a whitening transform can diagonalize the covariance matrix

PR, ANN, & ML 11: Parameter Estimation
- Maximum likelihood estimator
  - parameters have fixed but unknown values
- Bayesian estimator
  - parameters are random variables with known a priori distributions
  - the Bayesian estimator allows us to update the a priori distribution by incorporating measurements to sharpen the profile

PR, ANN, & ML 12: Graphically
- [Figure: likelihood plotted as a function of the parameter θ, contrasting the MLE and Bayesian views]

PR, ANN, & ML 13: Maximum Likelihood Estimator
- Given
  - n labeled samples (observations) D = {x_1, x_2, ..., x_n}
  - an assumed parametric form of the distribution
  - samples drawn independently from that distribution
- Find
  - the parameter value that best explains the observations
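As a concrete illustration of the MLE setup above, here is a minimal sketch for the scalar Gaussian case: the maximum-likelihood estimates are the sample mean and the 1/n sample variance, which can then be plugged into the density of slide 7. The function names (`gaussian_mle`, `gaussian_pdf`) are illustrative, not from the notes.

```python
import math

def gaussian_mle(samples):
    """Maximum-likelihood estimates for a scalar Gaussian:
    the sample mean and the biased (1/n) sample variance."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # MLE uses 1/n, not 1/(n-1)
    return mu, var

def gaussian_pdf(x, mu, var):
    """Evaluate the scalar Gaussian density N(mu, var) at x (slide 7)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, var = gaussian_mle([1.0, 2.0, 3.0, 4.0, 5.0])  # mu = 3.0, var = 2.0
```

Note the 1/n factor: the MLE of the variance is biased, in contrast to the 1/(n-1) "sample variance" often used elsewhere.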
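The whitening transform mentioned on slide 10 can also be sketched. Assuming a 2x2 symmetric positive-definite covariance, the transform W = Λ^(−1/2) Φ^T (with Φ holding the unit eigenvectors and Λ the eigenvalues) maps the covariance to the identity. The helper names below (`whiten_2x2`, `matmul2`) are hypothetical; a real implementation would use a linear-algebra library's eigendecomposition instead of the hand-rolled 2x2 case.

```python
import math

def whiten_2x2(cov):
    """Whitening transform W = Lambda^(-1/2) Phi^T for a symmetric
    positive-definite 2x2 covariance, so that W cov W^T = I (slide 10)."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    if abs(b) < 1e-12:  # already diagonal: just rescale each axis
        return [[1.0 / math.sqrt(a), 0.0], [0.0, 1.0 / math.sqrt(c)]]
    # Eigenvalues of [[a, b], [b, c]]
    half_tr = (a + c) / 2.0
    disc = math.sqrt(half_tr ** 2 - (a * c - b * b))
    rows = []
    for lam in (half_tr + disc, half_tr - disc):
        vx, vy = lam - c, b          # an eigenvector for eigenvalue lam
        norm = math.hypot(vx, vy)
        s = math.sqrt(lam)           # scale the row by 1/sqrt(eigenvalue)
        rows.append([vx / (norm * s), vy / (norm * s)])
    return rows

def matmul2(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

cov = [[2.0, 1.0], [1.0, 2.0]]
W = whiten_2x2(cov)
Wt = [[W[j][i] for j in range((2))] for i in range(2)]
whitened = matmul2(matmul2(W, cov), Wt)  # approximately the identity matrix
```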
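Finally, the counting estimate of the a priori probabilities from slide 5 is a one-liner; `estimate_priors` is an illustrative name.

```python
def estimate_priors(counts):
    """A priori probabilities from class occurrence counts (slide 5):
    P(w_i) = n_i / M, with M the total number of samples."""
    M = sum(counts)
    return [n / M for n in counts]

priors = estimate_priors([30, 50, 20])  # -> [0.3, 0.5, 0.2]
```

As the slide's caveat warns, these estimates are only trustworthy when the sample counts are large and the sampling process is unbiased.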
This note was uploaded on 08/06/2008 for the course CS 290I taught by Professor Wang during the Spring '07 term at UCSB.