
… not conditionally independent, i.e. $P(X \mid Y) \neq P(x_1 \mid Y)\,P(x_2 \mid Y) \cdots P(x_d \mid Y)$.

And yet naïve Bayes is one of the most widely used classifiers. It is easy to train, and it often performs well even when the independence assumption is violated.

Domingos, P., & Pazzani, M. (1997). Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning, 29, 103-130.

[Figure: (left) A small e-mail data set described by count vectors. (right) The same data set described by bit vectors. Source: Peter Flach (University of Bristol), Machine Learning: Making Sense of Data, August 25, 2012.]

What are the parameters of the model?

P(+) = 0.5, P(-) = 0.5
P(a|+) = 0.5, P(a|-) = 0.75
P(b|+) = 0.75, P(b|-) = 0.25
P(c|+) = 0.25, P(c|-) = 0.25

The parameters are estimated by counting, where $n$ is the number of training examples and $X_{ij}$ is the value of feature $j$ in example $i$:

$$\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}, \qquad \hat{P}(x_j \mid y) = \frac{\hat{P}(x_j, y)}{\hat{P}(y)} = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}|/n}{|\{i : y_i = y\}|/n}$$

When there are few training examples

What if you never see a training example where $x_1 = a$ when $y = \text{spam}$? Then $\hat{P}(a \mid \text{spam}) = 0$, so

$$P(x \mid \text{spam}) = P(a \mid \text{spam})\,P(b \mid \text{spam})\,P(c \mid \text{spam}) = 0$$

regardless of the evidence from the other features.

What to do? Add “virtual” examples for which $x_1 = a$ when $y = \text{spam}$, so that no estimated probability is exactly zero.

10/29/13

Naïve Bayes for continuous variables

Need to talk about continuous distributions!

Continuous Probability Distributions

The probability that a continuous random variable takes a value in some interval from $x_1$ to $x_2$ is defined as the area under the graph of its probability density function between $x_1$ and $x_2$.
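Returning briefly to the discrete case, the counting estimates and the “virtual examples” fix above can be sketched in Python for bit-vector features. This is a minimal sketch under stated assumptions, not the lecture's own code: the names `fit_bernoulli_nb` and `joint_scores` and the parameter `alpha` (the number of virtual examples added for each outcome of each feature in each class; `alpha=1` is the usual add-one, or Laplace, smoothing) are hypothetical, and both data sets are made up. The first is chosen only so that the unsmoothed estimates reproduce the parameters listed above; it is not the data set from the figure.

```python
from collections import Counter


def fit_bernoulli_nb(X, y, alpha=0):
    """Estimate P(y) and theta[y][j] = P(x_j = 1 | y) from bit vectors by counting.

    alpha virtual examples are added for each of the two outcomes (0 and 1) of
    every feature in every class: alpha=0 gives plain relative frequencies,
    alpha=1 gives add-one (Laplace) smoothing.
    """
    n, d = len(X), len(X[0])
    class_counts = Counter(y)
    priors = {c: n_c / n for c, n_c in class_counts.items()}
    theta = {}
    for c, n_c in class_counts.items():
        rows = [row for row, label in zip(X, y) if label == c]
        theta[c] = [(sum(row[j] for row in rows) + alpha) / (n_c + 2 * alpha)
                    for j in range(d)]
    return priors, theta


def joint_scores(priors, theta, x):
    """P(y) * prod_j P(x_j | y) for each class y (the unnormalised posterior)."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for j, x_j in enumerate(x):
            p *= theta[c][j] if x_j == 1 else 1 - theta[c][j]
        scores[c] = p
    return scores


# Hypothetical bit vectors over the words (a, b, c), chosen only so that the
# unsmoothed estimates equal the parameters listed above.
X = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 0, 0],   # class '+'
     [1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0]]   # class '-'
y = ['+'] * 4 + ['-'] * 4
priors, theta = fit_bernoulli_nb(X, y)
print(priors)      # {'+': 0.5, '-': 0.5}
print(theta['+'])  # [0.5, 0.75, 0.25]
print(theta['-'])  # [0.75, 0.25, 0.25]

# Hypothetical zero-count case: word a never occurs in a spam example.
X2 = [[0, 1, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
y2 = ['spam', 'spam', 'ham', 'ham']
query = [1, 1, 1]
print(joint_scores(*fit_bernoulli_nb(X2, y2), query)['spam'])           # 0.0
print(joint_scores(*fit_bernoulli_nb(X2, y2, alpha=1), query)['spam'])  # 0.046875
```

With `alpha=0` the spam score is exactly zero because word a was never seen in spam, which wipes out the evidence from b and c; with `alpha=1` every estimate lies strictly between 0 and 1. With many features one would normally sum log-probabilities instead of multiplying probabilities, to avoid underflow.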
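[Figure: area under the density f(x) between x1 and x2, for a uniform distribution and for a normal/Gaussian distribution.]

Expectations

Discrete variables: $\mathbb{E}[f] = \sum_x p(x)\, f(x)$
Continuous variables: $\mathbb{E}[f] = \int p(x)\, f(x)\, dx$
Conditional expectation (discrete): $\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y)\, f(x)$
Approximate expectation (discrete and continuous): $\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$, where the $x_n$ are $N$ points drawn from $p(x)$

The Gaussian (normal) distribution

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Properties of the Gaussian distribution

[Figure: standard normal probability distribution; about 68.3%, 95.4% and 99.7% of the probability lies within $\mu \pm 1\sigma$, $\mu \pm 2\sigma$ and $\mu \pm 3\sigma$, respectively.]

A random variable that is normally distributed with a mean of 0 and a standard deviation of 1 is said to have a standard normal probability distribution.

Converting to the Standard Normal Distribution

$$z = \frac{x - \mu}{\sigma}$$

We can think of $z$ as the number of standard deviations that $x$ lies from $\mu$.

Gaussian Parameter Estimation

Likelihood function, for $N$ independent observations $x_1, \dots, x_N$:

$$p(x_1, \dots, x_N \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$$

Maximum (Log) Likelihood

$$\ln p(x_1, \dots, x_N \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$

Maximizing with respect to $\mu$ and $\sigma^2$ gives $\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$ and $\sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$.

Gaussian models

Gaussian Naïve Bayes

Assume we have data that belongs to three classes, and assume a likelihood that follows a Gaussian distribution:

$$P(X_i = x \mid Y = y_k) = \frac{1}{\sqrt{2\pi}\,\sigma_{ik}} \exp\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$$

We need to estimate a mean $\mu_{ik}$ and a variance $\sigma_{ik}^2$ for each feature $X_i$ in each class $y_k$.

Summary

Naïve Bayes classifier:
- What's the assumption
- Why we make it
- How we learn it
Naïve Bayes for discrete data
Gaussian naïve Bayes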
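Gaussian naïve Bayes as described above (a prior per class, plus a mean and a maximum-likelihood variance per feature per class, combined through the Gaussian likelihood) can be sketched in plain Python. The names `fit_gaussian_nb`, `log_gaussian` and `predict`, the `min_var` floor (an assumed safeguard against a feature that is constant within a class, which would otherwise give zero variance), and the three-class data set are hypothetical illustrations, not taken from the lecture. Scores are computed in log space, so the class with the largest $\log P(y_k) + \sum_i \log P(X_i = x_i \mid Y = y_k)$ is predicted.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, min_var=1e-9):
    """Per class y_k: prior P(y_k) plus, for each feature i, the mean mu_ik and
    the maximum-likelihood variance sigma_ik^2 (dividing by N_k, not N_k - 1).
    min_var is a hypothetical floor so a constant feature cannot give variance 0."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n = len(y)
    model = {}
    for c, rows in by_class.items():
        n_c, d = len(rows), len(rows[0])
        mu = [sum(r[i] for r in rows) / n_c for i in range(d)]
        var = [max(sum((r[i] - mu[i]) ** 2 for r in rows) / n_c, min_var)
               for i in range(d)]
        model[c] = (n_c / n, mu, var)
    return model


def log_gaussian(x, mu, var):
    """Log of exp(-(x - mu)^2 / (2 var)) / sqrt(2 pi var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict(model, x):
    """Return argmax_k of log P(y_k) + sum_i log P(X_i = x_i | Y = y_k)."""
    def log_score(c):
        prior, mu, var = model[c]
        return math.log(prior) + sum(
            log_gaussian(x_i, m, v) for x_i, m, v in zip(x, mu, var))
    return max(model, key=log_score)


# Hypothetical two-feature data from three classes.
X = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0],   # class 'A'
     [6.0, 6.0], [7.0, 8.0], [8.0, 7.0],   # class 'B'
     [1.0, 8.0], [2.0, 9.0], [3.0, 7.0]]   # class 'C'
y = ['A'] * 3 + ['B'] * 3 + ['C'] * 3

model = fit_gaussian_nb(X, y)
print(model['A'][1])               # [2.0, 2.0]
print(predict(model, [2.1, 1.9]))  # A
print(predict(model, [7.2, 6.8]))  # B
print(predict(model, [1.8, 8.3]))  # C
```

In this made-up data all classes have equal priors and equal variances, so each query simply goes to the class whose mean is closest; when the per-class variances differ, the decision boundaries of Gaussian naïve Bayes are quadratic rather than linear.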