# COS 424/SML 302: Probability and Statistics Review


COS 424/SML 302 Probability and Statistics Review February 6, 2019 37 / 69


## Independent and identically distributed random variables

Independent and identically distributed (IID) random variables are:

1. Independent
2. Identically distributed

If we repeatedly flip the same coin $n$ times and record the outcomes, then $X_1, \ldots, X_n$ are IID.

The IID assumption is useful in data analysis.
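The coin-flip example can be simulated in a few lines. This is a minimal sketch, assuming a pseudo-random generator is an acceptable stand-in for a physical coin; the function name `flip_coin` and its `bias` and `seed` parameters are illustrative and do not come from the slides.

```python
import random

def flip_coin(n, bias=0.5, seed=0):
    """Draw n IID Bernoulli(bias) outcomes; 1 = heads, 0 = tails.

    Every draw uses the same bias (identically distributed) and does not
    depend on earlier draws (independent).
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [1 if rng.random() < bias else 0 for _ in range(n)]

flips = flip_coin(n=10, bias=0.5)
print(flips)
```

Each element of `flips` plays the role of one $X_i$: the same coin is used for every draw, and no draw influences another.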
## Statistical terminology: data

On the data side:

• Sample ($n$) is $n$ IID draws of a specific random variable $X$
• Features ($p$) is the dimension of the random variable $X$

Examples:

• Emails: $n$ separate emails, $p$ word counts from a dictionary of length $p$
• Netflix: $n$ users, $p$ movies
• Microarray data: $n$ samples, $p$ genes
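To make $n$ and $p$ concrete for the email example, the sketch below builds a small table of word counts. The dictionary and the three emails are made up for illustration, and whitespace tokenization is an assumption rather than anything the slides specify.

```python
from collections import Counter

# Hypothetical dictionary (p = 4 words) and emails (n = 3 samples).
dictionary = ["free", "meeting", "money", "report"]
emails = [
    "free money free",
    "meeting report",
    "money report meeting meeting",
]

def word_counts(text, vocab):
    """Count how often each dictionary word appears in one email."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

# One row per email, one column per dictionary word.
X = [word_counts(email, dictionary) for email in emails]
n, p = len(X), len(X[0])
print(n, p)
print(X)
```

Each row of `X` is one sample and each column is one feature.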


## Statistical terminology: models

On the statistical model side:

• Parameters are values that define (or index) a distribution; their number scales with the number of features, $O(p)$
• Latent variables are features that cannot be directly observed; their number scales with the number of samples, $O(n)$
• Observed variables are features that are observed; they may be thought of as an $n \times p$ matrix

Example: an email filter where

• features are dictionary words
• parameters are the frequencies of each word in spam and in not-spam email
• latent variables are the assignments of unlabeled emails to spam or not spam
• observed variables are the dictionary word counts for each sample
## What is a parameter?

Parameters are values that index a distribution.

Bernoulli parameters: a coin flip follows a Bernoulli distribution. The Bernoulli parameter (bias) $\pi$ is the probability of heads, H (refer to H as 1).

$$p(x \mid \pi) = \pi^{[x=1]} (1-\pi)^{[x=0]},$$

where $[\cdot]$ is an indicator function, which is 1 when its argument is true and 0 otherwise.

Changing $\pi$ leads to different Bernoulli distributions.
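The Bernoulli formula translates almost directly into code. The sketch below is one possible rendering, with the indicator $[\cdot]$ written as a Boolean converted to 0 or 1; the name `bernoulli_pmf` is illustrative.

```python
def bernoulli_pmf(x, pi):
    """p(x | pi) = pi^[x=1] * (1 - pi)^[x=0] for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (tails) or 1 (heads)")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    # int(condition) acts as the indicator function [condition].
    return pi ** int(x == 1) * (1.0 - pi) ** int(x == 0)

# Changing pi indexes a different Bernoulli distribution.
for pi in (0.2, 0.5, 0.9):
    print(pi, bernoulli_pmf(1, pi), bernoulli_pmf(0, pi))
```

For any valid `pi`, the two probabilities sum to 1, which is what makes each choice of $\pi$ a distribution.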


## The likelihood function

The data likelihood function is the probability of the observed data $x$ given the model parameters $\theta$:

$$p(x \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

Likelihood of a sequence of coin flips: suppose we flip a coin $n$ times and record the outcomes. Further, suppose we think that the probability of heads is $\pi$. (We do not yet care about estimating the true $\pi$.) Given $\pi$, the likelihood, or probability of an observed sequence, is

$$p(x_1, \ldots, x_n \mid \pi) = \prod_{i=1}^{n} \pi^{[x_i=1]} (1-\pi)^{[x_i=0]}$$

Why can I multiply likelihoods across the samples?
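As a small numerical check of the product formula, the sketch below evaluates the likelihood of a short, made-up sequence of flips. Multiplying the per-flip probabilities relies on the flips being IID, as in the coin-flip setting above; the function name `likelihood` is illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math

def likelihood(xs, pi):
    """p(x_1, ..., x_n | pi) for IID coin flips, with 1 = heads, 0 = tails."""
    # Each factor is pi^[x_i=1] * (1 - pi)^[x_i=0].
    return math.prod(pi if x == 1 else 1.0 - pi for x in xs)

xs = [1, 0, 1, 1]  # hypothetical observed sequence: H, T, H, H
print(likelihood(xs, 0.5))
print(likelihood(xs, 0.75))
```

With three heads and one tail, the value at $\pi = 0.75$ is $0.75^3 \times 0.25$, which is larger than the value at $\pi = 0.5$, so this sequence is more probable under the higher bias.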
## The log likelihood

Take the log of the likelihood function; this is the log likelihood function.

$$\ell(\pi; x) = \log p(x \mid \pi) = \log \prod_{i=1}^{n} \pi^{[x_i=1]} (1-\pi)^{[x_i=0]} = \sum_{i=1}^{n} \Big( [x_i=1] \log \pi + [x_i=0] \log(1-\pi) \Big)$$

The log likelihood is the objective in an optimization problem: what is the value of the parameter that maximizes the log likelihood?
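One way to explore this optimization numerically is a simple grid search over $\pi$. The sketch below is an illustration under stated assumptions, not the method used in the course: the data are made up, the grid resolution is arbitrary, and the names `log_likelihood` and `grid_mle` are hypothetical.

```python
import math

def log_likelihood(xs, pi):
    """Sum over i of [x_i=1] log(pi) + [x_i=0] log(1 - pi); needs 0 < pi < 1."""
    heads = sum(xs)            # number of flips with x_i = 1
    tails = len(xs) - heads    # number of flips with x_i = 0
    return heads * math.log(pi) + tails * math.log(1.0 - pi)

def grid_mle(xs, steps=999):
    """Return the grid point in (0, 1) with the largest log likelihood."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]  # 0.001, ..., 0.999
    return max(grid, key=lambda pi: log_likelihood(xs, pi))

xs = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical data: 7 heads in 10 flips
pi_hat = grid_mle(xs)
print(pi_hat, sum(xs) / len(xs))
```

For Bernoulli data the maximizer has a closed form, the fraction of heads, so the grid search should land on the same value as `sum(xs) / len(xs)` whenever that fraction lies on the grid.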
