COS 424/SML 302 Probability and Statistics Review February 6

# COS 424/SML 302 Probability and Statistics Review

COS 424/SML 302 Probability and Statistics Review February 6, 2019 37 / 69

## Independent and identically distributed random variables

Independent and identically distributed (IID) random variables are:

1. Independent: the outcome of one draw carries no information about the others
2. Identically distributed: every draw comes from the same distribution

If we repeatedly flip the same coin $n$ times and record the outcomes, then $X_1, \ldots, X_n$ are IID.

The IID assumption is useful in data analysis.

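The coin-flip example can be simulated in a few lines. This is a minimal sketch: the function name `flip_coin`, the bias argument `pi`, and the fixed seed are illustrative assumptions, not part of the slides.

```python
import random

def flip_coin(n, pi=0.5, seed=0):
    """Draw n IID Bernoulli(pi) outcomes (1 = heads, 0 = tails)."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the sketch reproducible
    # Every draw uses the same pi (identically distributed) and ignores
    # all earlier draws (independent).
    return [1 if rng.random() < pi else 0 for _ in range(n)]

flips = flip_coin(n=10, pi=0.5)
print(flips)
print(sum(flips) / len(flips))  # fraction of heads in this sample
```
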
## Statistical terminology: data

On the data side:

- Sample ($n$) is $n$ IID draws of a specific random variable $X$
- Features ($p$) is the dimension of random variable $X$

Examples:

- Emails: $n$ separate emails, $p$ word counts from a dictionary of length $p$
- Netflix: $n$ users, $p$ movies
- Microarray data: $n$ samples, $p$ genes
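To make the email example concrete, the sketch below builds the word counts for a toy data set. The dictionary, the two emails, and the helper `word_count_matrix` are all hypothetical, chosen only to show what $n$ and $p$ refer to.

```python
from collections import Counter

def word_count_matrix(emails, dictionary):
    """Return n rows (one per email), each holding p dictionary word counts."""
    rows = []
    for text in emails:
        counts = Counter(text.lower().split())  # naive whitespace tokenization
        rows.append([counts[word] for word in dictionary])
    return rows

dictionary = ["free", "meeting", "money"]  # hypothetical dictionary, p = 3
emails = [                                 # hypothetical emails, n = 2
    "Free money free prizes",
    "Meeting moved to Monday",
]
X = word_count_matrix(emails, dictionary)
n, p = len(X), len(X[0])
print(X)     # [[2, 0, 1], [0, 1, 0]]
print(n, p)  # 2 3
```
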

## Statistical terminology: models

On the statistical model side:

- Parameters are values that define (or index) a distribution
  - scale with the number of features, $O(p)$
- Latent variables are features that cannot be directly observed
  - scale with the number of samples, $O(n)$
- Observed variables are features that are observed
  - may be thought of as an $n \times p$ matrix

Example: an email filter where

- features are dictionary words
- parameters are the frequency of each word for spam and for not spam
- latent variables are assignments of unlabeled email to spam or not spam
- observed variables are the dictionary word counts for each sample
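The scaling claims can be checked by counting entries in the email-filter example. This is a rough sketch under two assumptions not stated on the slide: every email is unlabeled (so each one has a latent assignment), and there is one word-frequency parameter per class and word. The function `model_sizes` is hypothetical.

```python
def model_sizes(n, p, num_classes=2):
    """Count the entries in each part of the email-filter example."""
    return {
        "parameters": num_classes * p,  # one word frequency per class and word: O(p)
        "latent_variables": n,          # one spam / not-spam assignment per email: O(n)
        "observed_variables": n * p,    # the n x p matrix of word counts
    }

print(model_sizes(n=1000, p=50))
```

Doubling `n` doubles the number of latent variables and leaves the parameter count unchanged, while doubling `p` does the reverse.
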
## What is a parameter?

Parameters are values that index a distribution.

Bernoulli parameters: A coin flip is a Bernoulli distribution. The Bernoulli parameter $\pi$ (the bias) is the probability of an H (refer to H as 1).

$$p(x \mid \pi) = \pi^{[x=1]} (1-\pi)^{[x=0]},$$

where $[\cdot]$ is an indicator function, which is 1 when its argument is true and 0 otherwise.

Changing $\pi$ leads to different Bernoulli distributions.
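The Bernoulli probability with indicator exponents translates almost directly into code. A minimal sketch follows; the name `bernoulli_pmf` and the input checks are illustrative additions.

```python
def bernoulli_pmf(x, pi):
    """p(x | pi) = pi^[x=1] * (1 - pi)^[x=0] for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (tails) or 1 (heads)")
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    # int(condition) plays the role of the indicator [.]: exactly one
    # exponent is 1, so exactly one factor survives.
    return pi ** int(x == 1) * (1 - pi) ** int(x == 0)

# Changing pi gives a different distribution over {0, 1}.
for pi in (0.2, 0.5, 0.9):
    print(pi, bernoulli_pmf(1, pi), bernoulli_pmf(0, pi))
```
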

## The likelihood function

The data likelihood function is the probability of the observed data $x = (x_1, \ldots, x_n)$ given the model parameters $\theta$:

$$p(x \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

Likelihood of a sequence of coin flips: Suppose we flip a coin $n$ times and record the outcomes. Further, suppose we think that the probability of heads is $\pi$. (We do not yet care about estimating the true $\pi$.) Given $\pi$, the likelihood, or probability of an observed sequence, is

$$p(x_1, \ldots, x_n \mid \pi) = \prod_{i=1}^{n} \pi^{[x_i=1]} (1-\pi)^{[x_i=0]}$$

Why can I multiply likelihoods across the samples?

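The product over flips can be evaluated directly for a short sequence. In this sketch the function `likelihood` and the four-flip data set are hypothetical; the product form relies on the flips being IID.

```python
import math

def likelihood(xs, pi):
    """Probability of the observed 0/1 sequence xs under IID Bernoulli(pi) flips."""
    # Each flip contributes pi if it is heads (1) and 1 - pi if it is tails (0).
    return math.prod(pi if x == 1 else 1 - pi for x in xs)

flips = [1, 0, 1, 1]  # hypothetical data: H, T, H, H
print(likelihood(flips, 0.5))   # 0.5 ** 4
print(likelihood(flips, 0.75))  # 0.75 ** 3 * 0.25
```

The same data receive different probabilities under different values of $\pi$, which is what makes the likelihood useful for comparing parameter values.
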
## The log likelihood

Take the log of the likelihood function; this is the log likelihood function.

$$\ell(\pi; x) = \log p(x \mid \pi) = \log \prod_{i=1}^{n} \pi^{[x_i=1]} (1-\pi)^{[x_i=0]} = \sum_{i=1}^{n} \Big( [x_i=1] \log \pi + [x_i=0] \log(1-\pi) \Big)$$

The log likelihood is the objective in an optimization problem: What is the value of the parameter that maximizes the log likelihood?
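One simple way to explore this optimization problem numerically is to evaluate the log likelihood on a grid of candidate values of $\pi$ and keep the best one. Grid search is used here only as an illustration, not as the method the course goes on to use; the function names, the grid resolution, and the data are assumptions.

```python
import math

def log_likelihood(xs, pi):
    """Sum over flips of [x=1] log(pi) + [x=0] log(1 - pi), for 0 < pi < 1."""
    heads = sum(1 for x in xs if x == 1)
    tails = len(xs) - heads
    return heads * math.log(pi) + tails * math.log(1 - pi)

def grid_argmax(xs, steps=999):
    """Return the grid value of pi in (0, 1) with the largest log likelihood."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]  # 0.001, 0.002, ..., 0.999
    return max(grid, key=lambda pi: log_likelihood(xs, pi))

flips = [1, 0, 1, 1]  # hypothetical data: three heads, one tail
best = grid_argmax(flips)
print(best)
```

For these four flips the grid maximizer lands on 0.75, the observed fraction of heads.
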

