Probability & Statistics Refresher
Everything I need to know about Probability and Statistics in one easy lecture
1. Introduction to random variables

A. What is a random variable?

"A random variable can be viewed as the name of an experiment with a probabilistic outcome. Its value is the outcome of that experiment." — Tom Mitchell, 1997

• A discrete random variable can assume only a countable number of values. Examples: a coin flip (heads/tails), Let's Make a Deal (Door 1, 2, or 3), a die roll (1, 2, 3, 4, 5, or 6).
• A real (or continuous) random variable can assume a range of values x ∈ [a, b]. Example: most sensor readings.

[Note: We'll be dealing a lot with both types of random variables.]

B. Why should we care about random variables?

• Every sensor reading is a random variable (e.g. thermal noise, etc.).
• Many things in this world can be appropriately viewed as random events (e.g. your instructor's choice of marker colors, start time of lecture).
• There is some degree of uncertainty in almost everything we do.

C. Some synonyms for "random"

• stochastic
• nondeterministic

2. Discrete random variables

A. Notation

• Let A, B, C and A_1, A_2, A_3 denote discrete random variables.
• Let a, b, c and a_1, a_2, a_3 denote possible values for the corresponding discrete random variables A, B, C and A_1, A_2, A_3, respectively. Hence, P(A = a) denotes the probability that the random variable A assumes the value a.
• In the special case when the random variable A can assume only two values, (T)rue or (F)alse, denote P(A) as the probability that the random variable A is true, and P(¬A) as the probability that the random variable A is false.
• Let P(A ∧ B) = P(A, B) = P(A and B).
• Let P(A ∨ B) = P(A or B).
• Let P(A = a | B = b) denote the conditional probability that A = a given that B = b. Often we use the shorthand P(A | B) when A and B are true/false random variables.

B. Axioms of probability

0 ≤ P(A) ≤ 1    (1)
P(True) = 1    (2)
P(False) = 0    (3)
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)    (4)

C. Conditional probability

P(A | B) = P(A ∧ B) / P(B),  P(B) ≠ 0    (5)

From (5),

P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)  (chain rule)    (6)

Equation (6) leads to Bayes' Theorem:

P(A | B) = P(B | A) P(A) / P(B)    (7)

Assume that precisely one of n mutually exclusive events A_1, A_2, ..., A_n can occur. Then,

P(A_1 ∨ A_2 ∨ ... ∨ A_n) = Σ_{i=1}^{n} P(A_i) = 1    (8)

and the general form of Bayes' Theorem is given by,

P(A_i | B) = P(B | A_i) P(A_i) / P(B)    (9)

where,

P(B) = Σ_{j=1}^{n} P(B | A_j) P(A_j)    (10)

D. Other laws of probability

• P(¬A) = 1 − P(A)    (11)

Easy to prove. From (4),

P(A ∨ ¬A) = P(A) + P(¬A) − P(A ∧ ¬A)    (12)

P(A ∧ ¬A) = 0 (something can't be true and false at the same time), and P(A ∨ ¬A) = 1 (a binary random variable has to be either true or false). Hence P(¬A) = 1 − P(A).

• P(A) = P(A ∧ B) + P(A ∧ ¬B)    (13)
• P(A ∧ B ∧ C) = P(A | B ∧ C) P(B | C) P(C)    (14)

E. Independence

A and B are independent if and only if

P(A = a) = P(A = a | B = b),  ∀a, b    (15)

From this definition, we can easily derive the following properties of independent events:

P(B) = P(B | A)    (16)
P(A ∧ B) = P(A) P(B)    (17)

Example: coin toss.

F. Conditional independence

A and B are conditionally independent given C if and only if

P(A = a | B = b, C = c) = P(A = a | C = c),  ∀a, b, c    (18)

Example: Traveling salesman.

Figure 1: [a diagram relating the random variables A, B and C]

3. Probability distributions

A. Discrete random variables

For a given discrete random variable X, the probability distribution for X assigns a probability to each possible value or outcome of X,

P(X = x) = p(x),  ∀x    (19)

For n random variables X_1, X_2, ..., X_n, the joint probability distribution assigns a probability to all possible combinations of values,

P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = p(x_1, x_2, ..., x_n),  ∀x_1, x_2, ..., x_n    (20)

Example: If each random variable can assume one of k different values, then the joint probability distribution for n different random variables is fully specified by k^n values (mention Bayes nets).

Given two discrete random variables X, Y, the univariate probability distribution p(x) is related to the joint probability distribution p(x, y) by,

p(x) = Σ_y p(x, y)    (21)
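The identities above — marginalization, conditional probability, and Bayes' Theorem — can be checked numerically on a small joint table. A minimal sketch; the joint values for two discrete variables X and Y below are invented purely for illustration:

```python
# Sketch: marginalization, conditional probability, and Bayes' theorem
# on a small made-up joint distribution p(x, y).
p_xy = {
    (0, 0): 0.10, (0, 1): 0.15,
    (1, 0): 0.25, (1, 1): 0.05,
    (2, 0): 0.20, (2, 1): 0.25,
}

def p_x(x):
    # Marginal: p(x) = sum over y of p(x, y).
    return sum(p for (xi, y), p in p_xy.items() if xi == x)

def p_y(y):
    return sum(p for (x, yi), p in p_xy.items() if yi == y)

def p_x_given_y(x, y):
    # Definition of conditional probability.
    return p_xy[(x, y)] / p_y(y)

def p_y_given_x(y, x):
    return p_xy[(x, y)] / p_x(x)

# The marginal is a proper distribution (sums to 1).
assert abs(sum(p_x(x) for x in (0, 1, 2)) - 1.0) < 1e-12

# Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y).
bayes = p_y_given_x(1, 2) * p_x(2) / p_y(1)
assert abs(bayes - p_x_given_y(2, 1)) < 1e-12

# Law of total probability: p(y) = sum over x of p(y | x) p(x).
total = sum(p_y_given_x(1, x) * p_x(x) for x in (0, 1, 2))
assert abs(total - p_y(1)) < 1e-12
```

Any joint table with non-negative entries summing to 1 would work here; the identities hold by construction.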
B. Real random variables

A real-valued random variable X is characterized by a continuous probability distribution function,

F(x) = P(X ≤ x),  −∞ < x < ∞    (22)
F(−∞) = 0,  F(∞) = 1    (23)
F(x_1) ≤ F(x_2) if x_1 ≤ x_2    (24)
P(X = x) = 0    (25)

Often, however, the distribution is expressed as a probability density function (pdf) p(x),
p(x) = dF(x)/dx    (26)
p(x) ≥ 0,  ∀x    (27)
∫_{−∞}^{∞} p(x) dx = 1    (28)

From (26),

F(x) = ∫_{−∞}^{x} p(t) dt    (29)

For n real-valued random variables X_1, X_2, ..., X_n, the joint probability density function is given by,

p(x_1, x_2, ..., x_n) ≥ 0,  ∀x_1, x_2, ..., x_n    (30)
∫_{−∞}^{∞} ... ∫_{−∞}^{∞} p(x_1, x_2, ..., x_n) dx_1 dx_2 ... dx_n = 1    (31)

[Note: p(x) and p(x_1, x_2, ..., x_n) are often referred to as the likelihood of x and of x_1, x_2, ..., x_n, respectively.]

Given two continuous random variables X, Y, the univariate probability distribution p(x) is related to the joint probability distribution p(x, y) by,

p(x) = ∫_{−∞}^{∞} p(x, y) dy    (32)
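The pdf/cdf relations above are easy to check numerically. A sketch using the density p(x) = 2x on [0, 1] (an arbitrary example density chosen only for illustration):

```python
# Sketch: check that a pdf integrates to 1 and that F(x) is the integral
# of p up to x. The density p(x) = 2x on [0, 1] is illustration-only.

def p(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=100_000):
    # Simple midpoint-rule quadrature.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(p, 0.0, 1.0)    # normalization: should be ~1
F_half = integrate(p, 0.0, 0.5)   # F(0.5) = 0.5^2 = 0.25 analytically

assert abs(total - 1.0) < 1e-6
assert abs(F_half - 0.25) < 1e-6
```

The same quadrature check works for any one-dimensional density with bounded support.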
C. Normal (Gaussian) distribution

A very important continuous probability distribution is the Normal, or Gaussian, distribution. It is defined by the following probability density function (pdf),

p(x) = 1/(√(2π) σ) exp(−(x − μ)² / (2σ²)) = N(x; μ, σ)    (33)

for specific values of the scalar parameters μ and σ. In multiple dimensions,

p(x) = 1/((2π)^{d/2} |Σ|^{1/2}) exp(−(1/2) (x − μ)^T Σ^{−1} (x − μ)) = N(x; μ, Σ)    (34)

where μ and x are d-dimensional vectors, and Σ is a d × d positive-definite matrix.

[Note: A square matrix M is positive definite if and only if,

v^T M v > 0,  ∀v ≠ 0    (35)

Equation (35) also implies that all the eigenvalues λ_i > 0 and that M is invertible.]

Example: Below, the Normal distribution is plotted in one and two dimensions.

4. Expected value, variance, standard deviation

A. Expected value for a discrete random variable

The expected value for a discrete random variable is given by,

E[X] = Σ_x x p(x)    (36)
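As a quick numerical sanity check, the univariate Normal density defined above should integrate to 1, and its expected value (previewing this section) should come out to μ. A sketch with arbitrary parameter values:

```python
import math

# Sketch: numerical sanity checks on the univariate Normal density;
# the values of mu and sigma here are arbitrary.
mu, sigma = 1.0, 2.0

def normal_pdf(x):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Midpoint-rule quadrature over [mu - 10 sigma, mu + 10 sigma];
# the tails beyond that range contribute negligibly.
a, b, n = mu - 10 * sigma, mu + 10 * sigma, 200_000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]

total = sum(normal_pdf(x) for x in xs) * h      # integral of p(x)
mean = sum(x * normal_pdf(x) for x in xs) * h   # integral of x p(x)

assert abs(total - 1.0) < 1e-6   # density integrates to 1
assert abs(mean - mu) < 1e-6     # expected value equals mu
```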
Figure 2: [The Normal distribution plotted in one dimension (μ = 0, σ = 1) and in two dimensions (μ = (0, 0)^T, Σ = I).]

The mean x̄_n for a random variable X is given by,

x̄_n = (1/n) Σ_{i=1}^{n} x_i    (37)

where x_i denotes the ith measurement of X. The mean and expected value for a random variable are related by,

E[X] = lim_{n→∞} x̄_n    (38)
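The limit above can be illustrated by simulation: for a fair six-sided die, whose expected value works out to 3.5, the sample mean of many rolls lands close to that value. A sketch (the sample size and seed are arbitrary; the RNG is seeded for reproducibility):

```python
import random

# Sketch: the sample mean of n fair-die rolls approaches the expected
# value E[A] = 3.5 as n grows.
rng = random.Random(0)                   # fixed seed for reproducibility
n = 200_000
sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
assert abs(sample_mean - 3.5) < 0.02     # close to E[A] for large n
```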
Example: Let A denote a random variable which denotes the outcome of a die roll. Find E[A], assuming a fair die. Then P(A = i) = p(i) = 1/6, and

E[A] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)    (39)

E[A] = (1/6) Σ_{i=1}^{6} i = 3.5    (40)

[Note: The term "expected value" can be misleading, since the expected value of a random variable may not even be a possible value for the random variable itself.]

Example: You shake two 6-sided dice. What is the expected value of the lower of the two numbers (A)? Define two random variables D_1 and D_2, which denote the outcomes of die 1 and die 2, respectively.

P(A = 6) = P(D_1 = 6) P(D_2 = 6) = 1/36    (41)

P(A = 5) = P(D_1 = 5) P(D_2 = 5) + P(D_1 = 5) P(D_2 = 6) + P(D_1 = 6) P(D_2 = 5)    (42)

P(A = 5) = 3/36    (43)

Similarly,

P(A = i) = ((7 − i) + (6 − i)) / 36 = (13 − 2i) / 36    (44)

Therefore,

E[A] = Σ_{i=1}^{6} i (13 − 2i) / 36 = 91/36 ≈ 2.528    (45)

B. Expected value for a continuous random variable

The expected value for a continuous random variable is given by,

E[X] = ∫_{−∞}^{∞} x p(x) dx    (46)
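Returning to the two-dice example above, the result can be confirmed by brute-force enumeration of all 36 equally likely outcomes:

```python
from fractions import Fraction

# Sketch: verify E[min(D1, D2)] = 91/36 by enumerating all 36 outcomes.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
expected_low = sum(Fraction(min(d1, d2), 36) for d1, d2 in outcomes)
assert expected_low == Fraction(91, 36)   # ~2.528

# The per-value probabilities P(A = i) = (13 - 2i)/36 also check out.
for i in range(1, 7):
    count = sum(1 for d1, d2 in outcomes if min(d1, d2) == i)
    assert Fraction(count, 36) == Fraction(13 - 2 * i, 36)
```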
Example: E[X] for the Normal distribution is μ.

C. Variance and standard deviation

The variance V(X) of a random variable X is defined by,

V(X) = E[(X − μ)²]    (47)

where μ = E[X]. The standard deviation σ of a random variable X is defined by,

σ = √V(X)    (48)

Because of (48), we often denote V(X) = σ².

Example: The variance of the Normal distribution is σ².

D. Covariance of two random variables

The covariance of two random variables X, Y is defined by,

Cov(X, Y) = E[(X − μ_x)(Y − μ_y)]    (49)

where μ_x = E[X] and μ_y = E[Y]. If X and Y are independent, then,

Cov(X, Y) = 0    (50)

Example: The matrix Σ in the multivariate Normal distribution is the covariance matrix. The (i, j) element Σ_ij of the covariance matrix is given by,

Σ_ij = E[(X_i − μ_i)(X_j − μ_j)]    (51)

where μ_i is the ith element of the μ vector, and X_i is the ith random variable in the joint Normal distribution. Consider, for example, the two plots below.
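The covariance definition is easy to illustrate numerically on a small joint pmf (the tables below are made up for illustration), including the fact that independent variables have zero covariance:

```python
# Sketch: covariance from its definition, on a small made-up joint pmf.
p = {
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.6,
}

def E(f):
    # Expected value of f(X, Y) under the joint pmf p.
    return sum(f(x, y) * w for (x, y), w in p.items())

mu_x = E(lambda x, y: x)
mu_y = E(lambda x, y: y)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))

# Cross-check against the equivalent form E[XY] - E[X]E[Y].
assert abs(cov - (E(lambda x, y: x * y) - mu_x * mu_y)) < 1e-12

# An independent joint (a product of marginals) has zero covariance.
q = {(x, y): px * py for x, px in [(0, 0.3), (1, 0.7)]
                     for y, py in [(0, 0.4), (1, 0.6)]}
Eq = lambda f: sum(f(x, y) * w for (x, y), w in q.items())
mqx, mqy = Eq(lambda x, y: x), Eq(lambda x, y: y)
assert abs(Eq(lambda x, y: (x - mqx) * (y - mqy))) < 1e-12
```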
Figure 3: [Two two-dimensional Normal densities with μ = (0, 0)^T and different covariance matrices, one of them Σ = I.]

E. Properties of the expected value operator

• The expected value of a function g(X) is given by,

E[g(X)] = Σ_x g(x) p(x)  (discrete random variable)    (52)

E[g(X)] = ∫_{−∞}^{∞} g(x) p(x) dx  (continuous random variable)    (53)

• Linearity: Let a, b be constants and let X, Y be random variables. Then,

E[a f(X) + b g(Y)] = a E[f(X)] + b E[g(Y)]    (54)

F. Central limit theorem

Consider a set of independent, identically distributed random variables X_1, X_2, ..., X_n governed by an arbitrary probability distribution with mean μ and finite variance σ². Also, define the sample mean,

X̄_n = (1/n) Σ_{i=1}^{n} X_i    (55)

Then, as n → ∞, the distribution governing X̄_n approaches a Normal distribution with,

mean = μ    (56)
standard deviation = σ / √n    (57)

This theorem often justifies the use of the Normal distribution in real life for large n.
This note was uploaded on 12/08/2010 for the course COMPUTER E 1 taught by Professor Khalil during the Spring '10 term at Bronx High School of Science.