Probability & Statistics Refresher
Everything I need to know about Probability and Statistics in one easy lecture

1. Introduction to random variables

A. What is a random variable?
"A random variable can be viewed as the name of an experiment with a probabilistic outcome. Its value is the outcome of that experiment." (Tom Mitchell, 1997)    (1)

• A discrete random variable can assume only a countable number of values. Examples: a coin flip (heads/tails), Let's Make a Deal (Door 1, 2 or 3), a die roll (1, 2, 3, 4, 5 or 6).
• A real (or continuous) random variable can assume a range of values $x \in [a, b]$. Example: most sensor readings.

[Note: We'll be dealing a lot with both types of random variables.]

B. Why should we care about random variables?
• Every sensor reading is a random variable (e.g. thermal noise, etc.).
• Many things in this world can be appropriately viewed as random events (e.g. your instructor's choice of marker colors, the start time of lecture).
• There is some degree of uncertainty in almost everything we do.

C. Some synonyms for "random"
• stochastic
• nondeterministic

2. Discrete random variables

A. Notation
• Let $A, B, C$ and $A_1, A_2, A_3$ denote discrete random variables.
• Let $a, b, c$ and $a_1, a_2, a_3$ denote possible values for the corresponding discrete random variables $A, B, C$ and $A_1, A_2, A_3$, respectively. Hence, $P(A = a)$ denotes the probability that the random variable $A$ assumes the value $a$.
• In the special case when the random variable $A$ can assume only two values, (T)rue or (F)alse, denote by $P(A)$ the probability that $A$ is true, and by $P(\neg A)$ the probability that $A$ is false.
• Let $P(A \wedge B) = P(A, B) = P(A \text{ and } B)$.
• Let $P(A \vee B) = P(A \text{ or } B)$.
• Let $P(A = a \mid B = b)$ denote the conditional probability that $A = a$ given that $B = b$. Often we use the shorthand $P(A \mid B)$ when $A$ and $B$ are true/false random variables.

B. Axioms of probability

    $0 \leq P(A) \leq 1$    (2)
    $P(\mathrm{True}) = 1, \quad P(\mathrm{False}) = 0$    (3)
    $P(A \vee B) = P(A) + P(B) - P(A \wedge B)$    (4)

C. Conditional probability

    $P(A \mid B) = \frac{P(A \wedge B)}{P(B)}, \quad P(B) \neq 0$    (5)

From (5),

    $P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$  (chain rule)    (6)

Equation (6) leads to Bayes' Theorem:

    $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$    (7)

Assume that precisely one of $n$ mutually exclusive events $A_1, A_2, \ldots, A_n$ can occur. Then,

    $P(A_1 \vee A_2 \vee \cdots \vee A_n) = \sum_{i=1}^{n} P(A_i) = 1$    (8)

and the general form of Bayes' Theorem is given by,

    $P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B)}$    (9)

where,

    $P(B) = \sum_{j=1}^{n} P(B \mid A_j)\,P(A_j)$    (10)

D. Other laws of probability
• $P(\neg A) = 1 - P(A)$. Easy to prove: from (4), $P(A \vee \neg A) = P(A) + P(\neg A) - P(A \wedge \neg A)$; here $P(A \wedge \neg A) = 0$ (something can't be true and false at the same time) and $P(A \vee \neg A) = 1$ (a binary random variable has to be either true or false), so $1 = P(A) + P(\neg A)$, which gives $P(\neg A) = 1 - P(A)$.
• $P(A) = P(A \wedge B) + P(A \wedge \neg B)$    (15)
• $P(A \wedge B \wedge C) = P(A \mid B \wedge C)\,P(B \mid C)\,P(C)$    (16)

E. Independence
$A$ and $B$ are independent if and only if $P(A = a) = P(A = a \mid B = b)$, $\forall a, b$. From this definition, we can easily derive the following properties of independent events:

    $P(B) = P(B \mid A)$
    $P(A \wedge B) = P(A)\,P(B)$

Example: coin toss.
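To make Bayes' Theorem concrete, here is a minimal Python sketch (not part of the original notes; the disease-testing numbers are invented for illustration). It computes $P(B)$ by total probability, equation (10), and then the posterior by equation (7):

    # Hypothetical example: a condition with a 1% prior, and a test with
    # 95% sensitivity and a 10% false-positive rate (all numbers invented).
    p_disease = 0.01            # P(A): prior
    p_pos_given_disease = 0.95  # P(B | A)
    p_pos_given_healthy = 0.10  # P(B | not A)

    # Total probability, equation (10): P(B) = sum_j P(B | A_j) P(A_j)
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Bayes' Theorem, equation (7): P(A | B) = P(B | A) P(A) / P(B)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.088

Even with a fairly accurate test, the posterior stays below 9% because the prior is so small; this is exactly the reweighting that equation (7) performs.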
F. Conditional independence
$A$ and $B$ are conditionally independent given $C$ if and only if

    $P(A = a \mid B = b, C = c) = P(A = a \mid C = c), \quad \forall a, b, c$    (17)

Example: traveling salesman.

[Figure 1: diagram relating the random variables $A$, $B$, $C$ (with values $a$, $b$, $c$) in the conditional-independence example.]

3. Probability distributions

A. Discrete random variables
For a given discrete random variable $X$, the probability distribution for $X$ assigns a probability to each possible value or outcome of $X$,

    $P(X = x) = p(x), \quad \forall x$    (18)

For $n$ random variables $X_1, X_2, \ldots, X_n$, the joint probability distribution assigns a probability to all possible combinations of values,

    $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = p(x_1, x_2, \ldots, x_n), \quad \forall x_1, x_2, \ldots, x_n$    (20)

Example: If each random variable can assume one of $k$ different values, then the joint probability distribution for $n$ different random variables is fully specified by $k^n$ values (mention Bayes nets).

Given two discrete random variables $X, Y$, the univariate probability distribution $p(x)$ is related to the joint probability distribution $p(x, y)$ by,

    $p(x) = \sum_{y} p(x, y)$    (21)

B. Real random variables
A real-valued random variable $X$ is characterized by a continuous probability distribution function,

    $F(x) = P(X \leq x), \quad -\infty < x < \infty$    (22)
    $F(x_1) \leq F(x_2) \ \text{if} \ x_1 \leq x_2$    (23)
    $F(-\infty) = 0, \quad F(\infty) = 1$    (24)
    $P(X = x) = 0$    (25)

Often, however, the distribution is expressed as a probability density function (pdf) $p(x)$,

    $p(x) = \frac{dF(x)}{dx}$    (26)
    $p(x) \geq 0, \quad \forall x$    (27)
    $\int_{-\infty}^{\infty} p(x)\,dx = 1$    (28)

From (26),

    $F(x) = \int_{-\infty}^{x} p(t)\,dt$    (29)

For $n$ real-valued random variables $X_1, X_2, \ldots, X_n$, the joint probability density function is given by,

    $p(x_1, x_2, \ldots, x_n) \geq 0, \quad \forall x_1, x_2, \ldots, x_n$    (30)
    $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = 1$    (31)

[Note: $p(x)$ and $p(x_1, x_2, \ldots, x_n)$ are often referred to as the likelihood of $x$ and of $x_1, x_2, \ldots, x_n$, respectively.]

Given two continuous random variables $X, Y$, the univariate probability distribution $p(x)$ is related to the joint probability distribution $p(x, y)$ by,

    $p(x) = \int_{-\infty}^{\infty} p(x, y)\,dy$    (32)

C. Normal (Gaussian) distribution
A very important continuous probability distribution is the Normal, or Gaussian, distribution. It is defined by the following probability density function (pdf),

    $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) = N(x; \mu, \sigma)$    (33)

for specific values of the scalar parameters $\mu$ and $\sigma$. In multiple dimensions,

    $p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) = N(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$    (34)

where $\boldsymbol{\mu}$ and $\mathbf{x}$ are $d$-dimensional vectors, and $\Sigma$ is a $d \times d$ positive-definite matrix.

[Note: A square matrix $M$ is positive definite if and only if,

    $v^{T} M v > 0, \quad \forall v \neq 0$    (35)

Equation (35) also implies that all the eigenvalues $\lambda_i > 0$ and that $M$ is invertible.]

Example: Below, the Normal distribution is plotted in one and two dimensions.

[Figure 2: the Normal pdf in one dimension ($\mu = 0$, $\sigma = 1$) and in two dimensions ($\boldsymbol{\mu} = (0, 0)^{T}$, $\Sigma = I$).]
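Here is a small Python sketch (not from the original notes) of the one-dimensional Normal pdf of equation (33), together with a crude Riemann-sum check of the normalization property (28):

    import math

    def normal_pdf(x, mu=0.0, sigma=1.0):
        """One-dimensional Normal density N(x; mu, sigma), equation (33)."""
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

    # Crude check of equation (28): the density should integrate to ~1.
    dx = 0.001
    total = sum(normal_pdf(-8.0 + i * dx) * dx for i in range(int(16.0 / dx)))
    print(f"integral over [-8, 8] = {total:.6f}")  # ~1.000000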
4. Expected value, variance, standard deviation

A. Expected value for a discrete random variable
The expected value for a discrete random variable is given by,

    $E[X] = \sum_{x} x\,p(x)$    (36)

The mean $\bar{x}_n$ for a random variable $X$ is given by,

    $\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$    (37)

where $x_i$ denotes the $i$th measurement of $X$. The mean and expected value for a random variable are related by,

    $E[X] = \lim_{n \to \infty} \bar{x}_n$    (38)

Example: Let $A$ denote a random variable corresponding to the outcome of a die roll. Find $E[A]$. Assume a fair die; then $P(A = i) = p(i) = 1/6$, and

    $E[A] = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = \frac{1}{6} \sum_{i=1}^{6} i = 3.5$    (39)

[Note: The term "expected value" can be misleading, since the expected value of a random variable may not even be a possible value for the random variable itself.]

Example: You shake two 6-sided dice. What is the expected value of the lower of the two numbers ($A$)? Define two random variables $D_1$ and $D_2$, which denote the outcomes of dice 1 and 2, respectively. Then,

    $P(A = 6) = P(D_1 = 6)\,P(D_2 = 6) = 1/36$    (40)
    $P(A = 5) = P(D_1 = 5)\,P(D_2 = 5) + P(D_1 = 5)\,P(D_2 = 6) + P(D_1 = 6)\,P(D_2 = 5) = 3/36$    (41, 42)

Similarly,

    $P(A = i) = \frac{(7 - i) + (6 - i)}{36} = \frac{13 - 2i}{36}$    (43)

Therefore,

    $E[A] = \sum_{i=1}^{6} i\,\frac{13 - 2i}{36} = \frac{91}{36} \approx 2.52778$    (44)

B. Expected value for a continuous random variable
The expected value for a continuous random variable is given by,

    $E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$    (45)

Example: $E[X]$ for the Normal distribution is $\mu$.

C. Variance and standard deviation
The variance $V(X)$ of a random variable $X$ is defined by,

    $V(X) = E[(X - \mu)^2]$    (47)

where $\mu = E[X]$. The standard deviation $\sigma$ of a random variable $X$ is defined by,

    $\sigma = \sqrt{V(X)}$    (48)

Because of (48), we often denote $V(X)$ by $\sigma^2$.

Example: The variance of the Normal distribution is $\sigma^2$.

D. Covariance of two random variables
The covariance of two random variables $X, Y$ is defined by,

    $\mathrm{Cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$    (50)

where $\mu_x = E[X]$ and $\mu_y = E[Y]$. If $X$ and $Y$ are independent, then,

    $\mathrm{Cov}(X, Y) = 0$    (51)

Example: The matrix $\Sigma$ in the multivariate Normal distribution is the covariance matrix. The $(i, j)$ element $\Sigma_{ij}$ of the covariance matrix is given by,

    $\Sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$    (52)

where $\mu_i$ is the $i$th element of the vector $\boldsymbol{\mu}$, and $X_i$ is the $i$th random variable in the joint Normal distribution. Consider, for example, the two plots below.

[Figure 3: two bivariate Normal densities with mean $\boldsymbol{\mu} = (0, 0)^{T}$, one with the identity covariance $\Sigma = I$ and one with a non-diagonal (correlated) covariance matrix.]
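Section D is also easy to check numerically. Below is a short Python sketch (my own illustration, not from the notes): $Y = X + Z$ is built to be correlated with $X$, while $X$ and $Z$ are independent, so by (50) and (51) the two sample covariances should come out near 1 and 0:

    import random
    import statistics

    random.seed(2)
    n = 100_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    zs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + z for x, z in zip(xs, zs)]  # correlated with X: Cov(X, Y) = Var(X) = 1

    def cov(a, b):
        """Sample covariance, equation (50), with the means estimated from data."""
        ma, mb = statistics.mean(a), statistics.mean(b)
        return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

    print(f"Cov(X, Y) = {cov(xs, ys):.3f}")  # ~1.0 (correlated)
    print(f"Cov(X, Z) = {cov(xs, zs):.3f}")  # ~0.0 (independent, equation (51))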
û p b ƒ ‚R … w û  „ ƒ w Ï v } ‚ € ˜ ~˜ û  | z {  Ç xyw q u € str p k p j h i fge 4 d ™R˜ “ — w –” 4 ’ “‘ “ ‰ • Ç ˆ † ‡ €Ry x v t guÇ s w r p h i q fge dRc Y ã a ` RDÇ ô …ˆ Š ‰ £ ¤ § . (47) (48) (49) (50) (51) (52) Probability & Statistics Refresher E g X • Linearity: Let a b be constants and let X Y be random variables. Then, E af X + bg Y = aE f X + bE g Y F. Central limit theorem Consider a set of independent, identically distributed random variables X 1 X 2 X n governed by an arbitrary probability distribution with mean and finite variance 2 . Also, define the sample mean, Xn Then, as n mean and standard deviation -----n This theorem often justifies the use of the Normal distribution in real life for large n . -7-  , the distribution governing X n approaches a Normal distribution with,  £  £ Î Ë Ì£ £ ʁ 1 -n Xi i=1  n  &  û ÆÅ û Å û Å Ä È Ã ¹À Â Û ¿ ½ž¼ ¾ & » ¹· Û ¶›µ º Á ¸ ´ ² ³& °  ¢ U¤¨ U™— ¯ ± ª ¨ž§ © ® ­ ¬ « = – g x px dx (continuous random variable) (53) (54) Ç É Í ¥  ¤ ‘ â (55) (56) (57) ...
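Finally, a closing sanity check (again a simulation sketch, not from the notes) of the two-dice result $E[A] = 91/36 \approx 2.528$ from section 4A:

    import random

    random.seed(1)
    trials = 100_000

    # Simulate the lower of two fair dice and compare with equation (44).
    total = sum(min(random.randint(1, 6), random.randint(1, 6))
                for _ in range(trials))
    print(f"simulated E[A] = {total / trials:.4f}")  # analytic value: 91/36 = 2.5278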