# Lecture 6 -

MGCR 271 Business Statistics: Make Inference

Ramnath Vaidyanathan

## Two Way Tables (Chapter 9)

Roadmap of tests when the response variable is categorical (the last column is the number attached to each test on the slide):

| Explanatory variable | Samples | Test | Label |
|---|---|---|---|
| None | 1 sample | 1-prop. z-test | 5 |
| Categorical | 2 samples, independent | 2-prop. z-test, unpooled | 6 |
| Categorical | 2 samples, independent | 2-prop. z-test, pooled | 7 |
| Categorical | >2 samples | Two way tables: chi-square test | 8 |
| Quantitative | | Logistic regression | 11 |

## Motivation

Amy is a wine and music enthusiast. She believes that the wine a customer purchases depends on the music he or she is exposed to at the time of the purchase. She wants to test her claim and hires you.

- What advice would you provide her?
- What kind of data should Amy collect?
- How would you go about testing her claim?

Is the type of wine purchased in a supermarket driven by the music played?

## Raw Data

| Subject | Music | Wine |
|---|---|---|
| 1 | French | Other |
| 2 | None | French |
| 3 | French | French |
| ... | ... | ... |
| 147 | None | Other |

## Two Way Table

The raw data are summarized by cross-classifying the two categorical variables, wine purchased (rows) and music played (columns):

| Wine \ Music | French | None | Total |
|---|---|---|---|
| French | 39 | 30 | 69 |
| Other | 35 | 43 | 78 |
| Total | 74 | 73 | 147 |

## How can you test her claim?

Let $p_1$ be the proportion of customers who buy French wine when French music is played, and $p_2$ the proportion who buy French wine when no music is played.

$$\hat{p}_1 = 39/74 \approx 0.53, \qquad \hat{p}_2 = 30/73 \approx 0.41, \qquad \hat{p} = 69/147 \approx 0.47$$

$$H_0: p_1 - p_2 = 0 \qquad H_a: p_1 - p_2 \neq 0$$

## Two-Proportion z-Test, Pooled

$$H_0: p_1 - p_2 = d_0 \qquad H_a: p_1 - p_2 \neq d_0$$

The unpooled statistic estimates the standard error from each sample separately:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$

The pooled statistic replaces $\hat{p}_1$ and $\hat{p}_2$ in the standard error with the pooled proportion $\hat{p}$:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_1} + \dfrac{\hat{p}(1-\hat{p})}{n_2}}}, \qquad \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

Under $H_0$, $z$ is compared with the standard normal distribution $N(0, 1)$.

[Figure: standard normal curve N(0, 1) with rejection regions of area α/2 below −z* and above z*, and the central region labeled "Fail to Reject H0".]
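The pooled test can be sketched in Python for the 2×2 wine and music counts. This is a minimal illustration rather than part of the original slides: the helper names `normal_cdf` and `pooled_two_prop_ztest` are hypothetical, a two-sided alternative with d0 = 0 is assumed, and the normal CDF is built from the standard library's `math.erf`.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pooled_two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: p1 - p2 = 0 (two-sided).

    Hypothetical helper for illustration; returns (z, p_value).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: all successes over all observations.
    p_hat = (x1 + x2) / (n1 + n2)
    # Standard error under H0, using the pooled proportion in both terms.
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# French wine purchases: 39 of 74 with French music, 30 of 73 with no music.
z, p_value = pooled_two_prop_ztest(39, 74, 30, 73)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

With these counts the statistic comes out at roughly z ≈ 1.41, so at the usual 5% level the 2×2 data alone would not reject H0.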
Moore-212007 pbs November 20, 2007 13:52 (page T-2, Tables)

TABLE A Standard normal probabilities

Table entry for z is the area under the standard normal curve to the left of z.

| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| −3.4 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0003 | .0002 |
| −3.3 | .0005 | .0005 | .0005 | .0004 | .0004 | .0004 | .0004 | .0004 | .0004 | .0003 |
| −3.2 | .0007 | .0007 | .0006 | .0006 | .0006 | .0006 | .0006 | .0005 | .0005 | .0005 |
| −3.1 | .0010 | .0009 | .0009 | .0009 | .0008 | .0008 | .0008 | .0008 | .0007 | .0007 |
| −3.0 | .0013 | .0013 | .0013 | .0012 | .0012 | .0011 | .0011 | .0011 | .0010 | .0010 |
| −2.9 | .0019 | .0018 | .0018 | .0017 | .0016 | .0016 | .0015 | .0015 | .0014 | .0014 |
| −2.8 | .0026 | .0025 | .0024 | .0023 | .0023 | .0022 | .0021 | .0021 | .0020 | .0019 |
| −2.7 | .0035 | .0034 | .0033 | .0032 | .0031 | .0030 | .0029 | .0028 | .0027 | .0026 |
| −2.6 | .0047 | .0045 | .0044 | .0043 | .0041 | .0040 | .0039 | .0038 | .0037 | .0036 |
| −2.5 | .0062 | .0060 | .0059 | .0057 | .0055 | .0054 | .0052 | .0051 | .0049 | .0048 |
| −2.4 | .0082 | .0080 | .0078 | .0075 | .0073 | .0071 | .0069 | .0068 | .0066 | .0064 |
| −2.3 | .0107 | .0104 | .0102 | .0099 | .0096 | .0094 | .0091 | .0089 | .0087 | .0084 |
| −2.2 | .0139 | .0136 | .0132 | .0129 | .0125 | .0122 | .0119 | .0116 | .0113 | .0110 |
| −2.1 | .0179 | .0174 | .0170 | .0166 | .0162 | .0158 | .0154 | .0150 | .0146 | .0143 |
| −2.0 | .0228 | .0222 | .0217 | .0212 | .0207 | .0202 | .0197 | .0192 | .0188 | .0183 |
| −1.9 | .0287 | .0281 | .0274 | .0268 | .0262 | .0256 | .0250 | .0244 | .0239 | .0233 |
| −1.8 | .0359 | .0351 | .0344 | .0336 | .0329 | .0322 | .0314 | .0307 | .0301 | .0294 |
| −1.7 | .0446 | .0436 | .0427 | .0418 | .0409 | .0401 | .0392 | .0384 | .0375 | .0367 |
| −1.6 | .0548 | .0537 | .0526 | .0516 | .0505 | .0495 | .0485 | .0475 | .0465 | .0455 |
| −1.5 | .0668 | .0655 | .0643 | .0630 | .0618 | .0606 | .0594 | .0582 | .0571 | .0559 |
| −1.4 | .0808 | .0793 | .0778 | .0764 | .0749 | .0735 | .0721 | .0708 | .0694 | .0681 |
| −1.3 | .0968 | .0951 | .0934 | .0918 | .0901 | .0885 | .0869 | .0853 | .0838 | .0823 |
| −1.2 | .1151 | .1131 | .1112 | .1093 | .1075 | .1056 | .1038 | .1020 | .1003 | .0985 |
| −1.1 | .1357 | .1335 | .1314 | .1292 | .1271 | .1251 | .1230 | .1210 | .1190 | .1170 |
| −1.0 | .1587 | .1562 | .1539 | .1515 | .1492 | .1469 | .1446 | .1423 | .1401 | .1379 |
| −0.9 | .1841 | .1814 | .1788 | .1762 | .1736 | .1711 | .1685 | .1660 | .1635 | .1611 |
| −0.8 | .2119 | .2090 | .2061 | .2033 | .2005 | .1977 | .1949 | .1922 | .1894 | .1867 |
| −0.7 | .2420 | .2389 | .2358 | .2327 | .2296 | .2266 | .2236 | .2206 | .2177 | .2148 |
| −0.6 | .2743 | .2709 | .2676 | .2643 | .2611 | .2578 | .2546 | .2514 | .2483 | .2451 |
| −0.5 | .3085 | .3050 | .3015 | .2981 | .2946 | .2912 | .2877 | .2843 | .2810 | .2776 |
| −0.4 | .3446 | .3409 | .3372 | .3336 | .3300 | .3264 | .3228 | .3192 | .3156 | .3121 |
| −0.3 | .3821 | .3783 | .3745 | .3707 | .3669 | .3632 | .3594 | .3557 | .3520 | .3483 |
| −0.2 | .4207 | .4168 | .4129 | .4090 | .4052 | .4013 | .3974 | .3936 | .3897 | .3859 |
| −0.1 | .4602 | .4562 | .4522 | .4483 | .4443 | .4404 | .4364 | .4325 | .4286 | .4247 |
| −0.0 | .5000 | .4960 | .4920 | .4880 | .4840 | .4801 | .4761 | .4721 | .4681 | .4641 |

## How would you test if there were three types of music played?

What if there were three types of wine and three types of music?

Observed counts, with the expected counts under H0 in parentheses:

| Wine \ Music | French | Italian | None | Total | % of people buying each wine type |
|---|---|---|---|---|---|
| French | 39 (30.6) | 30 (34.2) | 30 (34.2) | 99 | 41% (99/243) |
| Italian | 1 (9.6) | 19 (10.7) | 11 (10.7) | 31 | 13% (31/243) |
| Other | 35 (34.9) | 35 (39.1) | 43 (39.1) | 113 | 46% (113/243) |
| Total | 75 | 84 | 84 | 243 | |

Each expected count is the overall share of that wine type times the column total: 41% × 75 for French wine under French music, 41% × 84 for French wine under Italian music or no music, 46% × 75 and 46% × 84 for other wine, and so on. The shares are used unrounded, so the French wine and French music cell is (99/243) × 75 ≈ 30.6.

## Compute Test Score (Chi-Square)

The chi-square statistic (χ²) is a measure of how much the observed cell counts in a two-way table diverge from the expected cell counts. The formula for the χ² statistic, summed over all r × c cells in the table, is

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed count and $E_{ij}$ the expected count in row i, column j.

Large values of χ² represent strong deviations from the distribution expected under H0, providing evidence against H0. However, since χ² is a sum, how large a χ² is required for statistical significance depends on the number of comparisons made.

H0: No relationship between music and wine

Ha: Music and wine are related

From the observed and expected counts, we calculate nine X² components, one per cell, and sum them to produce the X² statistic.

## Identify Test Statistic: χ² Distribution

The χ² distributions are a family of distributions that take only positive values, are skewed to the right, and are each specified by their degrees of freedom. Table F gives upper critical values for many χ² distributions.

## Compute p-Value

For the chi-square test, H0 states that there is no association between the row and column variables in a two-way table. The alternative is that these variables are related. If H0 is true, the chi-square test statistic has approximately a χ² distribution with (r − 1)(c − 1) degrees of freedom.
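The expected counts and the chi-square statistic for the 3×3 wine and music table can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the lecture's own code: the function names `expected_counts` and `chi_square` are hypothetical, and rows are taken to be wine type (French, Italian, Other) and columns music type (French, Italian, None).

```python
def expected_counts(observed):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (statistic, degrees of freedom, per-cell components)."""
    expected = expected_counts(observed)
    # One component (O - E)^2 / E per cell of the table.
    components = [
        [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]
    statistic = sum(sum(row) for row in components)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, components


# Rows: French, Italian, Other wine. Columns: French, Italian, no music.
observed_3x3 = [[39, 30, 30], [1, 19, 11], [35, 35, 43]]
x2, df, components = chi_square(observed_3x3)
print(f"X2 = {x2:.2f} with df = {df}")

# The same helper applied to the earlier 2x2 table (French vs. no music).
x2_2x2, df_2x2, _ = chi_square([[39, 30], [35, 43]])
print(f"2x2 table: X2 = {x2_2x2:.2f} with df = {df_2x2}")
```

For a 2×2 table the chi-square statistic equals the square of the pooled two-proportion z statistic computed from the same counts, so the two tests lead to the same two-sided conclusion.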
The p-value for the chi-square test is the area to the right of X² under the χ² distribution with df = (r − 1)(c − 1): P(χ² ≥ X²).

Example (χ² table, df = (r − 1)(c − 1)): in a 4×3 table, df = 3 × 2 = 6. If X² = 16.1, the p-value is between 0.01 and 0.02.

## Compute p-Value

H0: No relationship between music and wine

Ha: Music and wine are related

We found that the X² statistic under H0 is 18.28. The two-way table has a 3×3 design (3 levels of music and 3 levels of wine). Thus, the degrees of freedom for the χ² distribution for this test are:

(r − 1)(c − 1) = (3 − 1)(3 − 1) = 4

16.42 < X² = 18.28 < 18.47, so 0.0025 > p-value > 0.001: very significant.

There is a significant relationship between the type of music played and wine purchases in supermarkets.

## Two Way Tables: Chi-Square Test

For two categorical variables A (with r categories) and B (with c categories):

H0: A not related to B

Ha: A is related to B

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r - 1)(c - 1)$$

H0 is rejected at level α when the statistic exceeds the critical value χ²* of the χ² distribution with (r − 1)(c − 1) degrees of freedom. The slide illustrates this with the 2×2 wine and music table (French wine: 39 and 30; other wine: 35 and 43; column totals 74 and 73; n = 147).

[Figure: χ² density with df = (r − 1)(c − 1), with upper-tail area α to the right of the critical value χ²*.]

## Interpreting Chi-Square Output

Observed counts, with expected counts in parentheses:

| Wine \ Music | French | Italian | None | Total | Share |
|---|---|---|---|---|---|
| French | 39 (30.6) | 30 (34.2) | 30 (34.2) | 99 | 41% |
| Italian | 1 (9.6) | 19 (10.7) | 11 (10.7) | 31 | 13% |
| Other | 35 (34.9) | 35 (39.1) | 43 (39.1) | 113 | 46% |
| Total | 75 | 84 | 84 | 243 | |

The values summed to make up χ² are called the χ² components. When the test is statistically significant, the largest components point to the conditions most different from the expectations based on H0.

Music and wine purchase decision: X² components

| Wine \ Music | French | Italian | None |
|---|---|---|---|
| French | 2.3337 | 0.5209 | 0.5209 |
| Italian | 7.6724 | 6.4038 | 0.0075 |
| Other | 0.0004 | 0.4223 | 0.3971 |

Two chi-square components contribute most to the X² total: the largest effect is for sales of Italian wine, which are strongly affected by Italian and French music. Actual proportions show that Italian music helps sales of Italian wine, but French music hinders it.

## What if we conducted a Chi-Square Test for the first situation with two wines and two types of music?

Observed counts, with expected counts in parentheses (for example, 47% × 74 ≈ 34.7 and 53% × 73 ≈ 38.7):

| Wine \ Music | French | None | Total | % of people buying each wine type |
|---|---|---|---|---|
| French | 39 (34.7) | 30 (34.3) | 69 | 47% (69/147) |
| Other | 35 (39.3) | 43 (38.7) | 78 | 53% (78/147) |
| Total | 74 | 73 | 147 | |

## Chi-Square Test: Summary

The chi-square test is an overall technique for comparing any number of population proportions, testing for evidence of a relationship between two categorical variables. We can either:

- Test for independence: take one random sample and classify the individuals in the sample according to two categorical variables (attribute or condition). This is an observational study (historical design).
- Compare several populations: randomly select several random samples, each from a different population (or from a population subjected to different treatments). This is an experimental study.

Both models use the chi-square test to test the hypothesis of no relationship.

## Comparing Several Populations: χ² Test

Select independent SRSs from each of c populations, of sizes n1, n2, ..., nc.
Classify each individual in a sample according to a categorical response variable with r possible values. There are c different probability distributions, one for each population.

The null hypothesis is that the distributions of the response variable are the same in all c populations. The alternative hypothesis says that these c distributions are not all the same.

## When is it safe to use the χ² test?

- The samples are simple random samples (SRS).
- All individual expected counts are 1 or more.
- No more than 20% of expected counts are less than 5.

For a 2×2 table, this implies that all four expected counts should be 5 or more.

## Marginal and Conditional Distributions

Is there a relationship between age and level of education attained?

- Marginal distributions
- Conditional distributions

Does background music in supermarkets influence customer purchasing decisions?

- Wine purchased for each kind of music played (column percents)
- Music played for each kind of wine purchased (row percents)

## A Paradox!!

Is there discrimination? How would you test this hypothesis? Is there really discrimination?

## Simpson's Paradox

[Figure: slope = # successful / # unsuccessful = odds.]

An association or comparison that holds for all of several groups can reverse direction when the data are combined (aggregated) to form a single group. This reversal is called Simpson's paradox.

Example: hospital death rates. On the surface, Hospital B would seem to have a better record. But once patient condition is taken into account, we see that Hospital A in fact has a better record for both patient conditions (good and poor). Here patient condition was the lurking variable.

Chi-Square Distribution Table, df = (r − 1)(c − 1)

Table entry for p is the critical value (χ²)* with probability p lying to its right.
TABLE F χ² distribution critical values

Columns give the tail probability p; entries are the critical values (χ²)*.

| df | .25 | .20 | .15 | .10 | .05 | .025 | .02 | .01 | .005 | .0025 | .001 | .0005 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.32 | 1.64 | 2.07 | 2.71 | 3.84 | 5.02 | 5.41 | 6.63 | 7.88 | 9.14 | 10.83 | 12.12 |
| 2 | 2.77 | 3.22 | 3.79 | 4.61 | 5.99 | 7.38 | 7.82 | 9.21 | 10.60 | 11.98 | 13.82 | 15.20 |
| 3 | 4.11 | 4.64 | 5.32 | 6.25 | 7.81 | 9.35 | 9.84 | 11.34 | 12.84 | 14.32 | 16.27 | 17.73 |
| 4 | 5.39 | 5.99 | 6.74 | 7.78 | 9.49 | 11.14 | 11.67 | 13.28 | 14.86 | 16.42 | 18.47 | 20.00 |
| 5 | 6.63 | 7.29 | 8.12 | 9.24 | 11.07 | 12.83 | 13.39 | 15.09 | 16.75 | 18.39 | 20.51 | 22.11 |
| 6 | 7.84 | 8.56 | 9.45 | 10.64 | 12.59 | 14.45 | 15.03 | 16.81 | 18.55 | 20.25 | 22.46 | 24.10 |
| 7 | 9.04 | 9.80 | 10.75 | 12.02 | 14.07 | 16.01 | 16.62 | 18.48 | 20.28 | 22.04 | 24.32 | 26.02 |
| 8 | 10.22 | 11.03 | 12.03 | 13.36 | 15.51 | 17.53 | 18.17 | 20.09 | 21.95 | 23.77 | 26.12 | 27.87 |
| 9 | 11.39 | 12.24 | 13.29 | 14.68 | 16.92 | 19.02 | 19.68 | 21.67 | 23.59 | 25.46 | 27.88 | 29.67 |
| 10 | 12.55 | 13.44 | 14.53 | 15.99 | 18.31 | 20.48 | 21.16 | 23.21 | 25.19 | 27.11 | 29.59 | 31.42 |
| 11 | 13.70 | 14.63 | 15.77 | 17.28 | 19.68 | 21.92 | 22.62 | 24.72 | 26.76 | 28.73 | 31.26 | 33.14 |
| 12 | 14.85 | 15.81 | 16.99 | 18.55 | 21.03 | 23.34 | 24.05 | 26.22 | 28.30 | 30.32 | 32.91 | 34.82 |
| 13 | 15.98 | 16.98 | 18.20 | 19.81 | 22.36 | 24.74 | 25.47 | 27.69 | 29.82 | 31.88 | 34.53 | 36.48 |

...
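Looking up a p-value in Table F amounts to finding the two neighboring critical values that bracket the observed statistic. The sketch below mimics that lookup for the df = 4 and df = 6 rows. The names `TAIL_PROBS`, `TABLE_F`, `bracket_p_value`, and `chi2_tail_even_df` are hypothetical, only two rows of the table are copied in, and the exact tail formula used as a cross-check holds only for even degrees of freedom.

```python
import math

# Column headers of Table F (tail probabilities) and two of its rows.
TAIL_PROBS = [.25, .20, .15, .10, .05, .025, .02, .01, .005, .0025, .001, .0005]
TABLE_F = {
    4: [5.39, 5.99, 6.74, 7.78, 9.49, 11.14, 11.67, 13.28, 14.86, 16.42, 18.47, 20.00],
    6: [7.84, 8.56, 9.45, 10.64, 12.59, 14.45, 15.03, 16.81, 18.55, 20.25, 22.46, 24.10],
}


def bracket_p_value(x2, df):
    """Return (lower, upper) bounds on the p-value read off Table F."""
    lower, upper = 0.0, 1.0
    for p, critical in zip(TAIL_PROBS, TABLE_F[df]):
        if x2 >= critical:
            upper = p      # statistic is at least this critical value
        else:
            lower = p      # first critical value the statistic falls short of
            break
    return lower, upper


def chi2_tail_even_df(x2, df):
    """Exact upper-tail area P(chi-square >= x2), valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x2 / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# 3x3 wine and music example: X2 = 18.28 with df = 4.
low, high = bracket_p_value(18.28, 4)
print(f"{low} < p-value < {high}")
print(f"exact tail area: {chi2_tail_even_df(18.28, 4):.4f}")
```

A table lookup gives only an interval for the p-value; statistical software reports the tail area directly, which is what the closed-form cross-check approximates here for even df.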