Review for Final Exam
COMM 291 Section 202
Monday, April 19, 2010
Greg Werker

Announcements
Office hours this week:
- Monday 1:30-3:00
- Thursday 10:30-12:00 & 12:30-1:30
Final Exam: Friday, April 23, 12:00-3:00, HA 098
- Open book and open notes
- No computers and no wireless/cell devices

Central Limit Theorem
$\bar{x} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$ (approximately) ... if n is large enough
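
A quick simulation can make the theorem concrete. The sketch below is an illustration only: the exponential population (mean 1, standard deviation 1), the sample size of 50, the number of repetitions, and the function names are all assumptions chosen for the demo, not course material.

```python
import random
import statistics


def sample_mean_sd(population_sd, n):
    """Standard deviation of the sample mean under the CLT: sigma / sqrt(n)."""
    return population_sd / n ** 0.5


def simulate_sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from a skewed (exponential) population
    with mean 1 and sd 1, and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


means = simulate_sample_means(n=50)
print(statistics.fmean(means))    # should be close to the population mean, 1
print(statistics.stdev(means))    # should be close to sigma / sqrt(n)
print(sample_mean_sd(1.0, 50))    # the CLT value, 1 / sqrt(50)
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread near $\sigma/\sqrt{n}$.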

Descriptive Statistics | Data | Inference

Descriptive Statistics
- Data, variable types
- Surveys, sampling
- Populations and parameters
- Displaying and describing:
  - categorical data
  - quantitative data

Inference
Use a sample... ...to infer something... ...about a population.

Parameters vs. Statistics
A parameter describes the population (e.g., $\mu$); a statistic describes the sample (e.g., $\bar{x}$).

Hypothesis Tests
1. State the hypotheses
2. Test statistic
3. P-value
4. Conclusion

Confidence Intervals
estimate ± margin of error
Must also state the confidence level C.

Inference: One Variable
Are we interested in a mean or a proportion?

One Variable Tests
# Groups   Quantitative (Mean)   Categorical (Proportion)
One        One-sample t          One-sample z

One-sample t-test for the mean
1. Hypotheses: $H_0: \mu = \mu_0$ vs. $H_A: \mu \neq \mu_0$, $H_A: \mu < \mu_0$, or $H_A: \mu > \mu_0$
2. Test statistic: $t_{n-1} = \dfrac{\bar{y} - \mu_0}{SE(\bar{y})}$, where $SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
3. P-value (remember to double if two-sided)
4. Conclusion (as always, reject $H_0$ if $P < \alpha$)
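
The test statistic in step 2 can be computed by hand or with a few lines of code. The following is a minimal sketch, assuming made-up data and a hypothetical function name; it stops at the t statistic and degrees of freedom, leaving the P-value to a t table or Excel as in the steps above.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0 using t = (ybar - mu0) / (s / sqrt(n))."""
    n = len(data)
    ybar = statistics.fmean(data)
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    se = s / math.sqrt(n)           # SE(ybar) = s / sqrt(n)
    t = (ybar - mu0) / se
    return t, n - 1


# Made-up measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = one_sample_t(data, mu0=4)
print(t, df)
```

For a two-sided alternative, the tail area beyond |t| would then be doubled.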

t Distributions
- There is a t(k) distribution for every positive degrees of freedom k
- t(k) distribution → Normal distribution as k increases
- t distributions are symmetric
- For the hypothesis tests and CIs in chapter 12: df = n - 1

z-test for a population proportion
Note: This test follows logically from what we've done so far, but isn't that useful in practice. It is much more interesting to compare proportions from two populations (to see if they're different) than to hypothesize about what the proportion might be for a single population.
1. Hypotheses: $H_0: p = p_0$ vs. $H_a: p \neq p_0$, $H_a: p > p_0$, or $H_a: p < p_0$
2. Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
3. P-value (remember to double if two-sided)
4. Conclusion (as always, reject $H_0$ if $P < \alpha$)
Choosing a sample size: $n = \left(\dfrac{z^*}{m}\right)^2 p^*(1 - p^*)$, where m is the desired margin of error and $p^*$ is a guessed value of the proportion.

Inference: One Variable
Are we interested in a Mean or a Proportion?
Is there one group? Are there two? More?

One Variable Tests
# Groups   Quantitative               Categorical
One        One-sample t               One-sample z
Two        Two-sample t (unpooled)    Two-sample z
           Pooled t
           Paired t

t-Test Tips
Are the data paired?
- Yes: Paired
- No: Can we assume $\sigma_1 = \sigma_2$?
  - Yes: Pooled (a.k.a. equal variance)
  - No: Two-sample (a.k.a. unpooled; unequal variance)

Are the data paired?
Each observation in group 1 can be matched with one in group 2 (so $n_1 = n_2$), e.g., twins or before/after.
Can we assume $\sigma_1 = \sigma_2$?
The sample standard deviations don't differ by more than a factor of 2 (or sample variances by more than a factor of 4).
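
The decision flow above can be written as a small function. This is only a sketch of the rule of thumb as stated (paired first, then the factor-of-2 check on the sample standard deviations); the function name and return labels are hypothetical.

```python
def choose_t_procedure(paired, s1, s2):
    """Pick a two-group t procedure from the t-Test Tips flow.

    paired: True if each observation in group 1 matches one in group 2.
    s1, s2: sample standard deviations of the two groups (both > 0).
    """
    if paired:
        return "paired t"
    # Rule of thumb: treat the variances as equal when the sample standard
    # deviations differ by no more than a factor of 2.
    ratio = max(s1, s2) / min(s1, s2)
    if ratio <= 2:
        return "pooled t"
    return "two-sample t (unpooled)"


print(choose_t_procedure(paired=True, s1=1.0, s2=5.0))
print(choose_t_procedure(paired=False, s1=3.0, s2=5.0))
print(choose_t_procedure(paired=False, s1=1.0, s2=2.5))
```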

Two-sample t-Test
df: Use Excel, or take smaller of $(n_1 - 1)$ and $(n_2 - 1)$

Pooled t-Test

Paired t-Test

z-Test for Comparing Two Proportions
1. $H_0: p_1 = p_2$ (can use a one- or two-sided $H_a$ as usual)
2. Test statistic:
   First calculate these:
   - pooled estimate: $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$
   - pooled standard error: $SE_{Dp} = \sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
   Then calculate the test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$
3. P-value: remember to double the P-value if using the two-sided test
4. Conclusion: as always, reject $H_0$ if $P \le \alpha$
Use this test only when the number of successes and the number of failures in each of the samples is at least 5.
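
The pooled z-test for two proportions is easy to script. Below is a minimal sketch assuming made-up counts and a hypothetical function name; it uses Python's standard-library Normal distribution for the two-sided P-value.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided P-value) for H0: p1 = p2 using the pooled SE."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                       # pooled estimate
    se_dp = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_dp
    # Double the upper-tail area beyond |z| for the two-sided test.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 60 successes out of 100 vs. 45 successes out of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(z, p_value)
```

For a one-sided alternative, the single tail area in the direction of $H_a$ would be used instead of the doubled value.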

One Variable Tests
# Groups      Quantitative               Categorical
One           One-sample t               One-sample z
Two           Two-sample t (unpooled)    Two-sample z
              Pooled t
              Paired t
Two or more   ANOVA                      Chi-squared

ANOVA Table
Source                DF      Sum of Squares   Mean Square   F-Statistic   P-value
Treatment (between)   k - 1   SST              MST           MST / MSE     (FDIST or Table F)
Error (within)        N - k   SSE              MSE
Total                 N - 1   SSTotal

where
$SST = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$, $MST = \dfrac{SST}{k - 1}$
$SSE = \sum_{i=1}^{k} (n_i - 1)s_i^2$, $MSE = \dfrac{SSE}{N - k}$
$SSTotal = SST + SSE$
Note: Excel switches columns around; also, Treatment = Factor.

F-test in One-Way ANOVA
1. Hypotheses: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$; $H_A$: at least one mean is different
2. Test statistic: $F = MST / MSE$
3. P-value: =FDIST(F, k-1, N-k) or from Table F
4. Conclusion: if $P < \alpha$ then reject $H_0$
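
To see how the ANOVA table entries feed into F, here is a minimal sketch built on the SST, SSE, MST and MSE formulas. The three groups are made-up data and the function name is hypothetical; the P-value is left to FDIST or Table F.

```python
from statistics import fmean, variance


def one_way_anova(groups):
    """Compute the one-way ANOVA table entries for a list of groups."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # SST: between-group sum of squares, weighted by group size.
    sst = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # SSE: within-group sum of squares, from each group's sample variance.
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    mst = sst / (k - 1)
    mse = sse / (N - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (k - 1, N - k)}


# Made-up data for three treatment groups.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

The resulting F and the pair of degrees of freedom are exactly the arguments that =FDIST(F, k-1, N-k) expects.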

Chi-Square Test
1. $H_0$: There is no association between the row and column variables.
2. Test statistic: $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$
3. P-value
4. Conclusion: Reject $H_0$ if $P \le \alpha$. Then explain in plain language.

Examples: Homogeneity
            BC   Ont.   Que.
Ski
Snowboard
Neither
This model tests if one categorical variable is different across several populations.
Examines distribution of counts of one variable on multiple groups.

Chi-Square Distribution
- There is a $\chi^2$ distribution for every positive df.
- The probability p is found by looking up the area of the tail.
- There is only one tail ... and it is always on the right.
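
The chi-square statistic can be computed directly from a table of observed counts. The sketch below makes two assumptions that are standard but not spelled out above: expected counts are (row total × column total) / grand total, and df = (rows - 1)(columns - 1). The counts are made up and the function name is hypothetical.

```python
def chi_square_statistic(observed):
    """Return (chi2, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Assumed expected count: row total * column total / grand total.
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Made-up counts: rows = Ski, Snowboard, Neither; columns = BC, Ont., Que.
counts = [[30, 25, 20],
          [20, 15, 10],
          [50, 60, 70]]
chi2, df = chi_square_statistic(counts)
print(chi2, df)
```

The P-value is then the right-tail area of the $\chi^2$ distribution with that df.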

One Variable CIs?
# Groups      Quantitative               Categorical
One           One-sample t               One-sample z
Two           Two-sample t (unpooled)    Two-sample z
              Pooled t
              Paired t
Two or more   ANOVA                      Chi-squared

CI for a mean
Assumption about $\sigma$: unknown; use s instead
Confidence interval: $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$
t* is found in a table or with Excel

CI for a proportion
$\hat{p} \pm z^* SE(\hat{p})$

CI: two means (unpooled)

CI: two means (pooled)

CI: two means (paired)

Large Sample z Confidence Interval for Comparing Two Proportions
1. Estimate the difference in proportions: $D = \hat{p}_1 - \hat{p}_2 = \dfrac{X_1}{n_1} - \dfrac{X_2}{n_2}$
2. Standard error of the difference: $SE_D = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
3. Margin of error: $m = z^* SE_D$
4. So, the confidence interval is: $(\hat{p}_1 - \hat{p}_2) \pm m$
Use only if # successes and # failures of both samples are at least 10. Otherwise you can't use a large-sample confidence interval and we don't cover that in this course (see the "plus four" confidence interval in your textbook if interested).

Errors

Inference: Multiple Variables
Do we have quantitative variables or categorical variables?
- Quantitative: Regression
- Categorical: Chi-Squared
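
Before moving on to multiple variables, the four steps of the large-sample z confidence interval for comparing two proportions can be sketched in code. The counts are made up and the function name is hypothetical; z* is taken from the standard Normal distribution in Python's standard library rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample z interval for p1 - p2; returns (lower, upper)."""
    # Use only if successes and failures in both samples are at least 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("need at least 10 successes and failures per sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    d = p1_hat - p2_hat                                    # step 1: estimate
    se_d = sqrt(p1_hat * (1 - p1_hat) / n1
                + p2_hat * (1 - p2_hat) / n2)              # step 2: SE_D
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = z_star * se_d                                      # step 3: margin
    return d - m, d + m                                    # step 4: interval


# Made-up counts: 60 of 100 vs. 45 of 100.
lower, upper = two_proportion_ci(60, 100, 45, 100)
print(lower, upper)
```

Note that the interval uses the unpooled standard error, unlike the significance test, which pools the two samples under $H_0$.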

Regression
$\beta_0$ = (not interesting)
$\beta_1$ = ?
[Figure: plot; axis labels not recoverable]
Inference for regression includes:
- CI for slope
- t-test for slope
- t-test for correlation coefficient, r
- CI for predicted mean value
- Prediction interval for a future observation

CI for the Regression Slope
C level confidence interval for the slope: $b_1 \pm t^* SE(b_1)$
where t* comes from the t-distribution with n - 2 degrees of freedom for level C. Note that $SE(b_1)$ is the same as for the t-test.

t-Test for the Regression Slope
To test $H_0: \beta_1 = 0$, first calculate:
$s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{\sum e^2}{n - 2}}$
and:
$SE(b_1) = \dfrac{s_e}{s_x \sqrt{n - 1}}$
then the test statistic is:
$t = \dfrac{b_1 - 0}{SE(b_1)}$, df = n - 2

t-Test for the Correlation Coefficient
To test $H_0: \rho = 0$, use the test statistic:
$t = r\sqrt{\dfrac{n - 2}{1 - r^2}}$
As with the t-procedures for $b_1$, df = n - 2

Intro to CI and PI
[Figure: plot; axis labels not recoverable]

Confidence Interval for the Predicted Mean Value
We must calculate the CI for the predicted mean value, $\mu_v$, at a specific value of x, called $x_v$:
$\hat{y}_v \pm t^* SE(\hat{\mu}_v)$
where
$SE(\hat{\mu}_v) = \sqrt{SE^2(b_1)(x_v - \bar{x})^2 + \dfrac{s_e^2}{n}}$, df = n - 2

Prediction Interval for a Future Observation
As with the CI, we calculate the PI for a future observation, $y_v$, at a specific value of x, called $x_v$:
$\hat{y}_v \pm t^* SE(\hat{y}_v)$
where
$SE(\hat{y}_v) = \sqrt{SE^2(b_1)(x_v - \bar{x})^2 + \dfrac{s_e^2}{n} + s_e^2}$, df = n - 2
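
The regression formulas from the last few slides fit together in one short computation. The sketch below is an illustration under stated assumptions: the data are made up, the function name is hypothetical, the least-squares estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$ are the standard ones (not shown above), and t* = 3.182 is read from a t table for 95% confidence with df = 3.

```python
import math
from statistics import fmean, stdev


def regression_inference(x, y, x_v):
    """Standard errors and t statistics for simple regression at x = x_v."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_x, s_y = stdev(x), stdev(y)

    # Least-squares fit (standard formulas, assumed here).
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # s_e = sqrt(sum of squared residuals / (n - 2))
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    se_b1 = s_e / (s_x * math.sqrt(n - 1))
    t_slope = (b1 - 0) / se_b1

    # t-test for the correlation coefficient; r = b1 * s_x / s_y.
    r = b1 * s_x / s_y
    t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))

    y_hat_v = b0 + b1 * x_v
    se_mean = math.sqrt(se_b1 ** 2 * (x_v - x_bar) ** 2 + s_e ** 2 / n)
    se_pred = math.sqrt(se_b1 ** 2 * (x_v - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)
    return {"b0": b0, "b1": b1, "s_e": s_e, "se_b1": se_b1,
            "t_slope": t_slope, "t_corr": t_corr, "y_hat_v": y_hat_v,
            "se_mean": se_mean, "se_pred": se_pred, "df": n - 2}


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
out = regression_inference(x, y, x_v=3)

t_star = 3.182  # from a t table: 95% confidence, df = 3
ci = (out["y_hat_v"] - t_star * out["se_mean"],
      out["y_hat_v"] + t_star * out["se_mean"])
pi = (out["y_hat_v"] - t_star * out["se_pred"],
      out["y_hat_v"] + t_star * out["se_pred"])
print(out["t_slope"], out["t_corr"])
print(ci, pi)
```

Two things are worth noticing in the output: the slope and correlation t statistics agree, and the prediction interval is always wider than the confidence interval because of the extra $s_e^2$ term.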

Residual Plots
- Outlier: a statistical observation that is markedly different in value from the others of the sample (m-w.com)
- High Leverage point: a point with an x-value far from $\bar{x}$
- Influential point: a point is influential if omitting it gives a very different model

Multiple Regression
- More than one X variable
- t-procedures for $b_1$ now work for $b_i$

t-Test for a Coefficient
To test $H_0: \beta_i = 0$, first calculate:
$s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - k - 1}} = \sqrt{\dfrac{\sum e^2}{n - k - 1}}$
and:
$SE(b_i) = \dfrac{s_e}{s_{x_i} \sqrt{n - 1}}$ (exact only when the predictors are uncorrelated; otherwise take $SE(b_i)$ from the regression output)
then the test statistic is:
$t = \dfrac{b_i - 0}{SE(b_i)}$, df = n - k - 1

CI for a Coefficient
C level confidence interval for the coefficient: $b_i \pm t^* SE(b_i)$
where t* comes from the t-distribution with n - k - 1 degrees of freedom for level C.

F-test for Regression
1. Hypotheses: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$; $H_A$: at least one of the coefficients is not zero
2. Test statistic: $F = \dfrac{MSR}{MSE} = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$
3. P-value: =FDIST(F, k, n-k-1) or from Table F
4. Conclusion: if $P < \alpha$ then reject $H_0$
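
When only $R^2$, n and k are reported, the F statistic follows directly from the formula in step 2. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def regression_f_statistic(r_squared, n, k):
    """Return (F, (df1, df2)) with F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    df1, df2 = k, n - k - 1
    f = (r_squared / df1) / ((1 - r_squared) / df2)
    return f, (df1, df2)


# Made-up values: R^2 = 0.5 from n = 23 observations and k = 2 predictors.
f, dfs = regression_f_statistic(0.5, n=23, k=2)
print(f, dfs)
```

The returned F and degrees of freedom are the arguments for =FDIST(F, k, n-k-1).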

Inference: Multiple Variables
Do we have quantitative variables or categorical variables?
- Quantitative: Regression
- Categorical: Chi-Squared

Chi-Squared
Same test as for homogeneity (one variable across several populations), but different interpretation... two variables and one population

Independence
            Guitar   Piano   Neither
Ski
Snowboard
Neither
This model tests if two categorical variables, applied to one population, are independent.
Examines distribution of counts of two variables measuring one group.

Hypotheses
What is $H_0$ for the independence model?
Example: Ski / Snowboard / Neither by Guitar / Piano / Neither
$H_0$: Snow sport and musical instrument are independent.
What is $H_0$ for the homogeneity model?
Example: Ski / Snowboard / Neither by BC / Ont. / Que.
$H_0$: Snow sport distribution is the same in different provinces.

Hypotheses in General
Independence model...
$H_0$: The two categorical variables are independent.
Homogeneity model...
$H_0$: The categorical variable has the same distribution across all populations/groups.

Open-book Exam Tips
- Cheat sheet(s) incl. table of all procedures (e.g., A-56)
- Post-it notes in textbook?
- For the most part, prepare as you would for a closed-book exam...
  - do lots of practice questions
  - review the notes & text
  - know the material but you don't need to memorize details

This note was uploaded on 04/29/2010 for the course COMM 291 taught by Professor E. Fowler during the Spring '10 term at UBC.