
# Review for Final Exam - COMM 291



COMM 291, Section 202. Monday, April 19, 2010. Greg Werker.

## Announcements

Office hours this week:
- Monday 1:30 - 3:00
- Thursday 10:30 - 12:00 & 12:30 - 1:30

Final Exam: Friday, April 23, 12:00 - 3:00, HA 098. Open book and open notes; no computers and no wireless/cell devices.

## Central Limit Theorem

x̄ ~ N(μ, σ/√n) ... if n is large enough.

## Descriptive Statistics

- Data, variable types
- Surveys, sampling
- Populations and parameters
- Displaying and describing categorical data and quantitative data

## Inference

Use a sample... to infer something... about a population.

Parameters vs. statistics: parameters (e.g., μ, p) describe the population; statistics (e.g., x̄, p̂) are calculated from the sample and estimate them.

## Hypothesis Tests

1. State the hypotheses
2. Test statistic
3. P-value
4. Conclusion

## Confidence Intervals

estimate ± margin of error, and you must also state the confidence level C.

## Inference: One Variable

Are we interested in a mean or a proportion?

| # Groups | Quantitative | Categorical |
|---|---|---|
| One | One-sample t (mean) | One-sample z (proportion) |

## One-sample t-Test for the Mean

1. Hypotheses: H0: μ = μ0 versus HA: μ ≠ μ0, HA: μ < μ0, or HA: μ > μ0
2. Test statistic: t(n-1) = (x̄ - μ0)/SE(x̄), where SE(x̄) = s/√n
3. P-value (remember to double it if two-sided)
4. Conclusion (as always, reject H0 if P < α)

## t Distributions

There is a t(k) distribution for every positive degrees of freedom k. The t(k) distribution approaches the Normal distribution as k increases, and t distributions are symmetric. For the hypothesis tests and CIs in Chapter 12: df = n - 1.
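The one-sample t-test for a mean outlined above can be sketched in Python. This is a minimal illustration with made-up data (the sample values and `mu0` are hypothetical, and SciPy is assumed to be available); the hand computation is cross-checked against SciPy's built-in test.

```python
# Hypothetical sample data; two-sided one-sample t-test of H0: mu = mu0.
from math import sqrt
from scipy import stats

x = [9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3]
mu0 = 10.0

n = len(x)
xbar = sum(x) / n
s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample std dev
se = s / sqrt(n)                       # SE(x-bar) = s / sqrt(n)
t = (xbar - mu0) / se                  # test statistic with n - 1 df
p = 2 * stats.t.sf(abs(t), df=n - 1)   # double the tail for a two-sided test

# Cross-check against SciPy's built-in one-sample t-test.
t2, p2 = stats.ttest_1samp(x, mu0)
```

Doubling the upper-tail area only works because the t distribution is symmetric, which is exactly the "remember to double if two-sided" step above.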
## z-Test for a Population Proportion

(Concepts from Wednesday's class; make sure you review this if you missed Wednesday's class.)

Note: this test follows logically from what we've done so far, but isn't done much in practice. It is much more interesting to compare proportions from two populations (to see if they're different) than to hypothesize about what the proportion might be for a single population.

A. Hypothesis test:
1. H0: p = p0 versus Ha: p ≠ p0, Ha: p > p0, or Ha: p < p0
2. Test statistic: z = (p̂ - p0)/√(p0(1 - p0)/n)
3. P-value: remember to double the P-value if using the two-sided test
4. Conclusion: as always, reject H0 if P ≤ α

B. Choosing a sample size.

## Inference: One Variable

Are we interested in a mean or a proportion? Is there one group? Are there two? More?

## One Variable Tests

| # Groups | Quantitative | Categorical |
|---|---|---|
| One | One-sample t | One-sample z |
| Two | Two-sample t (unpooled), Pooled t, Paired t | Two-sample z |

## t-Test Tips

- Are the data paired? Yes: use the paired t-test. Paired means each observation in group 1 can be matched with one in group 2 (so n1 = n2), e.g., twins or before/after measurements.
- Not paired: can we assume σ1 = σ2? Yes: use the pooled (a.k.a. equal-variance) t-test. No: use the two-sample (a.k.a. unpooled, unequal-variance) t-test.
- Rule of thumb for assuming σ1 = σ2: the sample standard deviations don't differ by more than a factor of 2 (or the sample variances by more than a factor of 4).
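The one-proportion z-test above can be sketched using only the Python standard library. The counts and the null value p0 = 0.5 here are made up for illustration; note the standard error under H0 uses p0, not p̂.

```python
# Hypothetical counts; two-sided z-test for a single proportion.
from math import sqrt
from statistics import NormalDist

x, n, p0 = 58, 100, 0.5        # successes, sample size, null proportion
phat = x / n
se0 = sqrt(p0 * (1 - p0) / n)  # SE under H0 uses p0, not phat
z = (phat - p0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # double the tail: two-sided test
```

Here z = (0.58 - 0.5)/0.05 = 1.6, so the two-sided P-value is about 0.11 and H0 would not be rejected at α = 0.05.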
## Two-sample t-Test

df: use Excel, or take the smaller of (n1 - 1) and (n2 - 1).

## Pooled t-Test

## Paired t-Test

## z-Test for Comparing Two Proportions

(For the corresponding confidence interval, the number of successes and the number of failures in both samples must be at least 10; otherwise you can't use a large-sample confidence interval, and we don't cover that in this course; see the "plus four" confidence interval in your textbook if interested.)

1. H0: p1 = p2 (can use a one- or two-sided Ha as usual)
2. Test statistic. First calculate the pooled estimate p̂ = (X1 + X2)/(n1 + n2) and the pooled standard error SE_Dp = √(p̂(1 - p̂)(1/n1 + 1/n2)); then z = (p̂1 - p̂2)/SE_Dp
3. P-value: remember to double the P-value if using the two-sided test
4. Conclusion: as always, reject H0 if P ≤ α

Use this test only when the number of successes and the number of failures in each of the samples is at least 5.

Up next: relations in categorical data.

## One Variable Tests

| # Groups | Quantitative | Categorical |
|---|---|---|
| One | One-sample t | One-sample z |
| Two | Two-sample t (unpooled), Pooled t, Paired t | Two-sample z |
| Two or more | ANOVA | Chi-squared |

## ANOVA Table

| Source | DF | Sum of Squares | Mean Square | F-Statistic |
|---|---|---|---|---|
| Treatment (between) | k - 1 | SST = Σ ni(ȳi - ȳ)² | MST = SST/(k - 1) | F = MST/MSE |
| Error (within) | N - k | SSE = Σ (ni - 1)si² | MSE = SSE/(N - k) | |
| Total | N - 1 | SSTotal = SST + SSE | | |

P-value from =FDIST or Table F. Note: Excel switches the columns around; also, Treatment = Factor.

## F-test in One-Way ANOVA

1. Hypotheses: H0: μ1 = μ2 = ... = μk versus HA: at least one mean is different
2. Test statistic: F = MST / MSE
3. P-value: =FDIST(F, k - 1, N - k) or from Table F
4. Conclusion: if P < α then reject H0
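The one-way ANOVA F-test above can be sketched in Python. The three groups of data are hypothetical and SciPy is assumed to be available; the sums of squares are computed by hand exactly as in the ANOVA table and cross-checked against `scipy.stats.f_oneway`.

```python
# Hypothetical data for k = 3 groups; one-way ANOVA by hand.
from scipy import stats

groups = [[4.1, 5.0, 4.8, 5.2], [5.9, 6.1, 5.5, 6.4], [4.9, 5.3, 5.1, 4.7]]

k = len(groups)
N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N            # grand mean, y-bar

# Between-group (SST) and within-group (SSE) sums of squares.
sst = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
sse = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

mst = sst / (k - 1)                                # mean square, treatment
mse = sse / (N - k)                                # mean square, error
F = mst / mse
p = stats.f.sf(F, k - 1, N - k)                    # same role as =FDIST(F, k-1, N-k)

# Cross-check with SciPy's built-in one-way ANOVA.
F2, p2 = stats.f_oneway(*groups)
```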
## Chi-Square Test

1. Hypotheses: H0: there is no association between the row and column variables.
2. Test statistic: χ² = Σ over all cells of (Obs - Exp)²/Exp
3. P-value
4. Conclusion: reject H0 if P ≤ α. Then explain the result in plain language.

## Example: Homogeneity

Rows: BC, Ont., Que.; columns: Ski, Snowboard, Neither. This model tests whether one categorical variable is distributed differently across several populations: it examines the distribution of counts of one variable in multiple groups.

## Chi-Square Distribution

There is a χ² distribution for every positive df. The probability p is found by looking up the area of the tail. There is only one tail ... and it is always on the right.

## One Variable CIs

| # Groups | Quantitative | Categorical |
|---|---|---|
| One | One-sample t | One-sample z |
| Two | Two-sample t (unpooled), Pooled t, Paired t | Two-sample z |
| Two or more | ANOVA | Chi-squared |

## CI for a Mean

Assumption about σ: unknown, so use s instead. Confidence interval: x̄ ± t* s/√n, where t* is found in a table or with Excel.

## CI for a Proportion

p̂ ± z* SE(p̂)

## CI: Two Means (unpooled)

## CI: Two Means (pooled)

## CI: Two Means (paired)

## Large-Sample z Confidence Interval for Comparing Two Proportions

1. Estimate the difference in proportions: D̂ = p̂1 - p̂2 = X1/n1 - X2/n2
2. Standard error of the difference: SE_D = √(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
3. Margin of error: m = z* SE_D
4. So the confidence interval is (p̂1 - p̂2) ± m

Use only if the number of successes and the number of failures in both samples are at least 10; otherwise you can't use a large-sample confidence interval.

## Errors

## Inference: Multiple Variables

Do we have quantitative variables or categorical variables? Quantitative: regression. Categorical: chi-squared.

## Regression

β0 = (not interesting); β1 = ?
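The regression setup above (estimating b0 and b1 from data) can be sketched in Python. The (x, y) values are made up and SciPy is assumed to be available; `scipy.stats.linregress` returns the estimates plus the quantities used for inference on the slope.

```python
# Hypothetical (x, y) data; fit a simple linear regression y = b0 + b1*x.
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 4.3, 5.9, 8.2, 9.8, 12.1]

res = stats.linregress(x, y)
b0, b1 = res.intercept, res.slope
r = res.rvalue        # sample correlation coefficient
p_slope = res.pvalue  # P-value of the t-test for H0: beta1 = 0, df = n - 2
se_b1 = res.stderr    # SE(b1), used for both the CI and the t-test for the slope
```

A C-level CI for the slope is then `b1 ± t* * se_b1` with t* from the t(n - 2) distribution, matching the formula in the next section.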
!"#\$% +,-%+./('0% '""# &\$"# &""# %\$"# %""# \$\$"# \$""# !\$"# !""# !""# !\$"# \$""# \$\$"# %""# %\$"# &""# &\$"# '""# &'()"*% Inference for regression includes: CI for slope t-test for slope t-test for correlation coefficient, r CI for predicted mean value Prediction interval for a future observation 41 41 CI for the Regression Slope C level confidence interval for b1: b1 t SE(b1 ) where t* comes from the t-distribution with n - 2 degrees of freedom for level C. Note that SE(b1) is the same as for the t-test. 42 42 t-Test for the Regression Slope To test H0: 1 = 0, first calculate: se = and: then the test statistic is: se SE(b1 ) = sx n - 1 b1 - 0 t= SE(b1 ) (y - y )2 ^ = n-2 e2 n-2 df = n - 2 43 43 t-Test for the Correlation Coefficient To test H0: = 0, use the test statistic: t=r n-2 1 - r2 As with the t-procedures for b1, df = n - 2 44 44 Intro to CI and PI +,-%+./('0% *"""# )""# (""# !"#\$% '""# &""# %""# \$""# !""# \$""# \$%"# %""# %%"# &""# &%"# '""# '%"# 45 45 (""# &'()"*% Confidence Interval for the Predicted Mean Value We must calculate the CI for the predicted mean value, v, at a specific ^ value of x, called xv: yv t SE(^v ) ^ where SE(^v ) = df = n - 2 s2 SE 2 (b1 ) (xv - x)2 + e n 46 46 Prediction Interval for a Future Observation As with the CI, we calculate the PI for a future observation, yv, at a specific value ^ of x, called xv: yv t SE(^v ) ^ y where SE(^v ) = y df = n - 2 s2 SE 2 (b1 ) (xv - x)2 + e + s2 e n 47 47 Residual Plots Outlier a statistical observation that is markedly different in value from the others of the sample (m-w.com) High Leverage point a point with an x-value far from x Influential point a point is influential if omitting it gives a very different model 48 48 Multiple Regression More than one X variable procedures for b1 now work for bi 49 49 t-Test for a Coefficient To test H0: 1i = 0, first calculate: (y - y )2 ^ se = = n-k-1 and: se SE(b1 ) = i sx i n - 1 then the test statistic is: e2 n-k-1 b1 - 0 i t= SE(b1 ) i df = 
## CI for a Coefficient

C-level confidence interval for βi: bi ± t* SE(bi), where t* comes from the t-distribution with n - k - 1 degrees of freedom for level C.

## F-test for Regression

1. Hypotheses: H0: β1 = β2 = ... = βk = 0 versus HA: at least one of the coefficients is not zero
2. Test statistic: F = MSR/MSE = (R²/k)/((1 - R²)/(n - k - 1))
3. P-value: =FDIST(F, k, n - k - 1) or from Table F
4. Conclusion: if P < α then reject H0

## Inference: Multiple Variables

Do we have quantitative variables or categorical variables? Quantitative: regression. Categorical: chi-squared.

## Chi-Squared for Independence

Same test as for homogeneity (one variable across several populations), but a different interpretation: two variables and one population. Example: Guitar, Piano, Neither (rows) by Ski, Snowboard, Neither (columns). This model tests whether two categorical variables, applied to one population, are independent: it examines the distribution of counts of two variables measuring one group.

## Hypotheses

What is H0 for the independence model? Example (Guitar/Piano/Neither by Ski/Snowboard/Neither): H0: snow sport and musical instrument are independent.

What is H0 for the homogeneity model? Example (BC/Ont./Que. by Ski/Snowboard/Neither): H0: the snow-sport distribution is the same in the different provinces.

## Hypotheses in General

- Independence model: H0: the two categorical variables are independent.
- Homogeneity model: H0: the categorical variable has the same distribution across all populations/groups.

## Open-book Exam Tips

- Cheat sheet(s), including a table of all procedures (e.g., A-56)
- Post-it notes in the textbook?
- For the most part, prepare as you would for a closed-book exam: do lots of practice questions, review the notes and text, and know the material, but you don't need to memorize details.
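As a final worked sketch, the chi-square test of independence described above can be run in Python. The 2x3 table of counts for the instrument-by-snow-sport example is entirely hypothetical, and SciPy is assumed to be available; `scipy.stats.chi2_contingency` computes the expected counts, χ², df, and P-value.

```python
# Hypothetical 2x3 table; chi-square test of independence between
# instrument (rows) and snow sport (columns).
from scipy import stats

observed = [[30, 20, 50],   # Guitar: Ski, Snowboard, Neither
            [20, 30, 50]]   # Piano:  Ski, Snowboard, Neither

chi2, p, df, expected = stats.chi2_contingency(observed)
# df = (rows - 1) * (cols - 1) = 2 here; reject H0 if p <= alpha.
```

With these counts every expected cell is 25 or 50, χ² = 4.0 on 2 df, and the P-value is about 0.135, so H0 (independence) would not be rejected at α = 0.05.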

## This note was uploaded on 04/29/2010 for the course COMM 291 taught by Professor E.fowler during the Spring '10 term at UBC.
