When the sample size is large and sampling is random, sample means ($\bar{X}$) will be Normally distributed (Central Limit Theorem).
The test statistic is $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, that is, (sample mean − hypothesised mean) / standard error.
We can substitute $s$ for $\sigma$ and replace $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ with $\frac{\bar{x} - \mu}{s / \sqrt{n}}$. The test statistic is now Z, hence we are conducting a Z-test (using the Normal distribution).
When the sample size is small, we can also substitute $s$ for $\sigma$ and use the appropriate t-distribution, but only if sampling from a Normally distributed population.
As $n$ increases (and therefore the degrees of freedom increase), the t-distribution approaches the Normal distribution, so t-tests and Z-tests give increasingly similar results.
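
The Z-test calculation can be sketched in a few lines of Python. This is a minimal illustration only: the function name `z_test` and the two-sided p-value are assumptions made for the example, and it presumes a large random sample so that $s$ can stand in for $\sigma$.

```python
import math
from statistics import NormalDist, mean, stdev

def z_test(sample, mu0):
    """One-sample Z-test with s substituted for sigma (large n assumed).

    Returns the z statistic and a two-sided p-value.
    """
    n = len(sample)
    x_bar = mean(sample)            # sample mean
    s = stdev(sample)               # sample standard deviation (n - 1 divisor)
    standard_error = s / math.sqrt(n)
    z = (x_bar - mu0) / standard_error
    # Two-sided p-value from the standard Normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 40 observations, hypothesised mean of 2.0
z, p = z_test([1, 2, 3, 4, 5] * 8, mu0=2.0)
print(f"z = {z:.3f}, p = {p:.5f}")
```

For a small sample from a Normally distributed population, the same statistic would be compared with a t-distribution instead of the Normal distribution.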

Using a sample mean or median to test a hypothesis about population central location, when $\sigma$ is unknown and sampling is random:
Is the sample size large?
    yes (CLT): Z-test (can also use t-test). Use $s$ instead of $\sigma$.
    no: Is sampling from a Normally distributed population?
        yes: t-test. Use $s$ instead of $\sigma$.
        no: Sign Test.
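
The decision chart above can be written as a small function. This is only a sketch: the name `choose_test` and its two boolean arguments are hypothetical, and deciding what counts as a "large" sample is left to the caller.

```python
def choose_test(large_sample: bool, normal_population: bool) -> str:
    """Pick a test for population central location when sigma is unknown
    and sampling is random, following the decision chart."""
    if large_sample:
        # The CLT applies, so the sample mean is approximately Normal
        return "Z-test (can also use t-test), using s instead of sigma"
    if normal_population:
        # Small sample, but the population itself is Normally distributed
        return "t-test, using s instead of sigma"
    # Small sample from a non-Normal population
    return "Sign Test"

print(choose_test(large_sample=False, normal_population=True))
```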

Section C (Weeks 8 – 11): Two variable analysis

Chi-square test (2 cat. variables)
When working with two variables, our motivating question will generally be "is there an association between the variables?"
The chi-square ($\chi^2$) test is used to detect the presence of an association between two categorical variables.
The $\chi^2$ distribution is asymmetrical and skewed to the right.
Expected values are calculated by multiplying the appropriate row total by the column total and dividing by the total number of observations: $E = \frac{R \times C}{n}$
Hypotheses when performing a chi-square test will be:
    $H_0$: there is no association between the variables
    $H_1$: there is an association
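
As a quick illustration of the expected-value rule, the sketch below computes row total × column total / total observations for every cell of a contingency table. The function name `expected_frequencies` and the 2 × 2 table are made up for the example.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number of observations
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table of observed counts
observed = [[10, 20],
            [30, 40]]
print(expected_frequencies(observed))
```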

Chi-square test
If there were perfect agreement between the observed and expected frequencies, then we would expect the test statistic to equal zero.
The more the observed and expected frequencies disagree, the greater the value of the test statistic.
A $\chi^2$ test is a one-tailed test.
The degrees of freedom for the test are calculated as $(r - 1)(c - 1)$, where $r$ is no. of rows and $c$ is no. of columns.
Necessary conditions for you to check when conducting a $\chi^2$ test are
1. observations must be independent
2. each observation must appear only once
Association does not necessarily imply causation.
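
To see why perfect agreement gives a statistic of zero, here is a minimal sketch of the calculation. It assumes the usual form of the statistic, the sum over all cells of (observed − expected)² / expected, which the slide does not write out; the function name `chi_square` and the tables are hypothetical.

```python
def chi_square(observed):
    """Return the chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Observed counts equal the expected counts, so the statistic is zero
print(chi_square([[10, 20], [20, 40]]))
# Some disagreement between observed and expected, so the statistic is positive
print(chi_square([[10, 20], [30, 40]]))
```

The statistic would then be compared with the upper tail of the $\chi^2$ distribution for the given degrees of freedom, using tables or software.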

ANOVA test (1 cat. and 1 num. variable)
To formally test whether there is association between a categorical and a numerical variable, we use an ANOVA test.
Test statistic is F, the ratio of the between-groups variance divided by the within-groups variance:
$F = \frac{\text{variance between groups}}{\text{variance within groups}}$
'df numerator' is $k - 1$ ($k$ is no. of groups)
'df denominator' is $n - k$ ($n$ is sample size)
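
The F ratio can be computed by hand for a small data set. The sketch below assumes the standard one-way ANOVA definitions (sum of squares between groups divided by $k - 1$, sum of squares within groups divided by $n - k$); the function name `anova_f` and the three groups are made up for illustration.

```python
from statistics import mean

def anova_f(groups):
    """Return (F, df numerator, df denominator) for a one-way ANOVA."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_num = k - 1
    df_den = n - k
    f = (ss_between / df_num) / (ss_within / df_den)
    return f, df_num, df_den

# Hypothetical data: three groups of three observations each
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F means the group means differ by more than the spread within groups would suggest.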

Confidence intervals for difference between two means
Can use confidence intervals to obtain more information about the difference between two specific groups of an ANOVA test.
Using a confidence interval for $\mu_1 - \mu_2$ to test $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$:
    if the confidence interval includes zero, then we have no evidence of a significant difference between the group (population) means; do not reject $H_0$ (the difference could feasibly be zero)
    if the confidence interval range is purely positive or purely negative numbers, we have statistical evidence of a difference between those groups; reject $H_0$ and accept $H_1$
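
The decision rule can be sketched as follows. Note the assumptions: the interval here uses a Normal critical value and an unpooled standard error, which is only a large-sample approximation (statistical software would normally use a t critical value), and the names `diff_means_ci` and `reject_h0` are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def diff_means_ci(a, b, confidence=0.95):
    """Approximate confidence interval for mu_1 - mu_2 (large samples assumed)."""
    diff = mean(a) - mean(b)
    # Standard error of the difference between two independent sample means
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z_crit * se, diff + z_crit * se

def reject_h0(ci):
    """Reject H0 (no difference) only if the interval excludes zero."""
    lower, upper = ci
    return not (lower <= 0 <= upper)

# Hypothetical data: the second group is shifted upwards by 5
group_a = [10, 11, 12, 13, 14] * 8
group_b = [x + 5 for x in group_a]
ci = diff_means_ci(group_a, group_b)
print(ci, reject_h0(ci))
```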