and sampling is random, sample means ($\bar{X}$) will be Normally distributed (Central Limit Theorem)
test statistic is $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ (observed value minus hypothesised value, divided by the standard error)
can substitute $s$ instead of $\sigma$ and replace $\dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ with $\dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
test statistic is now $Z$, hence conducting a $Z$-test (using the Normal distribution)
•
When the sample size is small, can also substitute $s$ instead of $\sigma$ and use the appropriate $t$-distribution
but only if sampling from a Normally distributed population
•
As $n$ increases (and therefore the degrees of freedom increase), the $t$-distribution approaches the Normal distribution
so $t$-tests and $Z$-tests give increasingly similar results
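The large-sample case above can be sketched in code. This is a minimal illustration, not the course's own implementation: the function name `z_test`, the argument `mu0` (the hypothesised population mean) and the choice of a two-sided p-value are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test(sample, mu0):
    """Large-sample Z-test of H0: population mean == mu0, using s in place of sigma.

    Returns the z statistic and a two-sided p-value (two-sided is an assumption).
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1 divisor)
    z = (x_bar - mu0) / (s / sqrt(n))      # (x_bar - mu) / (s / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up data, n = 40
sample = [1, 2, 3, 4, 5] * 8
z, p = z_test(sample, mu0=2)
print(f"z = {z:.3f}, p = {p:.5f}")
```

For a small sample from a Normally distributed population, the same statistic would be compared against a $t$-distribution instead of the Normal.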

Using a sample mean or median to test a hypothesis about population central location, when $\sigma$ is unknown and sampling is random
• Is the sample size large?
  yes (CLT): $Z$-test (can also use $t$-test); use $s$ instead of $\sigma$
  no: is sampling from a Normally distributed population?
    yes: $t$-test; use $s$ instead of $\sigma$
    no: Sign Test
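The decision flow for choosing a test can be written as a small function. This is only a sketch: the name `choose_test` and its two boolean arguments are hypothetical, and deciding whether a sample is "large" or a population is Normal is left to the analyst.

```python
def choose_test(large_sample: bool, normal_population: bool) -> str:
    """Pick a test for population central location when sigma is unknown
    and sampling is random."""
    if large_sample:
        # CLT applies: sample means are approximately Normal
        return "Z-test (can also use t-test), using s instead of sigma"
    if normal_population:
        # Small sample, but the population is Normally distributed
        return "t-test, using s instead of sigma"
    # Small sample and non-Normal population
    return "Sign Test"

print(choose_test(large_sample=False, normal_population=True))
```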

Section C
(Weeks 8 – 11)
Two variable analysis

Chi-square test
(2 cat. variables)
•
When working with two variables, our motivating question will
generally be
“is there an association between the variables?”
•
The chi-square ($\chi^2$) test is used to detect the presence of an association between two categorical variables
•
The $\chi^2$ distribution is asymmetrical and skewed to the right
•
Expected values are calculated by multiplying the appropriate row total by the column total and dividing by the total number of observations: $E = \dfrac{R \times C}{n}$
•
Hypotheses when performing a chi-square test will be:
$H_0$: there is no association between variables
$H_1$: there is an association
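The expected-value rule (row total times column total, divided by the total number of observations) can be illustrated as follows. The function name `expected_counts` and the 2×2 table are made up for the example.

```python
def expected_counts(observed):
    """Expected frequencies under H0 (no association) for a contingency table
    given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number of observations
    # Each cell: row total * column total / n
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[20, 30],
            [30, 20]]  # made-up counts
print(expected_counts(observed))
```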

Chi-square test
•
If there were perfect agreement between the observed and expected frequencies, then we would expect the $\chi^2$ test statistic to equal zero
•
The more the observed and expected frequencies disagree, the greater the value
of the test statistic
•
A $\chi^2$ test is a one-tailed test
•
The degrees of freedom for the $\chi^2$ test are calculated as $(r - 1)(c - 1)$, where $r$ is no. of rows and $c$ is no. of columns
•
Necessary conditions for you to check when conducting a $\chi^2$ test are
1. observations must be independent
2. each observation must appear only once
•
Association does not necessarily imply causation
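A short sketch of the test statistic and degrees of freedom described on this slide. It assumes the standard Pearson form, the sum over all cells of (observed − expected)² / expected, which the slide does not spell out; the function name `chi_square_statistic` and the example tables are made up.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency
    table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (o - e) ** 2 / e                # grows as o and e disagree
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

print(chi_square_statistic([[20, 30], [30, 20]]))   # observed differs from expected
print(chi_square_statistic([[10, 20], [20, 40]]))   # perfect agreement gives zero
```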

ANOVA test
(1 cat. and 1 num. variable)
•
To formally test whether there is association between a categorical and a numerical variable, we use an ANOVA test
$H_0: \mu_1 = \mu_2 = \mu_3$ (all group means are equal)
$H_1$: not all group means are equal
•
Test statistic is $F$, the ratio of the between-groups variance to the within-groups variance:
$F = \dfrac{\text{between-groups variance}}{\text{within-groups variance}}$
•
‘df numerator’ is $k - 1$ ($k$ is no. of groups)
•
‘df denominator’ is $n - k$ ($n$ is sample size)
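The $F$ statistic and its two degrees of freedom can be computed directly from grouped data. This sketch assumes the usual one-way ANOVA mean squares (sum of squares divided by degrees of freedom) as the between-groups and within-groups variances; the function name `one_way_anova` and the data are made up.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_numerator, df_denominator) for a list of groups,
    each a list of numerical observations."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total sample size
    grand_mean = mean([x for g in groups for x in g])
    # Between-groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far observations are from their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_num, df_den = k - 1, n - k
    f = (ss_between / df_num) / (ss_within / df_den)
    return f, df_num, df_den

print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The resulting $F$ would then be compared against an $F$-distribution with those degrees of freedom.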

Confidence intervals for difference between
two means
•
Can use confidence intervals to obtain more information about
the difference between two specific groups of an ANOVA test
•
Using confidence interval for $\mu_1 - \mu_2$ to test $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$
if the confidence interval includes zero, then we have no evidence of a significant difference between the group (population) means: do not reject $H_0$ (difference could feasibly be zero)
if the confidence interval contains only positive or only negative numbers, we have statistical evidence of a difference between those groups: reject $H_0$ and accept $H_1$
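The decision rule above reduces to checking whether zero lies inside the interval. A minimal sketch, assuming the interval for the difference between the two group means has already been computed elsewhere; the function name `interpret_difference_ci` and the example bounds are hypothetical.

```python
def interpret_difference_ci(lower, upper):
    """Interpret a confidence interval for the difference between two group means."""
    if lower <= 0 <= upper:
        # Zero is a feasible value for the difference
        return "do not reject H0"
    # Interval is purely positive or purely negative
    return "reject H0"

print(interpret_difference_ci(-1.2, 3.4))   # interval includes zero
print(interpret_difference_ci(0.5, 2.1))    # interval is purely positive
```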