Instructor: Dr. Andrea Albonico
Inferential statistics
Outline:
Null hypothesis
Sample size and relationship strength
Statistical significance
Null-hypothesis testing
Analysis of variance
Testing correlation coefficient
Errors in null hypothesis testing
Recall that Matthias Mehl and his colleagues, in their
study of sex differences in talkativeness, found that the
women in their sample spoke a mean of 16,215 words per
day and the men a mean of 15,669 words per day (Mehl,
Vazire, Ramirez-Esparza, Slatcher, & Pennebaker, 2007)
But despite this sex difference in their sample, they
concluded that there was no evidence of a sex difference
in talkativeness in the population
How come?
Psychological research typically involves measuring one or
more variables in a sample and computing descriptive
summary data (e.g., means, correlation coefficients) for those
variables
These descriptive data for the sample are called statistics
In general, however, the researcher’s goal is not to draw
conclusions about that sample but to draw conclusions about
the population that the sample was selected from
Thus researchers must use sample statistics to draw
conclusions about the corresponding values in the population
These corresponding values in the population are called
parameters
Unfortunately, sample statistics are not perfect estimates of
their corresponding population parameters
This is because there is a certain amount of random variability in
any statistic from sample to sample
This random variability in a statistic from sample to sample is called sampling error
One implication of this is that when there is a statistical
relationship in a sample, it is not always clear that there is a
statistical relationship in the population
A small difference between two group means in a sample might indicate that there is a small difference between the two group
means in the population
But it could also be that there is no difference between the means
in the population and that the difference in the sample is just a
matter of sampling error
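Sampling error can be made concrete with a small simulation. The sketch below is only an illustration: the population mean of 100, the standard deviation of 15, and the group size of 20 are assumed values, not figures from any study. Both groups are drawn from the same population, so any difference between the two sample means is sampling error alone.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def sample_mean_difference(n_per_group=20, mu=100.0, sigma=15.0):
    """Draw two samples from the SAME population and return M1 - M2."""
    group1 = [random.gauss(mu, sigma) for _ in range(n_per_group)]
    group2 = [random.gauss(mu, sigma) for _ in range(n_per_group)]
    return statistics.mean(group1) - statistics.mean(group2)


# Repeat the "study" many times; the population difference is exactly 0.
differences = [sample_mean_difference() for _ in range(1000)]

print("average difference across studies:", round(statistics.mean(differences), 2))
print("largest single difference:", round(max(abs(d) for d in differences), 2))
```

The average difference stays close to zero, yet individual studies can show sizeable differences even though none exists in the population.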
In fact, any statistical relationship in a sample can be
interpreted in two ways:
There is a relationship in the population, and the relationship in
the sample reflects this
There is no relationship in the population, and the relationship in
the sample reflects only sampling error
The purpose of null hypothesis testing is simply to help
researchers decide between these two interpretations
Null hypothesis testing (often called null hypothesis
significance testing or NHST) is a formal approach to deciding
between two interpretations of a statistical relationship in a
sample
One interpretation is called the null hypothesis (often
symbolized H0): This is the idea that there is no relationship in
the population and that the relationship in the sample reflects
only sampling error
The other interpretation is called the alternative hypothesis
(often symbolized as H1): This is the idea that there is a
relationship in the population and that the relationship in the
sample reflects this relationship in the population
The steps are as follows:
Assume for the moment that the null hypothesis is true. There
is no relationship between the variables in the population
Determine how likely the sample relationship would be if the
null hypothesis were true
If the sample relationship would be extremely unlikely, then
reject the null hypothesis in favour of the alternative
hypothesis. If it would not be extremely unlikely, then retain
the null hypothesis
A crucial step in null hypothesis testing is finding the
probability of the sample result or a more extreme result if
the null hypothesis were true
This probability is called the p-value
A low p-value means that the sample result (or a more extreme one)
would be unlikely if the null hypothesis were true and leads to
the rejection of the null hypothesis
A p-value that is not low means that the sample result (or a more
extreme one) would be likely if the null hypothesis were true
and leads to the retention of the null hypothesis
In null hypothesis testing, the criterion for how low the p-value must be
to reject the null hypothesis is called α (alpha) and
is almost always set to .05
If there is a 5% chance or less of a result at least as extreme as
the sample result if the null hypothesis were true, then the
null hypothesis is rejected
When this happens, the result is said to be statistically
significant
If there is greater than a 5% chance of a result at least as extreme as the
sample result when the null hypothesis is true, then the null
hypothesis is retained
This does not necessarily mean that the researcher accepts the
null hypothesis as true—only that there is not currently enough
evidence to reject it
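The decision rule above can be written as a very small function. This is a minimal sketch: the function name `decide` and the example p-values are hypothetical, and the default α of .05 is the conventional value mentioned in the slides.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p is at or below alpha."""
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "retain H0 (not enough evidence to reject)"


# Hypothetical p-values, chosen only to show both outcomes.
for p in (0.003, 0.05, 0.20):
    print(p, "->", decide(p))
```

Note that "retain" is worded deliberately: the function never reports that the null hypothesis has been accepted as true.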
What is the p-value?
It can be helpful to see that the answer to this question
depends on just two considerations: the strength of the
relationship and the size of the sample
The stronger the sample relationship and the larger the
sample, the less likely the result would be if the null
hypothesis were true
How relationship strength and sample size combine to determine whether a result is statistically significant

                        Relationship strength
Sample size             Weak      Medium    Strong
Small (N=20)            no        no        maybe
Medium (N=50)           no        no        yes
Large (N=100)           maybe     yes       yes
Extra large (N=500)     yes       yes       yes
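The general trend in the table can be illustrated with the t statistic for two equal-sized groups, which equals d × √(n per group / 2), where d is the standardized mean difference. The sketch below is an assumption-laden illustration, not the source of the table: the d values of 0.2, 0.5, and 0.8 are conventional benchmarks for weak, medium, and strong effects, and the split into two equal groups is assumed.

```python
import math


def t_for_two_groups(d, total_n):
    """t for two equal-sized groups: t = d * sqrt(n_per_group / 2)."""
    n_per_group = total_n / 2
    return d * math.sqrt(n_per_group / 2)


# Assumed benchmark effect sizes (weak, medium, strong).
effect_sizes = (0.2, 0.5, 0.8)

for total_n in (20, 50, 100, 500):
    row = [round(t_for_two_groups(d, total_n), 2) for d in effect_sizes]
    print(f"N={total_n}: t for weak/medium/strong =", row)
```

As a rough guide, |t| values above about 2 are significant at α = .05 (two-tailed) for samples of this size, so t grows, and significance becomes more likely, as either the effect or the sample gets larger.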
A statistically significant result is not necessarily a strong one
Even a very weak result can be statistically significant if it is based on a
large enough sample
This is why it is important to distinguish between the statistical
significance of a result and the practical significance of that result
Practical significance refers to the importance or usefulness of the
result in some real-world context
In null hypothesis testing, the researcher tries to draw a
reasonable conclusion about the population based on the
sample

                            RESEARCHER'S DECISION
TRUE STATE OF NATURE        Reject the null hypothesis    Retain the null hypothesis
Null hypothesis is true     Type I error                  Correct decision
Null hypothesis is false    Correct decision              Type II error
Rejecting the null hypothesis when it is true is called a Type I error
This error means that we have concluded that there is a
relationship in the population when in fact there is not
Type I errors occur because even when there is no
relationship in the population, sampling error alone will
occasionally produce an extreme result
The probability of this error is equal to the alpha level that
the researcher selects (typically .05)
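The link between α and the Type I error rate can be checked by simulation. The following is a sketch under stated assumptions: samples of N = 10 from a normal population, a one-sample t-test, and the two-tailed critical value 2.262 taken from a t table for df = 9 and α = .05. The null hypothesis is true in every simulated study, so every rejection is a Type I error.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

T_CRITICAL = 2.262  # two-tailed critical t for alpha = .05, df = 9 (t table)


def one_sample_t(scores, mu0):
    """t = (M - mu0) / (SD / sqrt(N)), using the sample SD."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return (m - mu0) / (sd / math.sqrt(len(scores)))


def null_is_rejected(n=10, mu0=0.0):
    # The null hypothesis is TRUE here: the population mean really is mu0.
    scores = [random.gauss(mu0, 1.0) for _ in range(n)]
    return abs(one_sample_t(scores, mu0)) > T_CRITICAL


n_studies = 4000
type_i_rate = sum(null_is_rejected() for _ in range(n_studies)) / n_studies
print("proportion of true nulls rejected:", type_i_rate)
```

The proportion should land near .05, which is what it means for the Type I error rate to equal α.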
Retaining the null hypothesis when it is false is called a Type II
error
This error means that we have concluded that there is no relationship in the population when in fact there is a
relationship
In practice, Type II errors occur primarily because the
research design lacks adequate statistical power to detect
the relationship
The term beta (β) refers to the probability of making a Type
II error
Researchers want to avoid both errors, but there is always some
chance of making the wrong decision
Decreasing the Type I error rate, without doing anything else,
will automatically increase the Type II error rate
Setting α to .01, for example, would mean that if the null
hypothesis is true, then there is only a 1% chance of mistakenly
rejecting it. But making it harder to reject true null hypotheses
also makes it harder to reject false ones and therefore increases
the chance of a Type II error
Decreasing the Type II error rate, without doing anything else,
will increase the Type I error rate
It is possible to reduce the chance of a Type II error by setting α
to something greater than .05 (e.g., .10). But making it easier to
reject false null hypotheses also makes it easier to reject true
ones and therefore increases the chance of a Type I error
The statistical power of a research design is the probability of
correctly rejecting the null hypothesis given the sample size
and expected relationship strength
Statistical power is the complement of the probability of
committing a Type II error (power = 1 − β)
What should you do if you discover that your research design
does not have adequate power?
Increase the sample size (power analysis)
Increase the strength of the relationship
Improve your research design (stronger manipulation, more
control over extraneous variables, reduce amount of noise, etc.)
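The effect of sample size on power can be estimated by simulation. This is a rough sketch under assumptions that are not from the slides: a true effect of 0.5 standard deviations, a one-sample t-test against μ0 = 0, and two-tailed critical values from a t table (2.262 for df = 9, 2.045 for df = 29). Here the null hypothesis is false in every simulated study, so the rejection rate estimates power.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Two-tailed critical t values for alpha = .05 (t table), keyed by N (df = N - 1).
T_CRITICAL = {10: 2.262, 30: 2.045}


def estimated_power(n, true_effect=0.5, n_studies=2000):
    """Proportion of simulated studies that reject a FALSE null (mu0 = 0)."""
    rejections = 0
    for _ in range(n_studies):
        scores = [random.gauss(true_effect, 1.0) for _ in range(n)]
        t = statistics.mean(scores) / (statistics.stdev(scores) / math.sqrt(n))
        if abs(t) > T_CRITICAL[n]:
            rejections += 1
    return rejections / n_studies


power_small = estimated_power(10)
power_large = estimated_power(30)
print("estimated power with N = 10:", power_small)
print("estimated power with N = 30:", power_large)
```

With the same true effect, the larger sample rejects the false null far more often, which is why increasing N is the most direct remedy for low power.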
Another issue related to Type I errors is the so-called file
drawer problem (Rosenthal, 1979)
The idea is that when researchers obtain statistically
significant results, they tend to submit them for publication,
and journal editors and reviewers tend to accept them
But when researchers obtain non-significant results, they tend
not to submit them for publication, or if they do submit them,
journal editors and reviewers tend not to accept them
Researchers end up putting these non-significant results away in
a file drawer
One effect of this tendency is that the published literature
probably contains a higher proportion of Type I errors than
we might expect on the basis of statistical considerations alone
P-hacking: researchers who p-hack make various decisions in
the research process to increase their chance of a statistically
significant result (and of a Type I error), for example by arbitrarily
removing outliers, selectively choosing which dependent variables to report,
or presenting only significant results, until their results yield
a desirable p-value
Replicability crisis: the inability of researchers to replicate
earlier research findings
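One way to see why p-hacking inflates Type I errors: if a researcher runs several tests and reports whichever one is significant, the chance of at least one false positive grows quickly. The sketch below assumes that every null hypothesis is true and that the tests are independent, which is a simplification; the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """P(at least one significant result) when every null is true
    and the tests are independent: 1 - (1 - alpha) ** n_tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(k, "tests ->", round(chance_of_false_positive(k), 3))
```

With a single test the chance is the nominal 5%, but under these assumptions it passes 20% with five tests and 60% with twenty.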
Reproducibility Project
Although many believe that the failure to replicate research
results is an expected characteristic of cumulative scientific
progress, others have interpreted this situation as evidence of
systematic problems with conventional scholarship in
psychology
Selective definition of outliers
Selective reporting of results
HARKing: hypothesizing after the results are known
P-hacking
Fabrication of data
Solutions?
Designing and conducting studies that have sufficient
statistical power, in order to increase the reliability of
findings
Publishing both null and significant findings (thereby
counteracting the publication bias and reducing the file
drawer problem)
Describing one’s research designs in sufficient detail to
enable other researchers to replicate your study using an
identical or at least very similar procedure
Conducting high-quality replications and publishing these
results
Open-science practices
Open-data practices
Null hypothesis testing is the most common approach to
inferential statistics in psychology
Some criticisms of null hypothesis testing focus on
researchers’ misunderstanding of it
Another set of criticisms focuses on the logic of null hypothesis
testing
This criticism does not have to do with the specific value of
.05 but with the idea that there should be any rigid dividing
line between results that are considered significant and
results that are not
For example, should p = .047 and p = .057 lead to opposite conclusions?
Yet another set of criticisms focuses on the idea that null
hypothesis testing, even when understood and carried out
correctly, is simply not very informative
Recall that the null hypothesis is that there is no
relationship between variables in the population
So to reject the null hypothesis is simply to say that there is
some nonzero relationship in the population
What to do?
Each null hypothesis test should be accompanied by an effect
size, a measure of the strength of the relationship
Use confidence intervals rather than null hypothesis tests
95% confidence intervals also provide null hypothesis testing information
Bayesian statistics: an approach in which the researcher specifies the
probability that the null hypothesis and any
important alternative hypotheses are true before conducting
the study, conducts the study, and then updates the
probabilities based on the data
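The first two recommendations can be sketched numerically. The example borrows the summary values from the cookie-estimate example in these slides (M = 212, SD = 39.17, N = 10, μ0 = 250). The choice of Cohen's d as the effect size and the critical value 2.262 (t table, df = 9, two-tailed α = .05) are assumptions made for illustration.

```python
import math


def cohens_d(m, mu0, sd):
    """Standardized difference between the sample mean and mu0."""
    return (m - mu0) / sd


def confidence_interval(m, sd, n, t_critical):
    """Interval M +/- t_critical * SD / sqrt(N)."""
    margin = t_critical * sd / math.sqrt(n)
    return m - margin, m + margin


M, SD, N, MU0 = 212, 39.17, 10, 250
T_CRITICAL = 2.262  # two-tailed, alpha = .05, df = 9 (t table)

d = cohens_d(M, MU0, SD)
low, high = confidence_interval(M, SD, N, T_CRITICAL)

print("Cohen's d:", round(d, 2))
print("95% CI:", (round(low, 2), round(high, 2)))
print("mu0 inside the interval:", low <= MU0 <= high)
```

Because μ0 falls outside the 95% interval, the interval carries the same information as a two-tailed test at α = .05 while also showing the plausible range of the population mean.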
Many studies in psychology focus on the difference between
two means
The most common null hypothesis test for this type of
statistical relationship is the t-test
One-sample t-test
Dependent-samples t-test
Independent-samples t-test
The one-sample t-test is used to compare a sample mean (M)
with a hypothetical population mean (μ0) that provides some
interesting standard of comparison
The null hypothesis is that the mean for the population (μ) is
equal to the hypothetical population mean: μ = μ0
The alternative hypothesis is that the mean for the population
is different from the hypothetical population mean: μ ≠ μ0
To decide between these two hypotheses, we need to find
the probability of obtaining the sample mean (or one more
extreme) if the null hypothesis were true. But finding this p
value requires first computing a test statistic called t
Test statistic (t): t = (M − μ0) / (SD / √N)
M is the sample mean
μ0 is the hypothetical population mean
SD is the sample standard deviation
N is the sample size
The test statistic t has a known distribution that depends on the
degrees of freedom (N − 1 for the one-sample t-test)
Critical values of t mark the cutoffs beyond which the null hypothesis is rejected
Two-tailed test: where we reject the null hypothesis if the t
score for the sample is extreme in either direction
This test makes sense when we believe that the sample mean
might differ from the hypothetical population mean but we
do not have good reason to expect the difference to go in a
particular direction
One-tailed test: where we reject the null hypothesis only if
the t score for the sample is extreme in one direction that
we specify before collecting the data
This test makes sense when we have good reason to expect
the sample mean will differ from the hypothetical population
mean in a particular direction
Accuracy of university students’ estimates of the number of
calories in a chocolate chip cookie:
Sample (N=10): 250, 280, 200, 150, 175, 200, 200, 220, 180, 250
Sample M = 212; Sample SD = 39.17
μ0 = 250
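A minimal sketch of the computation for this example, using the summary values given above and the one-sample formula t = (M − μ0) / (SD / √N). The critical value 2.262 is an assumption taken from a t table (two-tailed, α = .05, df = 9). One caveat: the ten listed scores themselves average 210.5 with an SD of about 39.75, so the sketch follows the stated summary values rather than recomputing them.

```python
import math


def one_sample_t(m, mu0, sd, n):
    """t = (M - mu0) / (SD / sqrt(N))."""
    return (m - mu0) / (sd / math.sqrt(n))


t = one_sample_t(m=212, mu0=250, sd=39.17, n=10)
df = 10 - 1

T_CRITICAL = 2.262  # two-tailed, alpha = .05, df = 9 (t table)
reject_null = abs(t) > T_CRITICAL

print("t =", round(t, 2), "with df =", df)
print("reject the null hypothesis:", reject_null)
```

The negative sign indicates that the students' estimates fall below the comparison value of 250, and |t| exceeds the critical value, so the null hypothesis is rejected in a two-tailed test.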
The dependent-samples t-test (sometimes called the
paired-samples t-test) is used to compare two means for
the same sample tested at two different times or under two
different conditions
The null hypothesis is that the means at the two times or
under the two conditions are the same in the population
The alternative hypothesis is that they are not the same. This
test can also be one-tailed if the researcher has good reason
to expect the difference to go in a particular direction
The first step in the dependent-samples t-test is to reduce
the two scores for each participant to a single difference
score by taking the difference between them
At this point, the dependent-samples t-test becomes a one-sample t-test on the difference scores
The hypothetical population mean (μ0) of interest is 0
because this is what the mean difference score would be if
there were no difference on average between the two times
or two conditions
We can now think of the null hypothesis as being that the
mean difference score in the population is 0 (μ = 0) and the
alternative hypothesis as being that the mean difference score
in the population is not 0 (μ ≠ 0)
A pretest-posttest study in which 10 participants estimate
the number of calories in a chocolate chip cookie before the
training program and then again afterward
N=10
Pretest: 230, 250, 280, 175, 150, 200, 180, 210, 220, 190
Posttest: 250, 260, 250, 200, 160, 200, 200, 180, 230, 240
Difference: 20, 10, -30, 25, 10, 0, 20, -30, 10, 50
Mean difference: 8.50; SD difference: 24.27
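The dependent-samples test for this example can be sketched as a one-sample t-test on the difference scores with μ0 = 0. The mean and standard deviation are computed from the listed difference scores rather than taken from the summary line. The critical value 2.262 is an assumption from a t table (two-tailed, α = .05, df = 9).

```python
import math
import statistics

# Difference scores (posttest minus pretest) as listed in the example.
differences = [20, 10, -30, 25, 10, 0, 20, -30, 10, 50]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD (n - 1 in the denominator)

# One-sample t-test on the difference scores against mu0 = 0.
t = (mean_diff - 0) / (sd_diff / math.sqrt(n))
df = n - 1

T_CRITICAL = 2.262  # two-tailed, alpha = .05, df = 9 (t table)
print("mean difference:", mean_diff, "SD:", round(sd_diff, 2))
print("t =", round(t, 2), "with df =", df)
print("reject the null hypothesis:", abs(t) > T_CRITICAL)
```

Here |t| is well below the critical value, so the null hypothesis of no average change is retained.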
The independent-samples t-test is used to compare the
means of two separate samples (M1 and M2)
The null hypothesis is that the means of the two populations are the same: μ1 = μ2
The alternative hypothesis is that they are not the same: μ1 ≠ μ2
Degrees of freedom are equal to N − 2, where N is the total number of participants in both groups
A health psychologist wants to compare the calorie
estimates of people who regularly eat junk food with the
estimates of people who rarely eat junk food
Junk food eaters: 180, 220, 150, 85, 200, 170, 150, 190
Non-junk food eaters: 200, 240, 190, 175, 200, 300, 240
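A sketch of the independent-samples t-test for these two groups. It assumes the pooled-variance form of the test, which matches df = N − 2, and a two-tailed critical value of 2.160 from a t table for df = 13 and α = .05. Other variants of the test (for unequal variances) would give slightly different numbers.

```python
import math
import statistics

junk = [180, 220, 150, 85, 200, 170, 150, 190]
non_junk = [200, 240, 190, 175, 200, 300, 240]


def pooled_t(a, b):
    """Independent-samples t with pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df


t, df = pooled_t(junk, non_junk)

T_CRITICAL = 2.160  # two-tailed, alpha = .05, df = 13 (t table)
print("M1 =", round(statistics.mean(junk), 2), "M2 =", round(statistics.mean(non_junk), 2))
print("t =", round(t, 2), "with df =", df)
print("reject the null hypothesis:", abs(t) > T_CRITICAL)
```

The junk food eaters give lower estimates on average, and |t| exceeds the critical value, so the null hypothesis of equal population means is rejected.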
T-tests are used to compare two means (a sample mean
with a population mean, the means of two conditions or
two groups)
When there are more than two groups or condition means
to be compared, the most common null hypothesis test is
the analysis of variance (ANOVA)
One-way ANOVA
Factorial ANOVA
Repeated-measures ANOVA
The one-way ANOVA is used to compare the means of more
than two samples (M1, M2…MG) in a between-subjects
design
The null hypothesis is that all the means are equal in the population: μ1 = μ2 =…= μG
The alternative hypothesis is that not all the means in the
population are equal
The test statistic for the ANOVA is called F
It is a ratio of two estimates of the population variance
based on the sample data
One estimate of the population variance is called the mean
squares between groups (MSB) and is based on the
differences among the sample means
The other is called the mean squares within groups
(MSW) and is based on the differences among the scores
within each group
The F statistic is the ratio of the MSB to the MSW
Systematic between-groups variance:
Experimental variance (due to independent variables)
Extraneous variance (due to confounding variables)
Non-systematic within-groups variance (error variance)
MSB reflects systematic between-groups variance plus error variance; MSW reflects error variance only
Again, the reason that F is useful is that we know how it is
distributed when the null hypothesis is true
The distribution depends on the number of groups (G) and the total sample size (N)
Between-groups df = G − 1
Within-groups df = N − G
A health psychologist wants to compare the calorie
estimates of psychology majors, nutrition majors, and
professional dieticians
Psych majors: 200, 180, 220, 160, 150, 200, 190, 200 (M=187.5, SD=23.14)
Nutrition majors: 190, 220, 200, 230, 160, 150, 200, 210, 195 (M=195, SD=27.77)
Dieticians: 220, 250, 240, 275, 250, 230, 200, 240 (M=238.13, SD=22.35)
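A sketch of the one-way ANOVA for this example, computing F = MSB / MSW directly from the listed scores. As listed, the groups contain 8, 9, and 8 scores, so N = 25 and the within-groups df is 22. The critical value 3.44 is an assumption taken from an F table for df = (2, 22) and α = .05.

```python
import statistics

groups = [
    [200, 180, 220, 160, 150, 200, 190, 200],       # psychology majors
    [190, 220, 200, 230, 160, 150, 200, 210, 195],  # nutrition majors
    [220, 250, 240, 275, 250, 230, 200, 240],       # dieticians
]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a between-subjects design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    df_between = len(groups) - 1                # G - 1
    df_within = len(all_scores) - len(groups)   # N - G
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    msb = ss_between / df_between
    msw = ss_within / df_within
    return msb / msw, df_between, df_within


f, df_between, df_within = one_way_anova(groups)

F_CRITICAL = 3.44  # alpha = .05, df = (2, 22), from an F table
print("F =", round(f, 2), "with df =", (df_between, df_within))
print("reject the null hypothesis:", f > F_CRITICAL)
```

F is well above the critical value, so the null hypothesis that all three population means are equal is rejected. The test does not say which groups differ from which.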
When we reject the null hypothesis in a one-way ANOVA,
we conc...
