Inferential statistics
Instructor: Dr. Andrea Albonico

Outline:
— Null hypothesis
— Sample size and relationship strength
— Statistical significance
— Null-hypothesis testing
— Analysis of variance
— Testing the correlation coefficient
— Errors in null hypothesis testing

— Recall that Matthias Mehl and his colleagues, in their study of sex differences in talkativeness, found that the women in their sample spoke a mean of 16,215 words per day and the men a mean of 15,669 words per day (Mehl, Vazire, Ramirez-Esparza, Slatcher, & Pennebaker, 2007)
— But despite this sex difference in their sample, they concluded that there was no evidence of a sex difference in talkativeness in the population
— How come?

— Psychological research typically involves measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables
— These descriptive data for the sample are called statistics
— In general, however, the researcher's goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from
— Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population
— These corresponding values in the population are called parameters

— Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters
— This is because there is a certain amount of random variability in any statistic from sample to sample
— This random variability in a statistic from sample to sample is called sampling error
— One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population
— A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population
— But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error

— In fact, any statistical relationship in a sample can be interpreted in two ways:
  — There is a relationship in the population, and the relationship in the sample reflects this
  — There is no relationship in the population, and the relationship in the sample reflects only sampling error
— The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations

— Null hypothesis testing (often called null hypothesis significance testing or NHST) is a formal approach to deciding between two interpretations of a statistical relationship in a sample
— One interpretation is called the null hypothesis (often symbolized H0): the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error
— The other interpretation is called the alternative hypothesis (often symbolized H1): the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population

— The steps are as follows:
  — Assume for the moment that the null hypothesis is true: there is no relationship between the variables in the population
  — Determine how likely the sample relationship would be if the null hypothesis were true
  — If the sample relationship would be extremely unlikely, reject the null hypothesis in favour of the alternative hypothesis; if it would not be extremely unlikely, retain the null hypothesis
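To make sampling error concrete, here is a minimal simulation sketch in Python (numpy assumed available; the population values are made up for illustration and are not the Mehl et al. data): even when the null hypothesis is true by construction, two samples rarely have identical means.

    import numpy as np

    # Two samples drawn from the *same* population (so the null hypothesis
    # is true by construction) still show a nonzero difference in means;
    # that gap is sampling error.
    rng = np.random.default_rng(0)
    population_mean, population_sd = 16000, 7000   # illustrative values only

    women = rng.normal(population_mean, population_sd, size=200)
    men = rng.normal(population_mean, population_sd, size=200)

    print(round(women.mean()), round(men.mean()), round(women.mean() - men.mean()))
    # A sample difference by itself therefore cannot establish a
    # difference in the population.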
— A crucial step in null hypothesis testing is finding the probability of the sample result, or a more extreme result, if the null hypothesis were true: the p value
— A low p value means that the sample (or more extreme) result would be unlikely if the null hypothesis were true, and it leads to rejection of the null hypothesis
— A p value that is not low means that the sample (or more extreme) result would be likely if the null hypothesis were true, and it leads to retention of the null hypothesis

— In null hypothesis testing, the criterion is called α (alpha) and is almost always set to .05
— If there is a 5% chance or less of a result at least as extreme as the sample result when the null hypothesis is true, then the null hypothesis is rejected
— When this happens, the result is said to be statistically significant
— If there is a greater than 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained
— This does not necessarily mean that the researcher accepts the null hypothesis as true, only that there is not currently enough evidence to reject it

— How low will the p value be? It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample
— The stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true

How relationship strength and sample size combine to determine whether a result is statistically significant:

                          Relationship strength
  Sample size             Weak     Medium   Strong
  Small (N = 20)          No       No       Maybe
  Medium (N = 50)         No       No       Yes
  Large (N = 100)         Maybe    Yes      Yes
  Extra large (N = 500)   Yes      Yes      Yes

— A statistically significant result is not necessarily a strong one
— Even a very weak result can be statistically significant if it is based on a large enough sample
— This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result
— Practical significance refers to the importance or usefulness of the result in some real-world context

— In null hypothesis testing, the researcher tries to draw a reasonable conclusion about the population based on the sample:

                              RESEARCHER'S DECISION
  TRUE STATE OF NATURE        Reject the null hypothesis   Retain the null hypothesis
  Null hypothesis is true     Type I error                 Correct decision
  Null hypothesis is false    Correct decision             Type II error
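As a sketch of the decision rule and of what α = .05 buys, the snippet below (a minimal simulation, assuming numpy and scipy are installed) repeatedly tests a true null hypothesis and shows that it is rejected in roughly 5% of samples, which is the Type I error rate shown in the table above.

    import numpy as np
    from scipy import stats

    # Decision rule: reject H0 when p <= alpha. When H0 is true, this
    # happens in about alpha (5%) of samples, i.e., the Type I error rate.
    rng = np.random.default_rng(1)
    alpha = 0.05
    n_simulations = 10_000
    rejections = 0

    for _ in range(n_simulations):
        # Both groups come from the same population, so H0 is true.
        group1 = rng.normal(100, 15, size=25)
        group2 = rng.normal(100, 15, size=25)
        if stats.ttest_ind(group1, group2).pvalue <= alpha:
            rejections += 1

    print(rejections / n_simulations)   # close to 0.05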
— Rejecting the null hypothesis when it is true is called a Type I error
— This error means that we have concluded that there is a relationship in the population when in fact there is not
— Type I errors occur because even when there is no relationship in the population, sampling error alone will occasionally produce an extreme result
— The probability of this error is equal to the alpha level that the researcher selects (typically .05)

— Retaining the null hypothesis when it is false is called a Type II error
— This error means that we have concluded that there is no relationship in the population when in fact there is a relationship
— In practice, Type II errors occur primarily because the research design lacks adequate statistical power to detect the relationship
— The term beta (β) refers to the probability of making a Type II error

— Researchers want to avoid both errors, but there is always the chance of error in the decision
— Decreasing the Type I error rate, without doing anything else, automatically increases the Type II error rate
  — Setting α to .01, for example, would mean that if the null hypothesis is true, then there is only a 1% chance of mistakenly rejecting it. But making it harder to reject true null hypotheses also makes it harder to reject false ones, and therefore increases the chance of a Type II error
— Decreasing the Type II error rate, without doing anything else, increases the Type I error rate
  — It is possible to reduce the chance of a Type II error by setting α to something greater than .05 (e.g., .10). But making it easier to reject false null hypotheses also makes it easier to reject true ones, and therefore increases the chance of a Type I error

— The statistical power of a research design is the probability of correctly rejecting the null hypothesis given the sample size and the expected relationship strength
— Statistical power is the complement of the probability of committing a Type II error (power = 1 − β)

— What should you do if you discover that your research design does not have adequate power? (a power-estimation sketch follows this list)
  — Increase the sample size (power analysis)
  — Increase the strength of the relationship
  — Improve your research design (stronger manipulation, more control over extraneous variables, reduced noise, etc.)
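A rough way to estimate power is by simulation: the proportion of samples in which a real population difference of a given size is detected at α = .05. The sketch below assumes numpy and scipy; the effect size and sample sizes are arbitrary illustrations, not values from the slides.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    def estimated_power(effect_size, n_per_group, n_simulations=5_000, alpha=0.05):
        # Proportion of simulated studies that reject H0 when H0 is in fact false.
        hits = 0
        for _ in range(n_simulations):
            control = rng.normal(0.0, 1.0, size=n_per_group)
            treatment = rng.normal(effect_size, 1.0, size=n_per_group)  # real difference
            if stats.ttest_ind(treatment, control).pvalue <= alpha:
                hits += 1
        return hits / n_simulations

    # Larger samples and stronger relationships both raise power (1 - beta).
    print(estimated_power(effect_size=0.5, n_per_group=20))
    print(estimated_power(effect_size=0.5, n_per_group=100))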
— Another issue related to Type I errors is the so-called file drawer problem (Rosenthal, 1979)
— The idea is that when researchers obtain statistically significant results, they tend to submit them for publication, and journal editors and reviewers tend to accept them
— But when researchers obtain non-significant results, they tend not to submit them for publication, or if they do submit them, journal editors and reviewers tend not to accept them
— Researchers end up putting these non-significant results away in a file drawer
— One effect of this tendency is that the published literature probably contains a higher proportion of Type I errors than we might expect on the basis of statistical considerations alone

— P-hacking: researchers who p-hack make various decisions in the research process to increase their chance of a statistically significant result (and of a Type I error): arbitrarily removing outliers, selectively choosing which dependent variables to report, presenting only significant results, etc., until their results yield a desirable p value
— Replicability crisis: the inability of researchers to replicate earlier research findings

— Reproducibility Project

— Although many believe that the failure to replicate research results is an expected characteristic of cumulative scientific progress, others have interpreted this situation as evidence of systematic problems with conventional scholarship in psychology:
  — Selective definition of outliers
  — Selective reporting of results
  — HARKing: hypothesizing after the results are known
  — P-hacking
  — Fabrication of data

— Solutions?
  — Designing and conducting studies that have sufficient statistical power, in order to increase the reliability of findings
  — Publishing both null and significant findings (thereby counteracting publication bias and reducing the file drawer problem)
  — Describing one's research designs in sufficient detail to enable other researchers to replicate the study using an identical or at least very similar procedure
  — Conducting high-quality replications and publishing these results
  — Open-science practices

— Open-data practices

— Null hypothesis testing is the most common approach to inferential statistics in psychology
— Some criticisms of null hypothesis testing focus on researchers' misunderstanding of it
— Another set of criticisms focuses on the logic of null hypothesis testing
  — This criticism does not have to do with the specific value of .05 but with the idea that there should be any rigid dividing line between results that are considered significant and results that are not
  — .047 vs. .057?

— Yet another set of criticisms focuses on the idea that null hypothesis testing, even when understood and carried out correctly, is simply not very informative
— Recall that the null hypothesis is that there is no relationship between variables in the population
— So to reject the null hypothesis is simply to say that there is some nonzero relationship in the population

— What to do? (see the sketch after this list)
  — Each null hypothesis test should be accompanied by an effect size: a measure of the strength of the relationship
  — Use confidence intervals rather than null hypothesis tests; 95% confidence intervals also provide null-hypothesis-testing information
  — Bayesian statistics: an approach in which the researcher specifies the probability that the null hypothesis and any important alternative hypotheses are true before conducting the study, conducts the study, and then updates the probabilities based on the data
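A minimal sketch of the first two suggestions above, assuming Python with numpy and scipy: it reports an effect size (Cohen's d) and a 95% confidence interval alongside the p value. The scores and the comparison value mu0 are made up for illustration.

    import numpy as np
    from scipy import stats

    scores = np.array([4, 7, 6, 8, 5, 9, 6, 7, 5, 8], dtype=float)   # illustrative data
    mu0 = 5.0                                                        # comparison value

    result = stats.ttest_1samp(scores, popmean=mu0)          # p value alone
    d = (scores.mean() - mu0) / scores.std(ddof=1)           # effect size (Cohen's d)

    # 95% confidence interval for the population mean, from the t distribution.
    sem = scores.std(ddof=1) / np.sqrt(len(scores))
    t_crit = stats.t.ppf(0.975, df=len(scores) - 1)
    ci = (scores.mean() - t_crit * sem, scores.mean() + t_crit * sem)

    print(result.pvalue, d, ci)
    # If the 95% CI excludes mu0, the two-tailed test at alpha = .05 rejects
    # the null hypothesis, so the interval carries the significance information
    # while also showing the plausible range of the population mean.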
— Many studies in psychology focus on the difference between two means
— The most common null hypothesis test for this type of statistical relationship is the t-test
  — One-sample t-test
  — Dependent-samples t-test
  — Independent-samples t-test

— The one-sample t-test is used to compare a sample mean (M) with a hypothetical population mean (μ0) that provides some interesting standard of comparison
— The null hypothesis is that the mean for the population (μ) is equal to the hypothetical population mean: μ = μ0
— The alternative hypothesis is that the mean for the population is different from the hypothetical population mean: μ ≠ μ0
— To decide between these two hypotheses, we need to find the probability of obtaining the sample mean (or one more extreme) if the null hypothesis were true
— But finding this p value requires first computing a test statistic called t

— Test statistic (t):
    t = (M - μ0) / (SD / √N)
  — M is the sample mean
  — μ0 is the hypothetical population mean
  — SD is the sample standard deviation
  — N is the sample size

— The test statistic t has a known distribution that depends on the degrees of freedom (for the one-sample test, df = N - 1)
— Critical values: because the distribution of t is known, we can identify the values of t beyond which the null hypothesis is rejected for a given α and degrees of freedom

— Two-tailed test: we reject the null hypothesis if the t score for the sample is extreme in either direction
  — This test makes sense when we believe that the sample mean might differ from the hypothetical population mean but we do not have good reason to expect the difference to go in a particular direction
— One-tailed test: we reject the null hypothesis only if the t score for the sample is extreme in the one direction that we specify before collecting the data
  — This test makes sense when we have good reason to expect the sample mean to differ from the hypothetical population mean in a particular direction

— Example: accuracy of university students' estimates of the number of calories in a chocolate chip cookie
  — Sample (N = 10): 250, 280, 200, 150, 175, 200, 200, 220, 180, 250
  — Sample M = 212; sample SD = 39.17
  — μ0 = 250

— The dependent-samples t-test (sometimes called the paired-samples t-test) is used to compare two means for the same sample tested at two different times or under two different conditions
— The null hypothesis is that the means at the two times or under the two conditions are the same in the population
— The alternative hypothesis is that they are not the same; this test can also be one-tailed if the researcher has good reason to expect the difference to go in a particular direction

— The first step in the dependent-samples t-test is to reduce the two scores for each participant to a single difference score by taking the difference between them
— At this point, the dependent-samples t-test becomes a one-sample t-test on the difference scores
— The hypothetical population mean of interest (μ0) is 0, because this is what the mean difference score would be if there were no difference on average between the two times or two conditions
— We can now think of the null hypothesis as being that the mean difference score in the population is 0 (μ = 0) and the alternative hypothesis as being that the mean difference score in the population is not 0 (μ ≠ 0)

— Example: a pretest-posttest study in which 10 participants estimate the number of calories in a chocolate chip cookie before a training program and then again afterward
  — N = 10
  — Pretest: 230, 250, 280, 175, 150, 200, 180, 210, 220, 190
  — Posttest: 250, 260, 250, 200, 160, 200, 200, 180, 230, 250
  — Difference: 20, 10, -30, 25, 10, 0, 20, -30, 10, 50
  — Mean difference = 8.50; SD of the differences = 27.27

— The independent-samples t-test is used to compare the means of two separate samples (M1 and M2)
— The null hypothesis is that the means of the two populations are the same: μ1 = μ2
— The alternative hypothesis is that they are not the same: μ1 ≠ μ2
— Degrees of freedom are equal to N - 2

— Example: a health psychologist wants to compare the calorie estimates of people who regularly eat junk food with the estimates of people who rarely eat junk food
  — Junk food eaters: 180, 220, 150, 85, 200, 170, 150, 190
  — Non-junk food eaters: 200, 240, 190, 175, 200, 300, 240
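A worked sketch of the one-sample cookie-calorie example above, assuming Python with numpy and scipy: it applies t = (M - μ0) / (SD / √N) directly and checks the result against scipy's built-in test. Values computed from the listed scores may differ slightly from the rounded figures on the slide.

    import numpy as np
    from scipy import stats

    estimates = np.array([250, 280, 200, 150, 175, 200, 200, 220, 180, 250], dtype=float)
    mu0 = 250.0                                    # hypothetical population mean

    # t = (M - mu0) / (SD / sqrt(N)), with df = N - 1
    M, SD, N = estimates.mean(), estimates.std(ddof=1), len(estimates)
    t_manual = (M - mu0) / (SD / np.sqrt(N))

    result = stats.ttest_1samp(estimates, popmean=mu0)    # two-tailed by default

    print(t_manual, result.statistic, result.pvalue)
    # A p value at or below .05 leads to rejecting the null hypothesis that
    # students' mean estimate equals 250 calories.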
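Similarly, a sketch of the dependent-samples (pretest-posttest) and independent-samples (junk-food) examples above, with the same assumed libraries; the paired test is shown both directly and as a one-sample test on the difference scores.

    import numpy as np
    from scipy import stats

    # Dependent-samples t-test: the same participants before and after training.
    pretest = np.array([230, 250, 280, 175, 150, 200, 180, 210, 220, 190], dtype=float)
    posttest = np.array([250, 260, 250, 200, 160, 200, 200, 180, 230, 250], dtype=float)
    paired = stats.ttest_rel(posttest, pretest)
    # Equivalent one-sample t-test on the difference scores against mu0 = 0.
    differences = stats.ttest_1samp(posttest - pretest, popmean=0.0)

    # Independent-samples t-test: two separate groups, df = N - 2.
    junk = np.array([180, 220, 150, 85, 200, 170, 150, 190], dtype=float)
    no_junk = np.array([200, 240, 190, 175, 200, 300, 240], dtype=float)
    independent = stats.ttest_ind(junk, no_junk)

    print(paired.pvalue, differences.pvalue, independent.pvalue)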
— t-tests are used to compare two means (a sample mean with a population mean, or the means of two conditions or two groups)
— When there are more than two group or condition means to be compared, the most common null hypothesis test is the analysis of variance (ANOVA)
  — One-way ANOVA
  — Factorial ANOVA
  — Repeated-measures ANOVA

— The one-way ANOVA is used to compare the means of more than two samples (M1, M2, …, MG) in a between-subjects design
— The null hypothesis is that all the means are equal in the population: μ1 = μ2 = … = μG
— The alternative hypothesis is that not all the means in the population are equal

— The test statistic for the ANOVA is called F
— It is a ratio of two estimates of the population variance based on the sample data
— One estimate of the population variance is called the mean squares between groups (MSB) and is based on the differences among the sample means
— The other is called the mean squares within groups (MSW) and is based on the differences among the scores within each group
— The F statistic is the ratio of the MSB to the MSW: F = MSB / MSW

— Systematic between-groups variance:
  — Experimental variance (due to independent variables)
  — Extraneous variance (due to confounding variables)
— Non-systematic within-groups variance (error variance)

— Again, the reason that F is useful is that we know how it is distributed when the null hypothesis is true
— This distribution depends on the number of groups (G) and the sample size (N):
  — Between-groups df = G - 1
  — Within-groups df = N - G

— Example: a health psychologist wants to compare the calorie estimates of psychology majors, nutrition majors, and professional dieticians
  — Psychology majors: 200, 180, 220, 160, 150, 200, 190, 200 (M = 187.50, SD = 23.14)
  — Nutrition majors: 190, 220, 200, 230, 160, 150, 200, 210, 195 (M = 195.00, SD = 27.77)
  — Dieticians: 220, 250, 240, 275, 250, 230, 200, 240 (M = 238.13, SD = 22.35)

— When we reject the null hypothesis in a one-way ANOVA, we conclude...
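A sketch of the one-way ANOVA for the three-group calorie example above, again assuming Python with numpy and scipy.

    import numpy as np
    from scipy import stats

    psych = np.array([200, 180, 220, 160, 150, 200, 190, 200], dtype=float)
    nutrition = np.array([190, 220, 200, 230, 160, 150, 200, 210, 195], dtype=float)
    dieticians = np.array([220, 250, 240, 275, 250, 230, 200, 240], dtype=float)

    # F = MSB / MSW, with between-groups df = G - 1 and within-groups df = N - G.
    result = stats.f_oneway(psych, nutrition, dieticians)
    G = 3
    N = len(psych) + len(nutrition) + len(dieticians)

    print(result.statistic, result.pvalue, "df =", G - 1, N - G)
    # A significant F indicates only that not all population means are equal;
    # it does not by itself identify which particular means differ.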