QUANTITATIVE METHODS IN PSYCHOLOGY

Using Significance Tests to Evaluate Equivalence Between Two Experimental Groups

James L. Rogers, Kenneth I. Howard, and John T. Vessey

Equivalency testing, a statistical method often used in biostatistics to determine the equivalence of 2 experimental drugs, is introduced to social scientists. Examples of equivalency testing are offered, and the usefulness of the method to the social scientist is discussed.

Although the central limit theorem was developed to allow for the estimation of confidence bounds around an observed mean (see Adams, 1974, for a fascinating presentation), its major application in empirical science has been to test whether the absolute difference between two means is greater than zero. However, there has been a growing dissatisfaction with traditional tests of the null hypothesis, in which the difference between two population means is precisely zero. Somehow, the testing of a hypothesis of "no difference" has resulted in the cognitive illusion that the investigator did not actively choose this as a plausible alternative hypothesis, that is, that the null hypothesis was just a given of nature. However, it has been long recognized that with very large sample sizes, this null hypothesis will be rejected in almost all cases, resulting in statistically significant differences that are substantively trivial. In response to this state of affairs, and after establishing the statistical reliability of results, investigators have turned to estimates of the "amount of variance accounted for" or effect sizes (ESs) to evaluate the substantive significance of their findings. With the recent popularity of power analysis, investigators now design their studies in such a way that statistical analyses will be relevant to preselected differences (e.g., small, moderate, or large ESs) of presumed substantive import.
This has resulted in a more complex interplay between hypothesis testing and statistical analyses: one in which investigators are asked to select a meaningful difference before executing a study.

Dissatisfaction with the traditional null hypothesis has also emerged in an area of research in which the aim is not to establish the superiority of one treatment or method over another, but rather to establish equality between the two methods. This type of research involves the testing of treatment innovations to determine if a new method achieves an outcome as effective as that of the standard method but perhaps at a lower cost or greater convenience. For example, an investigator may hope to show that group therapy is as effective as individual therapy or that a less expensive antidepressant medication works as well as a more costly one. In these cases, the question is "Is there a more efficient way to achieve the same result?"

Author note. James L. Rogers, Department of Psychology, Wheaton College; Kenneth I. Howard and John T. Vessey (now at the Office of Population Affairs, U.S. Department of Health and Human Services, Washington, DC), Department of Psychology, Northwestern University. This work was partially supported by Research Grant R 01 MH42901 from the National Institute of Mental Health. We thank Amy Miller for her technical assistance. Correspondence concerning this article should be addressed to James L. Rogers, Department of Psychology, Wheaton College, Wheaton, Illinois 60187.

Common to the increasing interest of social scientists in power analysis, ES specification, and the equality of treatments is the realization that the objective is often to determine whether mean values are "equivalent" rather than "different." Though largely unfamiliar to social scientists, formal statistical tests of equivalence have been evolving over the past 20 years.
At present, equivalency tests fall into three general categories: the confidence interval approach, developed by Westlake (1981) and presented in this article; the nonequivalence null hypothesis approach, developed by Anderson and Hauck (1983), which uses an approximation to a noncentral t distribution to calculate the p level of the test; and Bayesian methods, developed by Selwyn, Dempster, and Hall (1981) and Selwyn and Hall (1984). The first two methods are the most attractive because they require the fewest arbitrary decisions (Westlake, 1988). Comparisons of the two methods have revealed that the confidence interval method is conservative (i.e., the actual Type I error rate is equal to or less than the stated Type I error rate) and the Hauck and Anderson method can be liberal (i.e., the actual Type I error rate could be higher than the stated Type I error rate [Anderson & Hauck, 1983]).

In this article, the method (Westlake, 1981) typically used by biostatisticians to determine if two drugs have an equivalent impact (Hauck & Anderson, 1986; Makuch & Simon, 1978; Westlake, 1988) is introduced to social scientists. Whereas the purpose of a traditional hypothesis test is to determine whether two groups differ from one another, this procedure is used to determine whether two groups are sufficiently near each other to be considered equivalent. Equivalency testing is appropriate when the investigator is able to specify a small, nonzero difference between two treatments that would serve to define an "equivalence interval" around a difference of zero (e.g., ±10%). Any difference small enough to fall within that equivalence interval would be considered clinically and/or practically unimportant.

Psychological Bulletin, 1993, Vol. 113, No. 3, 553-565. Copyright 1993 by the American Psychological Association, Inc. 0033-2909/93/$3.00

Equivalency testing is straightforward, using concepts highly familiar to social scientists. Both Type I and Type II error rates are controlled. There is a null hypothesis asserting that the difference between two groups is at least as large as the one specified by the investigator, and there is an alternative hypothesis asserting that the difference between two groups is smaller than the specified one. As in traditional hypothesis testing, the goal of the investigator is to reject the null hypothesis and accept the alternative hypothesis.

In the discussion that follows, we describe equivalency testing, provide formulas to establish equivalency between either two means or two proportions, and offer a number of illustrative examples. We also examine sample size estimation procedures that allow equivalency testing at a designated level of power. Finally, we discuss the usefulness of equivalency testing in the social sciences.

Method

Procedure

Equivalency testing is accomplished in two steps: The investigator first defines equivalency and then performs two simultaneous one-sided hypothesis tests.

Defining equivalency. An a priori decision must be made concerning the minimum difference between two groups that would be important enough to make the groups nonequivalent. The investigator will typically consider two means (or proportions) equivalent if they differ by less than some delta in both a negative ($\delta_1$) and positive ($\delta_2$) direction. If a greater difference in one direction than the other is allowed, $\delta_1$ and $\delta_2$ should be individually defined; otherwise, $\delta_2 = -\delta_1$. The equivalence definition will depend on the substantive issue under consideration. Equivalence between an experimental group and a control group might be a difference of less than 20% of the control group mean (if the metric is appropriate), a difference of less than 20% of the pooled standard deviation, or a difference less than the minimum value considered to be substantively important. In certain instances, more than one equivalency definition might be specified, each tailored to a particular application or experimental perspective. Test results would then be reported for each definition.

Two simultaneous one-sided tests. Two one-sided hypothesis tests must be performed. Figures 1A and 1B illustrate the test procedure, whereas Table 1 presents the required formulas to establish equivalency between two means or two proportions. Test 1 seeks to reject a null hypothesis asserting that the difference between two means (or proportions) is less than or equal to the smaller delta ($\delta_1$). Test 2 seeks to reject a null hypothesis asserting that the difference is greater than or equal to the larger delta ($\delta_2$). Because $\delta_1$ and $\delta_2$ are the minimum differences (in a negative and positive direction) that would make a difference, the investigator's goal is to demonstrate statistically that an observed difference between two means, $M_1 - M_2$, is too large to have come from a distribution with mean $\delta_1$ (Test 1) and simultaneously too small to have come from a distribution with mean $\delta_2$ (Test 2). The logic behind the test is that if $M_1 - M_2$ is shown to have come from a distribution simultaneously to the right of $\delta_1$ and to the left of $\delta_2$, the investigator can conclude that the distribution it came from is somewhere in the middle, with true difference $\mu_1 - \mu_2$ less than the minimum difference of importance that was determined by the investigator.

Note that to establish equivalency the investigator must reject both one-sided null hypotheses; however, to do so, only one test is required. Consider the case in which the observed difference, $M_1 - M_2$, is of unequal distance between $\delta_1$ and $\delta_2$. Of the two one-sided tests that must be rejected, one test evidences a shorter distance between its observed value ($M_1 - M_2$) and its null value for delta ($\delta_1$ or $\delta_2$). Choosing the one-sided test having the shorter distance between $M_1 - M_2$ and delta (either $\delta_1$ or $\delta_2$) will yield the smaller test statistic and consequently the larger p value of the two possible tests.[1] Because this test has the larger p value, it will be the least likely to show equivalence. However, if the test with the larger p value is rejected, it follows that the remaining one-sided test, which will necessarily evidence a smaller p value, need not be performed as it will always be rejected as well. On the other hand, if the test in question (the largest p value) does not result in the rejection of its null value, it will still be unnecessary to perform the second test because both tests must be rejected to conclude that $\mu_1 - \mu_2$ falls within the equivalency interval. In other words, there is never a case when the test of the larger difference between $M_1 - M_2$ and its delta ($\delta_1$ or $\delta_2$) will need to be conducted.[2] Finally, this statement is true even if $M_1 - M_2$ falls exactly between $\delta_1$ and $\delta_2$. Here, the conclusion for one test will be identical to that of the other.[3] Both will have the same p value (i.e., both test statistics will have the same absolute value). Thus, only one of the statistical tests needs to be done.

It follows from the discussion above that for an equivalency test, the probability of a Type I error is equal to the alpha ($\alpha_1$ or $\alpha_2$) selected by the investigator to evaluate the one-sided test evidencing the greatest p value. This test, the one actually performed, predicts perfectly the outcome of the second test. Because results of the two tests are completely dependent, the Type I error rate does not need to be adjusted to account for both tests. Thus, the alpha for an equivalency test is the value given to the one-sided test actually conducted ($\alpha_1$ or $\alpha_2$), that is, the alpha corresponding to the test evidencing the largest p value.[4]

The contrast between traditional and equivalence procedures provides another view of Type I error in equivalency testing. When conducting a traditional two-tailed test, the experimenter rejects the null hypothesis if either of the two test statistics is significant.[5] Because in a two-tailed test either test statistic being significant by chance would lead to a Type I error, the experimenter must add the Type I error probabilities for each of the two test statistics to calculate the overall probability of making a Type I error. However, in an equivalency test, both test statistics must be significant to lead an experimenter to reject the null hypothesis. Because the first test statistic and the second test statistic have to be significant by chance for a Type I error to be made, the experimenter in this case would multiply the Type I error probabilities of each test statistic to calculate the overall probability of making a Type I error. However, as noted above, the probability of the larger (absolute value) of the two test statistics being significant, given that the smaller of the two is significant, equals 1. That is, the overall Type I error probability is the Type I error probability of the smaller test statistic multiplied by the conditional Type I error probability of the larger statistic, which will always be 1. Therefore, the overall probability of making a Type I error in an equivalency test is simply the Type I error associated with the smaller of the two test statistics.

Footnote 1. Let $|(M_1 - M_2) - \delta_1| > |(M_1 - M_2) - \delta_2|$. Then $z_1 = |(M_1 - M_2) - \delta_1| / s_{M_1 - M_2} > z_2 = |(M_1 - M_2) - \delta_2| / s_{M_1 - M_2}$. Therefore, $p(z_2) > p(z_1)$. Similarly, if $|(M_1 - M_2) - \delta_1| < |(M_1 - M_2) - \delta_2|$, then $p(z_1) > p(z_2)$.

Footnote 2. If $p(z_2) > p(z_1)$ and $p(z_2)$ leads to rejection of the null, so will $p(z_1)$. By the same logic, if $p(z_1) > p(z_2)$ and $p(z_1)$ leads to rejection of the null, so will $p(z_2)$.

Footnote 3. Let $|(M_1 - M_2) - \delta_1| = |(M_1 - M_2) - \delta_2|$; then $z_1 = z_2$, so $p(z_1) = p(z_2)$. Both tests lead to the same conclusion. Thus, only one test need be performed.

Footnote 4. It is sometimes difficult for those newly exposed to equivalency testing to understand why overall $\alpha$ is not adjusted to account for the fact that two tests are performed. However, it should be clear from the discussion here that no adjustment is necessary. If a Type I error is made, it is because the distribution that $M_1 - M_2$ comes from is actually to the left of $\delta_1$ or to the right of $\delta_2$, but it cannot be both. Therefore, because we do not know on which side the actual distribution lies, the investigator takes a worst-case scenario approach and chooses to examine the larger of the p values from the two tests ($p_1$ and $p_2$ in Figure 1B). This corresponds to choosing the test that has a higher probability of a Type I error. If this larger p value is less than $\alpha$, then the investigator will assume (but cannot prove) that a Type I error was not made. Because the two tests constituting the equivalency procedure are conducted simultaneously (that is, the larger test statistic is conditioned on the smaller), only one decision is made, and that decision is unidirectional. Namely, a single difference is judged to be either acceptably small or unacceptably large.

Footnote 5. A traditional two-tailed test is in fact two tests, one for each tail.

Confidence intervals. Rather than conducting two one-sided tests, as described above, a confidence interval may be constructed. Equivalence is concluded if the confidence interval is contained within the equivalence interval.
(E.g., a symmetrical equivalence interval would be the interval bounded by $\delta_1$ and $\delta_2 = -\delta_1$.) Note that the equivalency confidence interval should be expressed at the $1 - 2\alpha$ level of certainty rather than at the customary $1 - \alpha$ level.[6] The $1 - 2\alpha$ confidence interval will fall within the equivalence interval when both one-sided tests are simultaneously rejected, thereby leading to the rejection of the null with $\alpha$ probability of a Type I error.

Sample size formulas. The sample size per group ($n_T$) required for a means test can be obtained by either Formula 1 or 2 (see below) if delta is symmetrical with regard to direction ($\delta_1 = -\delta_2$) and $\mu_{Ha} = 0$.[7] Otherwise, the larger of the two sample sizes should be used.

\[ n_{T1} = \frac{2 s_{pooled}^2 (z_\alpha + z_{\beta/2})^2}{(\mu_{Ha} - \delta_1)^2} \quad \text{for Test 1,} \tag{1} \]

\[ n_{T2} = \frac{2 s_{pooled}^2 (z_\alpha + z_{\beta/2})^2}{(\delta_2 - \mu_{Ha})^2} \quad \text{for Test 2.} \tag{2} \]

Nearly the same formulas are applicable when proportions are tested:

\[ n_{T1} = \frac{[p_1(1 - p_1) + p_2(1 - p_2)](z_\alpha + z_{\beta/2})^2}{[(p_1 - p_2) - \delta_1]^2} \quad \text{for Test 1,} \]

\[ n_{T2} = \frac{[p_1(1 - p_1) + p_2(1 - p_2)](z_\alpha + z_{\beta/2})^2}{[\delta_2 - (p_1 - p_2)]^2} \quad \text{for Test 2.} \]

Finally, in many instances, the investigator will set $\mu_1 - \mu_2$ (or $P_1 - P_2$) to zero for the purpose of sample size estimation. On the other hand, if the investigator believes that a difference does in fact exist, but wants to test to determine if the difference is small enough to fall within a defined equivalency interval, the investigator need only set $\mu_{Ha}$ equal to the expected difference. The required sample size is then estimated as described above.

Footnote 6. That the $1 - 2\alpha$ rather than the $1 - \alpha$ confidence interval should be used is apparent when the rejection regions for the simultaneous one-sided tests are considered. We have $[(M_1 - M_2) - \delta_1] / s_{M_1 - M_2} > z_\alpha$ and $[(M_1 - M_2) - \delta_2] / s_{M_1 - M_2} < -z_\alpha$. So $[\delta_1 - (M_1 - M_2)] / s_{M_1 - M_2} < -z_\alpha$ and $z_\alpha < [\delta_2 - (M_1 - M_2)] / s_{M_1 - M_2}$. However, because $-z_\alpha < z_\alpha$, the following inequality is apparent: $\delta_1 < (M_1 - M_2) - z_\alpha s_{M_1 - M_2} < (M_1 - M_2) + z_\alpha s_{M_1 - M_2} < \delta_2$. That is, the $1 - 2\alpha$ confidence interval, namely, $(M_1 - M_2) \pm z_\alpha s_{M_1 - M_2}$, will always fall within an equivalence interval bounded by $\delta_1$ and $\delta_2$ if both one-sided tests are simultaneously rejected.

Footnote 7. From Figure 1A it is readily seen that $-z_\alpha = [(M_1 - M_2) - \delta_2] / s_{M_1 - M_2}$ and $z_{\beta/2} = [(M_1 - M_2) - \mu_{Ha}] / s_{M_1 - M_2}$. Here, $z_\alpha$ defines the probability of a Type I error, $z_{\beta/2}$ defines one half the probability of a Type II error (1 - power), $M_1 - M_2$ is an unknown difference score corresponding to both $-z_\alpha$ and $z_{\beta/2}$, and $\mu_{Ha}$ is the expected $\mu_1 - \mu_2$ under the alternative hypothesis. If we assume equal sample sizes ($n_1 = n_2 = n$) and substitute for $M_1 - M_2$, we see that for Test 2 $-z_\alpha = [(z_{\beta/2} s_{M_1 - M_2} + \mu_{Ha}) - \delta_2] / s_{M_1 - M_2}$, where $s_{M_1 - M_2} = s_{pooled}\sqrt{2/n}$. Thus, $n_{T2} = 2 s_{pooled}^2 (z_\alpha + z_{\beta/2})^2 / (\delta_2 - \mu_{Ha})^2$ for Test 2, where $n_{T2}$ is the minimal sample size per group required for a given $\alpha$ and $\beta$. Likewise, $n_{T1} = 2 s_{pooled}^2 (z_\alpha + z_{\beta/2})^2 / (\mu_{Ha} - \delta_1)^2$ for Test 1. If delta is symmetrical with regard to direction ($\delta_1 = -\delta_2$) and $\mu_{Ha} = 0$, then either sample size formula may be used (i.e., $n_{T1} = n_{T2}$). Otherwise, the larger of the two sample sizes should be used. Nearly the same formulas are applicable when proportions are tested: $n_{T1} = [p_1(1 - p_1) + p_2(1 - p_2)](z_\alpha + z_{\beta/2})^2 / [(p_1 - p_2) - \delta_1]^2$ and $n_{T2} = [p_1(1 - p_1) + p_2(1 - p_2)](z_\alpha + z_{\beta/2})^2 / [\delta_2 - (p_1 - p_2)]^2$.

Examples

The following examples have been selected to provide both computational and substantive illustrations of equivalency testing. No criticism of the analyses provided in the original publications is intended. The studies we have selected and the equivalence intervals we apply make a computational point aimed at disclosing a range of situations that may arise when conducting equivalency tests. Investigators who are experts in their individual areas of research will need to determine when equivalency testing is useful and to define meaningful equivalence intervals relative to the substantive issues at hand, just as they now must define meaningful levels of alpha and power or meaningful ESs.

Example 1: MMPI Similarities

Cannon, Bell, Fowler, Penk, and Finkelstein (1990) compared alcoholic and drug-abusive subjects on the Minnesota Multiphasic Personality Inventory (MMPI). We reexamine comparisons between 207 subjects diagnosed as alcohol dependent and 49 subjects diagnosed as drug (but not alcohol) dependent. In addition to significance levels for traditional z values, Table 2 shows the results of an equivalency procedure (two one-sided tests) to determine whether the mean MMPI profile scores of drug-addicted subjects were within 10% of the mean MMPI scores of alcoholic subjects. We consider a difference of 10% or less on the MMPI to be clinically trivial.

To illustrate the computational aspects of equivalency testing, we arbitrarily selected the Masculinity-Femininity (Mf) scale (eighth row in Table 2). The same procedure applies to all other scales. The information required to conduct equivalency testing, found in Columns 2-5 of Table 2, includes the means, standard deviations, and sample sizes of the two groups being compared. For the Mf scale, these values are $M_1 = 59.2$, SD = 9.5, and n = 207 for the alcohol group and $M_2 = 61.4$, SD = 10.9, and n = 49 for the drug group. Because the equivalence interval is defined as 10% of the alcohol group mean, we calculate that $\delta_1 = -10\% \times 59.2 = -5.92$ and $\delta_2 = 10\% \times 59.2 = +5.92$, or simply that the equivalency interval is ±5.92. The obtained difference between the two group means ($M_1 - M_2$) is -2.2 and, using the formula provided in Table 1, has the following standard error:[8]

\[ s_{M_1 - M_2} = \left\{ \left[ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right] \left[ \frac{1}{n_1} + \frac{1}{n_2} \right] \right\}^{1/2} = \left\{ \left[ \frac{(207 - 1)(9.5)^2 + (49 - 1)(10.9)^2}{207 + 49 - 2} \right] \left[ \frac{1}{207} + \frac{1}{49} \right] \right\}^{1/2} = 1.554. \]

Footnote 8. We are grateful to an anonymous reviewer for the following insight. Throughout this article, when delta has been defined as a percentage of the control group mean, the resulting delta has been treated as a constant with no variance. This is common in practice as there is seldom a compelling reason to view delta as, itself, a stochastic value. On the other hand, if one wishes to incorporate into the definition of delta the variability inherent in the control group mean, thereby treating delta itself as a random variable, then the standard errors used for each of the two one-sided tests would be derived as follows: Let $a$ equal the percentage of the control mean used to define delta. Then the difference to be converted to the Test 1 Z score can be expressed as $(M_1 - M_2) - (-aM_1) = M_1 + aM_1 - M_2 = (1 + a)M_1 - M_2$. The variance of this random variable is derived as follows: $\mathrm{Var}[(1 + a)M_1 - M_2] = (1 + a)^2 \mathrm{Var}\,M_1 + \mathrm{Var}\,M_2 = (1 + a)^2 (s_1^2/n_1) + (s_2^2/n_2)$. Substituting the pooled variance for $s_1^2$ and $s_2^2$, the standard error used to compute the Test 1 Z score is seen to be $\{s_{pooled}^2[((1 + a)^2/n_1) + (1/n_2)]\}^{1/2}$. Similarly, the standard error used to compute the Test 2 Z score would be $\{s_{pooled}^2[((1 - a)^2/n_1) + (1/n_2)]\}^{1/2}$. Here, $s_{pooled}^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$.

The traditional z test yields an obtained test statistic value of -1.416 with an associated p value of .078. That is, z = -2.2/1.554 = -1.416, p = .078. Alternatively, one could compute a traditional 95% confidence interval, $(M_1 - M_2) \pm (z_{\alpha/2})(s_{M_1 - M_2}) = -2.2 \pm (1.96)(1.554)$, or -5.245 to 0.845. Note that zero falls within the interval. Having failed to obtain a statistical significance by the traditional test, an investigator might suspect that there is not an important difference between the two groups, although direct evidence is lacking. Indeed, a p value of .078 might be interpreted as "borderline significance" by some investigators. Therefore, using the assumption that a difference of 10% of the alcohol group mean is important, an equivalency test is conducted.

At this point, the investigator either refers to the larger p value of the two one-sided tests described in Table 1 or determines whether a 90% confidence interval falls into the equivalence interval. The two one-sided tests are as follows.

\[ z_1 = \frac{(59.2 - 61.4) - (-5.92)}{1.554} = 2.394, \quad p = .008, \]

and

\[ z_2 = \frac{(59.2 - 61.4) - 5.92}{1.554} = -5.225, \quad p = .000. \]

The larger p value of .008 allows the null hypothesis of nonequivalence to be rejected; that is, the smaller obtained z value of 2.394 exceeds the critical value ($z_{.05} = 1.645$). Therefore, an investigator may conclude that the difference between the alcohol and drug conditions is within 10% of the alcohol group mean. In doing so, the investigator runs a Type I error risk of .05. Had the rejection region been as extreme as all z values of 2.394 or greater, the Type I error risk would have been equal to the p value of .008.

Clearly, comparing a 90% confidence interval with the equivalence interval results in the same conclusion. In Table 2, we see that the lower confidence limit = (59.2 - 61.4) - (1.645)(1.554) = -4.756 and the upper confidence limit = (59.2 - 61.4) + (1.645)(1.554) = 0.356. Thus, the 90% confidence interval (-4.756 to 0.356) is contained within the equivalency interval (±5.92), and we conclude, as before, that the two conditions are equivalent using the 10% criterion. Note that the confidence limits might just as easily be reexpressed as a percentage of the alcohol group mean by dividing each limit by $M_1 = 59.2$. If this were done, the confidence interval, stated as a percentage of the alcohol group mean, would be -8.03% (-4.756/59.2) to 0.60% (0.356/59.2). As expected, this interval (-8.03% to 0.60%) falls within ±10%, allowing equivalency to be concluded.

To graphically display the equivalency results for each of the MMPI scales, 90% and 95% confidence intervals, expressed as a percentage of the alcohol group mean, are plotted in Figure 2. The outer tick marks reflect the 95% interval (the traditional test) and the inner tick marks reflect the 90% interval (the equivalency test). If on visual inspection the 90% interval falls within the equivalence band (±10%), one may conclude equivalence with a 5% risk of Type I error. Also, if on visual inspection the 95% interval excludes zero (0%), the traditional hypothesis test of no difference may be rejected with a 5% risk of a Type I error.

It is informative to consider the results found in Figure 2. To assist in the interpretation of Figure 2, the MMPI scales have been grouped into four categories on the basis of the outcome of both the traditional test and equivalency test. The alcohol and drug group comparisons for Depression (D), Psychopathic Deviate (Pd), Paranoia (Pa), Hypomania (Ma), and Social Introversion (Si) were statistically different by the traditional test and failed to obtain statistical equivalence by the two one-sided equivalency procedure. (The 95% confidence intervals do not include zero, and the 90% confidence intervals do not lie completely within the preset equivalency interval.) Thus, the data strongly suggest that D, Pd, Pa, Ma, and Si differ in a clinically important fashion between alcoholic and other drug-dependent subjects.

Because the 95% confidence intervals for Lie, Frequency, Hysteria, Mf, and Psychasthenia include zero and the 90% confidence intervals lie completely within the preset equivalence bounds, we conclude that differences within these five scales fail to reach statistical significance by the traditional test and that the two comparison groups were statistically equivalent. That is, these five scales exhibit no clinically important differences between alcoholic and other drug-dependent subjects.

The correction scale was found to be statistically different across groups by the traditional test as well as statistically equivalent by the equivalency test. Although statistically different across groups, this difference is clinically unimportant.

Finally, differences between groups on the Hypochondriasis and Schizophrenia scales were not statistically significant either by the traditional test or by the equivalency test. The variability in these scales was too great to allow an accurate appraisal given the sample size used in this study.
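The Mf scale computations in Example 1 can be sketched in code. The following is a minimal illustration, not the authors' software: the function name `tost_two_means` and the returned dictionary keys are hypothetical, and a z (normal) reference distribution is assumed, as in the worked example. It uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_means(m1, s1, n1, m2, s2, n2, delta1, delta2, alpha=0.05):
    """Two one-sided z tests for equivalence of two means (pooled variance).

    Hypothetical helper for illustration; delta1 < 0 < delta2 bound the
    equivalence interval around a difference of zero.
    """
    std_normal = NormalDist()  # standard normal distribution
    diff = m1 - m2

    # Pooled standard error of M1 - M2, following the Table 1 formula.
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Test 1: H0 is diff <= delta1, rejected for large positive z1.
    z1 = (diff - delta1) / se
    p1 = 1 - std_normal.cdf(z1)
    # Test 2: H0 is diff >= delta2, rejected for large negative z2.
    z2 = (diff - delta2) / se
    p2 = std_normal.cdf(z2)

    # Only the larger p value matters for the equivalence decision.
    p_larger = max(p1, p2)

    # Equivalent decision rule: the 1 - 2*alpha confidence interval.
    z_alpha = std_normal.inv_cdf(1 - alpha)
    ci = (diff - z_alpha * se, diff + z_alpha * se)

    return {
        "se": se,
        "z1": z1,
        "z2": z2,
        "p_larger": p_larger,
        "ci": ci,
        "equivalent": p_larger < alpha,
    }


# Mf scale: alcohol group vs. drug group, delta = 10% of the alcohol mean.
mf = tost_two_means(59.2, 9.5, 207, 61.4, 10.9, 49, delta1=-5.92, delta2=5.92)
print(round(mf["se"], 3), round(mf["z1"], 3), round(mf["p_larger"], 3))
print(tuple(round(limit, 3) for limit in mf["ci"]), mf["equivalent"])
```

With the Mf values, the sketch reproduces the standard error (1.554), the smaller test statistic (2.394, p = .008), and the 90% confidence limits (-4.756 to 0.356) reported above. For small samples, a t reference distribution would be the more appropriate choice, as the note to Table 1 indicates; that variant is not shown here.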
Example 2: Equivalency in Meta-Analysis

Robinson, Berman, and Neimeyer (1990) used meta-analysis to compare the efficacy of different types of therapies in the treatment of depression. Using various techniques, Robinson et al. calculated an average ES that contrasted each of several therapeutic approaches (cognitive versus behavioral, psychotherapy versus drug therapy, and so on) for studies judged not to suffer from investigator allegiance to any particular treatment. Of concern here is whether these ESs were less than 0.20, an ES value classified as "small" by Cohen (1977) and considered by us to reflect little, if any, clinical relevance.

Table 3 shows the mean ESs in question along with standard errors provided by Robinson et al. (1990). A z statistic, corresponding p value, and appropriate confidence interval (95% for the traditional procedure and 90% for the equivalency procedure) have been provided for each of the 12 comparisons. The traditional p values were found by dividing each mean ES by its corresponding standard error to obtain a z statistic (z = ES/SE), then converting to a p value. The p value tabled for the equivalency test is the larger value found for the two one-sided tests, $z_1 = (ES + 0.20)/SE$ or $z_2 = (ES - 0.20)/SE$. The confidence intervals were obtained by adding or subtracting from each effect size either 1.96 × SE (traditional) or 1.645 × SE (equivalence). For example, we obtain the following calculations for the cognitive versus behavioral contrast.

Traditional z: z = ES/SE = 0.12/0.09 = 1.333, p = .091.

Traditional confidence interval: $ES \pm (z_{\alpha/2})(SE) = 0.12 \pm (1.96)(0.09)$, or -0.056 to 0.296.

Equivalence z: $z_1 = (ES + 0.20)/SE = (0.12 + 0.20)/0.09 = 3.556$, p = .000. $z_2 = (ES - 0.20)/SE = (0.12 - 0.20)/0.09 = -0.889$, p = .187. So we table the larger p value of .187.

Equivalence confidence interval: $ES \pm (z_\alpha)(SE) = 0.12 \pm (1.645)(0.09)$, or -0.028 to 0.268.
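The cognitive versus behavioral calculations can be checked with a short script. This is a sketch under stated assumptions: the function name `es_equivalence` and its return keys are hypothetical, the equivalence bound of 0.20 is taken from the example, and the traditional p value is computed as a one-tailed probability because that is what matches the tabled .091.

```python
from statistics import NormalDist


def es_equivalence(es, se, bound=0.20, alpha=0.05):
    """Traditional and equivalence z tests for a mean effect size.

    Hypothetical helper for illustration; `bound` is the symmetric
    equivalence limit (delta2 = bound, delta1 = -bound).
    """
    std_normal = NormalDist()

    # Traditional test of ES = 0 (one-tailed p, matching the tabled value).
    z_trad = es / se
    p_trad = 1 - std_normal.cdf(abs(z_trad))
    z_half = std_normal.inv_cdf(1 - alpha / 2)
    ci_trad = (es - z_half * se, es + z_half * se)

    # Two one-sided tests against -bound and +bound.
    z1 = (es + bound) / se            # H0: ES <= -bound
    z2 = (es - bound) / se            # H0: ES >= +bound
    p1 = 1 - std_normal.cdf(z1)
    p2 = std_normal.cdf(z2)
    p_equiv = max(p1, p2)             # the larger p value is the one tabled

    # 1 - 2*alpha confidence interval used for the equivalence decision.
    z_alpha = std_normal.inv_cdf(1 - alpha)
    ci_equiv = (es - z_alpha * se, es + z_alpha * se)

    return {
        "z_trad": z_trad, "p_trad": p_trad, "ci_trad": ci_trad,
        "z1": z1, "z2": z2, "p_equiv": p_equiv, "ci_equiv": ci_equiv,
        "different": p_trad < alpha / 2,
        "equivalent": p_equiv < alpha,
    }


# Cognitive versus behavioral contrast: ES = 0.12, SE = 0.09.
contrast = es_equivalence(0.12, 0.09)
print(round(contrast["z_trad"], 3), round(contrast["p_trad"], 3))
print(round(contrast["z2"], 3), round(contrast["p_equiv"], 3))
print(contrast["different"], contrast["equivalent"])
```

For this contrast the script gives neither a significant difference nor equivalence, since the upper 90% confidence limit (0.268) lies above 0.20.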
The results above indicate that the contrast ES for cognitive versus behavioral therapy (ES = 0.12) is neither statistically different from zero (p = .091, and confidence interval includes zero) nor statistically equivalent (p = .187, and confidence interval falls outside 0.20). The information in Table 3 is of considerable interest. The analysis shows psychotherapy to be equivalent to the four treatments with which it is compared (drug therapy, combination drug, tricyclics, and combination tricyclics). Variability relative to ES is too large to determine either a difference or an equivalency in efficacy for the remaining eight comparisons: cognitive versus behavioral, cognitive versus cognitive-behavioral, behavioral versus cognitive-behavioral, cognitive versus general verbal, behavioral versus general verbal, cognitive-behavioral versus general verbal, cognitive versus drug therapy, and combination (tricyclics) vs. tricyclics. The results of the traditional test and the equivalency test are displayed in Figure 3.

Figure 1. A: Equivalence testing using two one-sided tests. (In this illustration, $\delta_2 = -\delta_1$ and $\mu_{H_a} = 0$. These assumptions are not required [see text]. $H_0$ = null hypothesis; $H_a$ = alternative hypothesis; crit. = critical.) B: Two one-sided tests. (If we assume $M_1 - M_2 = \mu_1 - \mu_2$, then the actual power of this test = $1 - (\beta_1 + \beta_2)$. $z_{crit.}$ = critical z-score value.)

Table 1
Hypothesis and Test Statistics to Establish Equivalency Between Two Means or Two Proportions

Parameter: $\mu_1 - \mu_2$
    Hypotheses: $H_0\colon \mu_1 - \mu_2 \le \delta_1$ versus $H_a\colon \mu_1 - \mu_2 > \delta_1$, and $H_0\colon \mu_1 - \mu_2 \ge \delta_2$ versus $H_a\colon \mu_1 - \mu_2 < \delta_2$
    Test statistics: $z_1 = [(M_1 - M_2) - \delta_1]/s_{M_1 - M_2}$ and $z_2 = [(M_1 - M_2) - \delta_2]/s_{M_1 - M_2}$

Parameter: $p_1 - p_2$
    Hypotheses: $H_0\colon p_1 - p_2 \le \delta_1$ versus $H_a\colon p_1 - p_2 > \delta_1$, and $H_0\colon p_1 - p_2 \ge \delta_2$ versus $H_a\colon p_1 - p_2 < \delta_2$
    Test statistics: $z_1 = [(p_1 - p_2) - \delta_1]/s_{p_1 - p_2}$ and $z_2 = [(p_1 - p_2) - \delta_2]/s_{p_1 - p_2}$

Rejection criteria
    Using significance levels: $p(z_1) < \alpha$ and $p(z_2) < \alpha$
    Using a critical test statistic ($z_\alpha$): $|z_1| > z_\alpha$ and $|z_2| > z_\alpha$
    Using a confidence interval: $\delta_1 < [(M_1 - M_2) \pm z_\alpha s_{M_1 - M_2}] < \delta_2$

Note. If the parameter of interest is $\mu_1 - \mu_2$, the t distribution should be used if degrees of freedom are small. $\delta_1$ and $\delta_2$ define equivalency; $\alpha$ = probability of a Type I error, where for $z_\alpha$ the probability $(z > z_\alpha) < \alpha$. Alternatively, $H_0$ (null hypothesis) and $H_a$ (alternative hypothesis) may be expressed as $H_0\colon \mu_1 - \mu_2 \le \delta_1$ or $\mu_1 - \mu_2 \ge \delta_2$; $H_a\colon \delta_1 < \mu_1 - \mu_2 < \delta_2$.

Example 3: Assessing Baseline Equivalence

Zabin, Hirsch, and Boscia (1990) compared three groups of inner-city Black adolescent women: those who (a) had negative pregnancy test results, (b) were pregnant and carried to term, and (c) terminated their pregnancy by induced abortion. Zabin, Hirsch, and Emerson (1989) and others (e.g., Adler et al., 1990) have presented various conclusions that involve psychosocial profile comparisons between birth and abortion groups at 1 and 2 years following the pregnancy resolution decision. Because Zabin et al. (1990) did not adjust for baseline differences, the presumption is that the abortion and birth groups are equivalent on important baseline parameters. Table 4 shows the percentage presence of 27 baseline characteristics in women who carried to term or women who aborted, along with the outcomes of both the traditional and the equivalency hypothesis testing procedures. Baseline equivalency was evaluated by determining whether the birth group mean was within 20% of the abortion group mean. In effect, this criterion implies that the baseline parameters should be within 20% of each other before any attempt is made to explain 1- and 2-year differences that might emerge on the basis of the pregnancy resolution decision.

Table 4 presents proportions, differences between proportions, standard errors, and the results of the traditional test and the equivalency test. Using the formulas in Table 1, the following calculations may be verified for the baseline variable "ever repeated grade." The same procedure applies to all the baseline variables.

Standard error:

$$SE = \left[\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right]^{1/2} = \left[\frac{(.343)(1 - .343)}{141} + \frac{(.505)(1 - .505)}{93}\right]^{1/2} = 0.065.$$

Traditional z:

$$z = (p_1 - p_2)/SE = -.162/0.065 = -2.474, \quad p = 0.007.$$

Traditional confidence interval:

$$(p_1 - p_2) \pm z_{\alpha/2}(SE) = -.162 \pm (1.96)(0.065), \text{ or } -.290 \text{ to } -.034.$$

Equivalence z: Where $\delta_1 = -20\% \times 0.343 = -0.069$,

$$z_1 = [(p_1 - p_2) - \delta_1]/SE = (-0.162 + 0.069)/0.065 = -1.427, \quad p = .923.$$

Where $\delta_2 = 20\% \times 0.343 = 0.069$,

$$z_2 = [(p_1 - p_2) - \delta_2]/SE = (-0.162 - 0.069)/0.065 = -3.522, \quad p = .000.$$

Equivalence confidence interval:

$$(p_1 - p_2) \pm z_{\alpha}(SE) = -0.162 \pm (1.645)(0.065), \text{ or } -.270 \text{ to } -.054.$$

The difference between the proportion of abortion subjects and the proportion of carrying-to-term subjects who had repeated at least one grade (34.3% vs. 50.5%, respectively) is a statistically significant difference that is not small enough to be considered statistically equivalent. That is, the traditional z is greater in absolute value than $z_{.025} = 1.96$ (the 95% confidence interval does not contain zero), and the smaller equivalence z (with the higher p value) is not larger in absolute value than $z_{.05} = 1.645$. This means that the 90% confidence interval does not fall within the equivalence interval of ±6.9% (i.e., 20% of 34.3).

To facilitate the interpretation of the results in Table 4, we classified baseline characteristics as different, equivalent, different and equivalent, or equivocal. Equivocal status was assigned if neither a statistically significant difference nor a statistically significant equivalence was found. Baseline characteristics were classified as equivalent if the two groups were statistically equivalent and not statistically different. Baseline characteristics were classified as different if the groups were found to be different at a statistically significant level but not statistically equivalent. Finally, a classification of different and equivalent was assigned if the groups were determined to be both statistically different and statistically equivalent.
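The calculation for "ever repeated grade" and the four-way classification described above can be checked numerically. What follows is a minimal Python sketch under stated assumptions, not the authors' code: the function names `proportion_tests` and `classify` and the dictionary keys are illustrative inventions, the normal approximation is assumed throughout, and the traditional p value is computed one-tailed and judged against alpha/2, which appears to match the reported p = 0.007 and the 1.96 cutoff. The inputs (.343 with n = 141, .505 with n = 93, and a 20% criterion) are taken from the text.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal distribution


def proportion_tests(p1, n1, p2, n2, rel_margin=0.20, alpha=0.05):
    """Traditional and equivalency z tests for two independent proportions.

    Assumption: the equivalence bounds are +/- rel_margin * p1, mirroring
    the "within 20% of the abortion group" criterion used in the text.
    """
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    # Traditional test of H0: difference = 0 (one-tailed p, judged at alpha/2).
    z_trad = diff / se
    p_trad = NORMAL.cdf(-abs(z_trad))

    # Two one-sided tests against the lower and upper equivalence bounds.
    delta = rel_margin * p1
    z1 = (diff + delta) / se        # tests H0: difference <= -delta
    z2 = (diff - delta) / se        # tests H0: difference >= +delta
    p_lower = 1 - NORMAL.cdf(z1)
    p_upper = NORMAL.cdf(z2)
    p_equiv = max(p_lower, p_upper)  # the higher of the two p values

    # 100 * (1 - 2 * alpha)% interval: the 90% interval when alpha = 0.05.
    half_width = NORMAL.inv_cdf(1 - alpha) * se
    return {
        "se": se,
        "z_trad": z_trad,
        "p_trad": p_trad,
        "z1": z1,
        "z2": z2,
        "p_equiv": p_equiv,
        "ci_equiv": (diff - half_width, diff + half_width),
        "different": p_trad < alpha / 2,
        "equivalent": p_equiv < alpha,
    }


def classify(different, equivalent):
    """Map the two test outcomes onto the four labels used in the text."""
    if different and equivalent:
        return "different and equivalent"
    if different:
        return "different"
    if equivalent:
        return "equivalent"
    return "equivocal"


if __name__ == "__main__":
    result = proportion_tests(0.343, 141, 0.505, 93)
    for key in ("se", "z_trad", "z1", "z2", "p_equiv"):
        print(key, round(result[key], 3))
    print(classify(result["different"], result["equivalent"]))
```

With these inputs the sketch classifies the variable as "different", in line with the conclusion drawn in the text. Taking the larger of the two one-sided p values follows the convention stated in the table notes, where the highest p value of the two one-sided tests is reported.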
Figure 4 shows the results of this classification. About 48% (13/27 = 48.1%) of the baseline variables were found to be different between the abortion group and the birth group, 37% (10/27 = 37.0%) to be equivocal, 11% (3/27 = 11.1%) to be equivalent, and 4% (1/27 = 3.7%) to be different and equivalent. In the present example, equivalency tests and traditional tests together provide information suggesting that the comparison groups lack baseline similarity. Note that because a random process cannot be assumed in this quasi-experiment, the p values obtained for both equivalency and traditional tests are, technically, descriptive measures of distance, not probabilities.

Table 2
Traditional and Equivalency Test Results for MMPI Scores of Alcohol Versus Drug-Dependent Subjects

Group statistics and traditional test (95% CI)

         Alcohol (n = 207)   Drug (n = 49)    Difference        Traditional
Scale      M      SD           M      SD        M      SE         z       p       LCL       UCL
L         49.3    7.1         48.9    9.1       0.4   1.195      0.335   .369    -1.941     2.741
F         63.6    8.8         65.2    9.9      -1.6   1.433     -1.117   .132    -4.408     1.208
K         47.4    6.4         49.5    8.1      -2.1   1.073     -1.957   .025†   -4.203     0.003
Hs        66.2   16.6         63.6   15.7       2.6   2.611      0.996   .160    -2.517     7.717
D         76.7   14.8         68.7   15.1       8.0   2.360      3.389   .000†    3.374    12.626
Hy        64.4   12.2         63.1   13.1       1.3   1.966      0.661   .254    -2.553     5.153
Pd        70.4   12.3         75.1   12.8      -4.7   1.969     -2.387   .009†   -8.560    -0.840
Mf        59.2    9.5         61.4   10.9      -2.2   1.554     -1.416   .078    -5.245     0.845
Pa        59.7   10.5         63.0   10.4      -3.3   1.665     -1.982   .024†   -6.564    -0.036
Pt        67.5   14.4         67.1   14.9       0.4   2.303      0.174   .431    -4.114     4.914
Sc        65.2   16.3         69.7   18.5      -4.5   2.659     -1.692   .045    -9.712     0.712
Ma        62.5   12.0         70.2   10.2      -7.7   1.856     -4.149   .000†  -11.337    -4.063
Si        58.9   10.0         55.4    8.3       3.5   1.541      2.271   .012†    0.479     6.521

Equivalence test (90% CI)

         Equivalence       Equivalence (b)
Scale    criterion (a)       z       p        LCL       UCL
L           4.93           -3.792   .000*    -1.565     2.365
F           6.36            3.322   .000*    -3.957     0.757
K           4.74            2.460   .007*    -3.865    -0.335
Hs          6.62           -1.540   .062     -1.695     6.895
D           7.67            0.140   .556      4.117    11.883
Hy          6.44           -2.614   .004*    -1.934     4.534
Pd          7.04            1.188   .117     -7.940    -1.460
Mf          5.92            2.394   .008*    -4.756     0.356
Pa          5.97            1.603   .054     -6.039    -0.561
Pt          6.75           -2.757   .003*    -3.388     4.188
Sc          6.52            0.760   .224     -8.874    -0.126
Ma          6.25           -0.781   .783    -10.753    -4.647
Si          5.89           -1.551   .060      0.965     6.035

Note. Data from Cannon, Bell, Fowler, Penk, and Finkelstein (1990). MMPI = Minnesota Multiphasic Personality Inventory; L = Lie; F = Frequency; K = Correction; Hs = Hypochondriasis; D = Depression; Hy = Hysteria; Pd = Psychopathic Deviate; Mf = Masculinity-Femininity; Pa = Paranoia; Pt = Psychasthenia; Sc = Schizophrenia; Ma = Hypomania; Si = Social Introversion; CI = confidence interval; LCL = lower confidence limit; UCL = upper confidence limit.
(a) Criterion is 10% of the alcohol group mean. (b) The highest p value of the two one-sided tests has been reported.
* p < 0.05 for equivalency, per each one-tailed test. † p < 0.025 for traditional test, two-tailed.

[Figure 2 plot not reproduced. Vertical axis: confidence limits expressed as a percent of the alcohol group mean; horizontal axis: MMPI scale.]

Figure 2. The 90% and 95% confidence intervals around alcohol group means minus drug group means. (Outer tick marks reflect 95% confidence interval. Inner tick marks reflect 90% confidence interval. MMPI = Minnesota Multiphasic Personality Inventory; D = Depression; Pd = Psychopathic Deviate; Pa = Paranoia; Ma = Hypomania; Si = Social Introversion; L = Lie; F = Frequency; Hy = Hysteria; Mf = Masculinity-Femininity; Pt = Psychasthenia; K = Correction; Hs = Hypochondriasis; Sc = Schizophrenia.)
Although p values in this example provide a useful measure of similarity (or difference) between baseline measures, p values resulting from the equivalency tests do not reflect the probability of a mean difference within an equivalency interval, just as p values resulting from the traditional tests do not reflect the probability of a mean difference greater than zero. A probability interpretation would be appropriate only if one were willing to assume that the underlying sampling distribution for the mean difference has resulted from a random process, an unlikely event in the present example.

Discussion

In a traditional test, a mean difference of zero is chosen as the null hypothesis to compute the probability of obtaining the test statistic value. If that probability is sufficiently small, the investigator elects to believe that an incorrect assertion was made and that the population means differ by some amount. However, it is never possible to prove, short of complete enumeration of the populations, that the mean difference is not zero or to know what it is if it is not zero. From a purely theoretical stance, a small p value is expected on occasion; therefore, the fact that a small p value exists is of no consequence, one way or the other, to the assertion that the mean difference is or is not zero. As a practical matter, however, if a small p value is obtained, the investigator decides that something "unusual" has occurred and the null hypothesis is rejected. However, it is the concomitant occurrence of a small p value with a known experimental manipulation after random assignment that changes the experimenter's mind about the null hypothesis, not the p value alone.
Table 3
Effect Sizes Comparing Various Therapies in the Treatment of Depression

                                           Effect size       Traditional (95% CI)                Equivalence (a) (90% CI)
Comparison                                  M      SE        z       p      LCL      UCL         z       p       LCL      UCL
Cognitive vs. behavioral                   0.12   0.09     1.333   .091   -0.056    0.296     -0.889   .187    -0.028    0.268
Cognitive vs. cognitive-behavioral        -0.03   0.12    -0.250   .401   -0.265    0.205      1.417   .078    -0.227    0.167
Behavioral vs. cognitive-behavioral       -0.16   0.10    -1.600   .055   -0.356    0.036      0.400   .345    -0.325    0.005
Cognitive vs. general verbal              -0.15   0.20    -0.750   .227   -0.542    0.242      0.250   .401    -0.479    0.179
Behavioral vs. general verbal              0.15   0.13     1.154   .124   -0.105    0.405     -0.385   .350    -0.064    0.364
Cognitive-behavioral vs. general verbal    0.09   0.27     0.333   .369   -0.439    0.619     -0.407   .342    -0.354    0.534
Psychotherapy vs. drug therapy             0.07   0.04     1.750   .040   -0.008    0.148     -3.250   .001*    0.004    0.136
Psychotherapy vs. combination             -0.01   0.08    -0.125   .450   -0.167    0.147      2.375   .009*   -0.142    0.122
Combination vs. drug therapy              -0.05   0.21    -0.238   .406   -0.462    0.362      0.714   .238    -0.395    0.295
Psychotherapy vs. tricyclics               0.07   0.04     1.750   .040   -0.008    0.148     -3.250   .001*    0.004    0.136
Psychotherapy vs. combination (tri)       -0.05   0.08    -0.625   .266   -0.207    0.107      1.875   .030*   -0.182    0.082
Combination (tri) vs. tricyclics          -0.05   0.26    -0.192   .424   -0.560    0.460      0.577   .282    -0.478    0.378

Note. Data adapted from Robinson, Berman, and Neimeyer (1990) with further manipulation by James L. Rogers, Kenneth I. Howard, and John T. Vessey. CI = confidence interval; LCL = lower confidence limit; UCL = upper confidence limit; tri = tricyclics.
(a) The equivalency interval uses δ = 0.20; the highest p value of the two one-sided tests has been reported.
* p < 0.05 for equivalency, per each one-tailed test.

In an equivalency test, even if the two one-sided tests each result in a small probability that their respective test statistic values have occurred by chance under the assumption that the mean difference in reality is as large or larger than the hypothesized value, the investigator cannot, theoretically, conclude that the true difference is within the equivalence interval. As a practical matter, the investigator will elect to believe that the treatments are equivalent when a small p value occurs in the context of a known experimental manipulation after random assignment. In the same way that a traditional test is used within an experimental context to dispel the belief that a difference of zero exists, so an equivalency test is used within an experimental context to rule out the presence of a difference that would make a difference.

If statistical significance has been obtained using a traditional test, the effect size might nevertheless be close enough to zero that, for practical purposes, one decides not to reject the null hypothesis after all but to treat the small difference one believes exists as though it really were zero (i.e., negligible). Again, the use of probability theory in isolation is abandoned. The p value is interpreted in the context of an observed difference between treatment means, that is, the ES. The counterpart to this situation is somewhat changed in equivalency testing because assumed reality under the alternative hypothesis (i.e., the equivalence interval) can arbitrarily be set to be meaningfully large, even when the ES (the distance between an equivalency interval endpoint and the sample mean difference) is small. Said another way, the alternative hypothesis in an equivalency test is that the mean difference falls into a bounded region determined by the investigator, whereas in the traditional test it is not bounded by the investigator but rather constitutes all values except zero. Consequently, practical considerations (as compared with probabilistic considerations) enter at the point of defining a meaningful ES in a traditional test but at the point of defining the equivalence interval in an equivalency test.

The traditional test and the equivalency test are not mutually exclusive.
If both tests are conducted, it is possible that both null hypotheses will be rejected, that neither will be rejected, or that one will be rejected and the other will not. It is instructive to consider the following possibilities (see Footnote 9).

1. In the event that the equivalency test rejected its null hypothesis, whereas the traditional test failed to reject its null hypothesis, the investigator would conclude that no clinically important difference between the two groups exists.

2. If both null hypotheses were rejected, the investigator would conclude that the treatment difference was larger than the standard null value (usually zero) but smaller than a difference that would make the groups nonequivalent. This outcome has traditionally been addressed through the warning that very large sample sizes ("too much" statistical power) may result in statistical significance even though the ES is clinically trivial (Fleiss, 1981). Equivalency testing provides a more exact method to accomplish this objective. For example, an investigator may be interested in knowing whether a statistically significant but clinically trivial depressive mood shift has occurred. In this case, a change ranging between some nontrivial value and zero might be used as the upper and lower bounds of an equivalence interval in a formal equivalency test.

3. In the event that the traditional test rejected its null hypothesis, whereas the equivalency test failed to reject its null hypothesis, the investigator would conclude that there is a difference between the two groups.

4. It is also possible that both the equivalency and the standard hypothesis tests will fail to reject their null hypotheses. In this case, the investigator would conclude that insufficient evidence exists to make any decision. That is, the investigator would surmise that the effect was not reliable enough to conclude either a sizable difference or a reliably small difference.

The latter situation might arise in experiments exhibiting insufficient statistical power because of inadequate sample size or excessive noise (within-group variation). For example, an equivalency test failing to indicate a reliably small ES might justify a decision to continue an experimental program that to date had exhibited "negative" findings. Whereas the experimenter might ordinarily conduct a power analysis to facilitate this decision, equivalency testing has two advantages. First, the variance of the estimated difference is used in equivalency testing. In power analysis, the estimated difference is treated as though it were without sampling variation. Second, equivalency testing is conducted using an exact or asymptotic sampling distribution. Power analysis relies on the sample variance as an estimate of the true population variance.

Footnote 9. We are not suggesting that the failure to reject the null hypothesis in either the traditional test or the equivalency test in any way changes one's confidence in a significant result in the other test. This would be falling into the same trap of "proving the null hypothesis" that we are trying to help investigators avoid. We are merely presenting the four possible results an investigator could encounter if both the traditional and equivalency tests were performed on the same data.

[Figure 3 plot not reproduced. Vertical axis: effect size; horizontal axis: comparison.]

Figure 3. The 90% and 95% confidence intervals around mean effect sizes for various comparisons between therapies used to treat depression. (Outer tick marks reflect 95% confidence interval. Inner tick marks reflect 90% confidence interval. P = Psychotherapy. DT = Drug therapy. CM = Combination. T = Tricyclics. CT = Combination [tricyclics]. C = Cognitive. B = Behavioral. CB = Cognitive-behavioral. GV = General verbal.)
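The four outcomes enumerated above amount to a small decision rule, which can be sketched in Python. This is an illustrative sketch only: the function name `joint_conclusion`, its return format, and the choice of a two-tailed traditional test at alpha = .05 under a normal approximation are assumptions made here, not something the article specifies. The first three demonstration inputs come from Table 3 and the "ever repeated grade" example; the last is a hypothetical case constructed to show outcome 2.

```python
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal distribution


def joint_conclusion(diff, se, delta_low, delta_high, alpha=0.05):
    """Run both tests on one estimated difference and name the outcome.

    Returns (outcome_number, description), numbered as in the list above.
    """
    # Traditional test of H0: difference = 0, two-tailed (an assumption).
    different = 2 * NORMAL.cdf(-abs(diff / se)) < alpha

    # Two one-sided tests; equivalence requires both p values below alpha.
    p_low = 1 - NORMAL.cdf((diff - delta_low) / se)   # H0: diff <= delta_low
    p_high = NORMAL.cdf((diff - delta_high) / se)     # H0: diff >= delta_high
    equivalent = max(p_low, p_high) < alpha

    if equivalent and not different:
        return 1, "no clinically important difference"
    if equivalent and different:
        return 2, "nonzero difference, but within the equivalence interval"
    if different:
        return 3, "difference between the groups"
    return 4, "insufficient evidence to make any decision"


if __name__ == "__main__":
    cases = {
        "psychotherapy vs. drug therapy": (0.07, 0.04, -0.20, 0.20),
        "cognitive vs. behavioral": (0.12, 0.09, -0.20, 0.20),
        "ever repeated grade": (-0.162, 0.0655, -0.0686, 0.0686),
        "hypothetical large-sample case": (0.05, 0.01, -0.20, 0.20),
    }
    for label, args in cases.items():
        print(label, joint_conclusion(*args))
```

Keeping the two tests as separate Boolean results before combining them reflects the point made in Footnote 9: a failure to reject in one test is not treated as evidence about the other.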
Controversial issues that surround the relevance and adequacy of the statistical hypothesis test as a means to scientific discovery apply to equivalency tests as well. Some issues that will no doubt arise as investigators consider the role that equivalency testing might play in their research include the following.

1. Multiple comparisons. If more than one equivalency test is conducted, the question of adjustment to contain experimentwise alpha will arise. Inasmuch as there are various opinions as to when and how this should be done, the same controversies can be expected to carry over to equivalency testing. In general, the investigator who wishes to control experimentwise alpha should make an appropriate adjustment on the basis of the number of tests actually performed if a priori contrasts are designated, or the number of tests implied on a continuum from least to most likely to be equivalent (i.e., distance between means), if the data are inspected post hoc.

2. Multiple dependent variables. Redundant independent measurement should be avoided in research using equivalency tests just as it is avoided in research using traditional tests. When confronted with redundancy in outcome measurements, the investigator should select a priori the most serviceable outcome parameter from each independent outcome dimension or ...

[Table 4 (baseline characteristics of the birth and abortion groups) is not recoverable from this excerpt.]
