J Behav Educ (2015) 24:459–469
DOI 10.1007/s10864-015-9224-1

COMMENTARY

What Works Clearinghouse Standards and Generalization of Single-Case Design Evidence

John H. Hitchcock · Thomas R. Kratochwill · Laura C. Chezan

Published online: 14 March 2015
© Springer Science+Business Media New York 2015

Abstract: A recent review of existing rubrics designed to help researchers evaluate the internal and external validity of single-case design (SCD) studies found that the various options yield consistent results when examining causal arguments. The authors of the review, however, noted considerable differences across the rubrics when addressing the generalization of findings. One critical finding is that the What Works Clearinghouse (WWC) review process does not capture details needed for report readers to evaluate generalization. This conclusion is reasonable if considering only the WWC's SCD design standards. It is important to note that these standards are not used in isolation, and thus generalization details cannot be fully understood without also considering the review protocols and a tool called the WWC SCD Review Guide. Our purpose in this commentary is to clarify how the WWC review procedures gather information on generalization criteria and to describe a threshold for judging how much evidence is available. It is important to clarify how the system works so that the SCD research community understands the standards, which in turn might facilitate use of future WWC reports and possibly influence both the conduct and the reporting of SCD studies.

Keywords: Single-case design · Generalization · Internal validity · External validity

Author note: Some of the information contained herein is based on the What Works Clearinghouse's Single-case design technical documentation version 1.0 (Pilot) (referred to as the Standards in this article), produced by two of the current authors (Kratochwill and Hitchcock) and the Panel members and available at http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf. The Standards described in the technical documentation were developed by a Panel of authors for the Institute of Education Sciences (IES) under Contract ED07-CO-0062 with Mathematica Policy Research, Inc. to operate the What Works Clearinghouse (WWC). The content of this article does not necessarily represent the views of the Institute of Education Sciences or the WWC.

J. H. Hitchcock (corresponding author)
Center for Evaluation and Education Policy, Indiana University, 1900 East Tenth Street, Bloomington, IN 47406-7512, USA
e-mail: [email protected]

T. R. Kratochwill
University of Wisconsin-Madison, Madison, WI, USA

L. C. Chezan
Old Dominion University, Norfolk, VA, USA

There is a long-standing call for using interventions with a strong evidence base (e.g., Deegear and Lawson 2003; Kratochwill 2002; Kratochwill and Stoiber 2000; Schneider et al. 2007). One set of methodologies that has been recognized as a viable approach for generating empirical evidence to inform treatment innovation, adoption, or improvement is the single-case design (SCD) approach (e.g., Horner et al. 2005; Kazdin 2011; Kratochwill and Levin 2014). SCDs are experimental methods consisting of various designs involving repeated measures of a specific behavior or skill under different conditions to evaluate the effectiveness of a treatment for an individual or a small group of individuals who serve as their own control (Kazdin 2011).
SCDs have emerged from the field of psychology and have been used across various disciplines, including education, medicine, and speech and language therapy. Like most investigations, one SCD study is unlikely to generate sufficient empirical evidence to warrant policy change, even if it might compel alteration to localized practice. Thus, it is important to not only evaluate the results of a single SCD but also synthesize evidence from multiple SCD studies examining the effectiveness of a treatment and then make inferences about generalizing findings to a population of interest, as well as potentially to other populations and settings. Collating evidence from multiple SCD studies conducted by different research teams, with different participants, and across different settings has the potential to provide stronger evidence that might inform treatment decisions and policy change.

A key part of any effort to collate empirical evidence is to generate rubrics that can be used to judge the findings of individual studies and subsequently summarize information in the form of systematic reviews. Recently, researchers have published a number of rubrics, or guidelines, for judging SCD evidence (e.g., Kratochwill et al. 2010, 2013; Smith 2012; Wendt and Miller 2012) and conducted SCD systematic reviews (e.g., Bowman-Perrott et al. 2013; Dart et al. 2014). Maggin et al. (2013) have made an important contribution to such efforts by comparing seven different rubrics designed to assist scientists, practitioners, or legislators in evaluating findings of SCD studies. Maggin et al. examined the consistency of existing rubrics designed to assess different requirements of SCD methodology related to internal and external validity. They first reviewed each rubric and then applied each one to a set of SCD studies focusing on self-management interventions. Their effort yielded a number of key findings, one of which was the consistency of internal validity judgments made about component SCD studies across the seven rubrics.[1] A second key finding was that there was limited agreement across the rubrics pertaining to issues of generalizing evidence. According to Maggin et al., some rubrics were designed to capture considerable detail about generalization, whereas other rubrics functionally ignored this consideration. For example, Maggin et al. state: "…the WWC criteria provided guidance solely for criteria related to establishing experimental control while others included several descriptive criteria related to establishing the generality of the intervention" (p. 20).

This particular conclusion about the WWC represents the motivation for this commentary. The WWC criteria and review procedures do in fact deal with documenting information that informs generalization, and there is value in explaining to stakeholders how these procedures are applied when conducting a review. At the outset, however, it is important to note that we understand that several factors may contribute to the confusion about the WWC Pilot Standards (i.e., the Standards) and, in particular, how generalization of findings is handled. One contributing factor relates to the fact that Maggin et al. (2013) may have reviewed the Standards document in isolation from the review protocols and the SCD Review Guide.
The WWC review protocols are, however, important because they specify the research questions to be addressed via a review; they also describe the population(s) of interest, the relevant outcome domains, and the settings in which interventions should be applied. These aspects are critical because part of understanding generalization is thinking through the populations, settings, and contexts to which one might want to generalize information. The SCD Review Guide represents a database where all the relevant aspects of a study are first documented in a systematic manner and then evaluated to draw a conclusion about the evidence presented in the study.

Another contributing factor to the confusion about the Standards and generalization relates to the fact that WWC reporting on SCD evidence has been, to date, minimal. Therefore, limited information on the application of the Standards to identifying empirical evidence across multiple SCD studies is publicly available, which perhaps led some to assume that the Standards do not address this important issue.

Thus, our purpose in this commentary is to clarify the WWC procedures, with particular focus on how the Standards are implemented to address the generalization of findings from SCD studies reviewed within the context of a specific protocol. We believe such clarification is worthwhile because an understanding of these procedures among the SCD research community will facilitate the use of future WWC project reports and may influence both the conduct and the reporting of studies that use these types of designs. We begin with a description of the external and internal validity of SCDs. We then present different approaches to evaluating the generalization of findings from SCD studies (detailed discussion of evaluating experimental control is available in Kratochwill et al. 2010, 2013). Next, we discuss the Standards within the context of a WWC review protocol, focusing on criteria used to address generalization, including the 5-3-20 rule. We end by highlighting the importance of ongoing refinement of the Standards to better capture the methodological criteria of SCDs, with the ultimate goal of informing policy and practice.

[1] Shadish et al. (2002) argue that internal validity, or the degree to which an inferred causal relationship between a treatment and an outcome variable is valid, is the sine qua non of experimental design. In other words, there might not be much point in carefully pondering the external validity (which is related to generalization) of studies that do not yield strong evidence of a causal effect: if one cannot demonstrate that a given treatment was responsible for some outcome, then there is little point in examining whether the evidence generalizes to different contexts. As applied to SCDs, if one has no or limited confidence that there is a functional relationship between a treatment (independent variable) and a dependent variable, then why do the hard work of generalizing?

Internal and External Validity

As noted above, it appears from the Maggin et al. (2013) comparison that there was reasonable consistency across the rubrics pertaining to judgments of internal validity. This finding is not surprising. Through work on Standards development in the WWC and other ventures, such as the Task Force on Evidence-Based Interventions in School Psychology (see Kratochwill and Stoiber 2002), it became clear that the Campbellian validity framework (Shadish 1995; Shadish et al. 2002) applies to a broad number of designs that are capable of yielding causal evidence. Such evidence is generated from SCDs when they are used to evaluate treatment effects. The task of judging internal validity first entails specifying the causal questions at hand and then selecting a design that controls for a number of common threats to internal validity, which, in essence, represent alternative explanations for any observed changes to a dependent variable after treatment exposure. Examples of such threats are maturation, history, regression to the mean, diffusion of treatment, and instrumentation (see Shadish et al. 2002 for details). SCDs can be designed in such a way as to render these alternative explanations implausible. Identifying the presence of these design features will yield judgments about whether there is strong evidence that a treatment worked as intended (cf. Horner et al. 2005; Kratochwill et al. 2010, 2013). Thus, given the logic behind causal inference, we might expect the sundry rubrics to yield fairly consistent conclusions pertaining to internal validity.

In our experience, assessing external validity is a more complex prospect than judging internal validity. External validity refers to the extent to which a causal inference from a particular study holds over different contexts, settings, measures, populations, and so on (Shadish et al. 2002) and may also be thought of as a broad facet of generalization. Similar to internal validity, a number of threats may limit the generalization of the findings of a SCD study. Examples of such threats include multiple-treatment interference (i.e., if an observed outcome was due to multiple and interacting treatments, then the effect will not generalize), generality across settings, generality across subjects, and generality across outcomes (Kazdin 2011; Shadish et al. 2002). Threats of this sort deal with the basic question of whether an observed effect from a study will hold over changes to subject characteristics, the specifics of a setting, and similar but different types of outcomes. The challenge in evaluating the external validity of a given study is partially due to the fact that many factors or characteristics of an experiment may represent a threat to one's capacity to generalize, with some factors being easily identifiable, whereas others are not.

Another challenge when evaluating the generalization of findings is that a researcher conducting a review effort to collate evidence across multiple SCDs may not know the point to which consumers of the information might wish to generalize. A basic solution to this situation is to first specify the research questions of interest in a review protocol (e.g., What treatments are effective at improving behavior among K-12 students classified with an emotional-behavioral disorder?). Articulating the key research questions of a review helps to frame the generalization goals of the effort. Next, reviewers must consider different approaches to determining what type of information to include when collating evidence. One option is to include findings from all located studies on a specific topic, whether they are characterized by strong validity or not, and assess the totality of evidence pertaining to a logically grouped set of treatments. Alternatively, researchers can use what Maggin et al. (2013) describe as a gating procedure.
The first step of a gating procedure consists of identifying studies on a specific topic that will be included in the review. From there, only studies with strong internal validity are considered (i.e., studies must pass an internal validity gate before they are considered further). This process is applied when using the WWC Standards to code SCD studies; that is, studies are coded for evidence criteria only after they pass the design standards. As with most choices, there are trade-offs here. The former option entails reporting all evidence but may also yield some confusion for consumers, because synthesis findings might have to explain that some evidence is not very strong or, in the case of the WWC Standards, does not meet design standards. This approach may become especially problematic when there is no clear overall picture (e.g., several studies with varying internal validity support treatment adoption and several do not). The gating procedure, by contrast, can yield findings that are easier to communicate because, whatever the results of the component studies, they would all be characterized by reasonably strong internal validity. However, studies determined to have weak internal validity are not included in the review, so their information is functionally barred from informing the review questions. The WWC uses the gating procedure for SCDs as well as for group-design studies (i.e., randomized controlled trials and quasi-experiments; see WWC 2013). This gating approach is arguably a reasonable one when the goal is to inform practitioners and policymakers, which is the expressed intent of the WWC.
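To make the two-stage gating logic concrete, the sketch below illustrates it in Python. This is our illustration rather than actual WWC tooling: the rating labels follow the categories named in the pilot Standards, but the function names and data handling are assumptions.

# Illustrative sketch of a gating procedure (not actual WWC software).
# Stage 1: rate each study against design standards (the internal-validity gate).
# Stage 2: code evidence criteria only for studies that pass the gate.
from enum import Enum

class DesignRating(Enum):
    MEETS = "Meets Evidence Standards"
    MEETS_WITH_RESERVATIONS = "Meets Evidence Standards with Reservations"
    DOES_NOT_MEET = "Does Not Meet Evidence Standards"

def gated_review(studies, rate_design, code_evidence):
    """rate_design(study) -> DesignRating; code_evidence(study) -> evidence codes.
    Both judgments come from trained reviewers; they are modeled here as callables."""
    results = []
    for study in studies:
        rating = rate_design(study)              # internal-validity gate first
        if rating is DesignRating.DOES_NOT_MEET:
            continue                             # barred from the synthesis
        results.append((study, rating, code_evidence(study)))
    return results

Under the alternative, non-gated option, the filter would simply be dropped, and every located study would flow into the synthesis along with its rating.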
Regardless of whether a review uses a gating procedure or not, the next step entails evaluating the external validity of the studies included in the review. Two complexities come with considering SCD evidence. First, internal and external validity concerns are not always mutually exclusive. For example, one must see a detailed description of the baseline and treatment procedures to understand the contrast; yet, baseline details describe the status quo and thus inform generalization. Second, as we mentioned previously, when evaluating the external validity of findings within the context of a gating procedure, the WWC faces the challenge of not knowing exactly the scenarios to which report consumers might wish to generalize. Nevertheless, the WWC addresses the external validity of findings from multiple SCD studies by taking into consideration what Maggin et al. (2013) describe as criteria for assessing generality. Maggin et al. argue that "Single-case research…requires the collection and careful reporting of critical aspects of the research including information pertaining to participant characteristics, setting procedures, baseline conditions, and operational definitions of the variables being studied" (p. 6).

One reason for providing a detailed description of these aspects is to allow readers to understand the situations to which findings might be generalized. For example, the Participant Information criterion captures the demographic and individual characteristics of the people included in a study and allows consumers to make inferences about the extent to which people not included in the study may benefit from a treatment. The Setting Description provides details about the context in which the research was conducted, allowing readers to evaluate the extent to which the treatment may be effective when applied in a different context than the one in which the study was conducted. It is also important to consider the description of baseline procedures (i.e., the Baseline Description criterion) so that readers understand the treatment contrast being examined by a SCD study, which supports generalization and replication of findings. The two key variable types, Independent and Dependent, must also be described in detail: the former essentially describes the treatment or intervention examined, and the latter deals with the outcome variables of interest (e.g., behavior, skill).

We concur that examining and reporting the above-mentioned criteria is necessary, and the WWC review process captures these aspects when evaluating SCD studies. The ultimate goal of this process is to describe the treatment with sufficient detail so that practitioners and policymakers can make their own decisions about whether the available evidence applies to their circumstances. Moreover, the WWC procedure takes things a step further by applying a novel threshold for determining whether SCD evidence has been sufficiently repeated or replicated to warrant generation of a report. This threshold is the 5-3-20 rule. In the next section, we describe the review procedures and the threshold so that readers have a better understanding of how the WWC assesses the generality of SCD evidence, thus clarifying any potential misunderstanding related to generalization of findings within the context of the Standards. We also hope that this provides clear guidelines for researchers who may be interested in conducting independent evaluations of SCD studies.

How the WWC Deals with the Generality Criteria Described by Maggin et al. (2013)

Table 1 summarizes the generalization details captured by the WWC SCD Review Guide, which is publicly available.[2] Trained and certified reviewers complete the…

[2] The WWC SRG Review Guide is subject to change. A copy of the current Review Guide is available here: . The study Review Guide used by and/or referenced herein was developed by the U.S. Department of Education, Institute of Education Sciences through its What Works Clearinghouse project and was used by the authors with permission from the Institute of Education Sciences. Neither the Institute of Education Sciences nor its contractor administrators of the What Works Clearinghouse endorse the content herein.

Table 1  Summary of WWC SCD review guide items that capture generalization details
Maggin et al. (2013) generalization criterion | WWC review guide items (summarized) designed t...
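The prose above names both the generality criteria and the 5-3-20 threshold, and the sketch below shows how they might fit together in code. It reflects our reading of the pilot Standards (at least five studies meeting design standards, conducted by at least three different research teams, with at least 20 single-case experiments, or cases, in total); the record fields are illustrative stand-ins for Review Guide items, not actual WWC database fields.

from dataclasses import dataclass

@dataclass
class StudyRecord:
    """Generalization details a reviewer might record (field names illustrative)."""
    study_id: str
    research_team: str            # team/lab that conducted the study
    num_cases: int                # single-case experiments (cases) in the study
    meets_design_standards: bool  # passed the internal-validity gate
    participant_information: str  # demographic/individual characteristics
    setting_description: str      # context in which the research occurred
    baseline_description: str     # status quo; defines the treatment contrast
    independent_variable: str     # treatment or intervention examined
    dependent_variable: str       # outcome of interest (e.g., behavior, skill)

def meets_5_3_20(records: list[StudyRecord]) -> bool:
    """Apply the 5-3-20 threshold to studies that passed the design gate:
    at least 5 studies, at least 3 distinct teams, at least 20 total cases."""
    passing = [r for r in records if r.meets_design_standards]
    teams = {r.research_team for r in passing}
    total_cases = sum(r.num_cases for r in passing)
    return len(passing) >= 5 and len(teams) >= 3 and total_cases >= 20

On this reading, a body of evidence that clears all three counts would be sufficiently replicated to warrant a report; in practice, the judgment is made by trained reviewers applying a protocol, not by a script.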