Copyright
by
Meghan Kathleen McGlohen
2004

The Dissertation Committee for Meghan Kathleen McGlohen certifies that this is the approved version of the following dissertation:

The Application of Cognitive Diagnosis and Computerized Adaptive Testing to a Large-Scale Assessment

Committee:
Hua Hua Chang, Supervisor
William Koch
Barbara G. Dodd
Susan Natasha Beretvas
Orzo L. Davis, Jr.

The Application of Cognitive Diagnosis and Computerized Adaptive Testing to a Large-Scale Assessment

by

Meghan Kathleen McGlohen, B.S., M.A.

Dissertation
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy

The University of Texas at Austin
May 2004

To my grandmother, Leora Lea Ross

Acknowledgements

First of all, I would like to thank the Department of Educational Psychology for accepting me into the graduate school. I am fortunate to have found a program focused on so many interesting and challenging areas of research and study. The entire quantitative methods faculty has been extraordinarily helpful and encouraging. Specifically, I would like to thank my dissertation committee members. Dr. William Koch is always willing to answer questions with patience and wisdom. Dr. Barbara Dodd, whose door is always open, has been a fount of knowledge to me and to all of the Educational Psychology students. Dr. Tasha Beretvas, an endless source of solutions, is always so willing to help, no matter how much is already on her plate. Dr. O.L. Davis, Jr. challenged me to pursue questions about practically applicable educational issues outside the traditional realm of methodological research. Many thanks to each. I would especially like to thank Dr. Hua Hua Chang.
It has been such a privilege and a pleasure to work with such a brilliant mind and supportive advisor. Dr. Chang has been so kind and helpful throughout the entire process. I am certain that I am the first of many, many advisees who will share this sentiment. Thank you so very much. You have a heart of gold. A special thanks to Joshua T. Wills for keeping me fed, keeping a roof over my head, and supporting and encouraging me in every way possible. Lastly, I would like to thank my mom for telling me in 1999 that changing my major to Psychology was a good idea.

The Application of Cognitive Diagnosis and Computerized Adaptive Testing to a Large-Scale Assessment

Publication No. ___________

Meghan Kathleen McGlohen, Ph.D.
The University of Texas at Austin, 2004

Supervisor: Hua Hua Chang

Our society currently relies heavily on test scores to measure individual progress, but typical scores can only provide a limited amount of information. For instance, a test score does not reveal which of the assessed topics were mastered and which were not well understood. According to the U.S. government, this is no longer sufficient. The No Child Left Behind Act of 2001 calls for diagnostic information to be provided for each individual student, along with information for parents, teachers, and principals to use in addressing individual student needs. This opens the door for a new area of psychometrics that focuses on including diagnostic feedback in traditional standardized testing. This diagnostic assessment could even be combined with techniques already developed in the arena of computer adaptive testing to individualize the assessment process and provide immediate feedback to individual students. This dissertation comprises two major components.
First, a cognitive diagnosis-based model, namely the fusion model, is applied to two large-scale mandated tests administered by the Texas Education Agency. Second, computer adaptive testing technology is incorporated into the diagnostic assessment process to provide interactive assessment and feedback on individual examinees' mastery levels of the cognitive skills of interest. The first part requires assigning attributes to the standardized test items. It also requires simultaneous IRT-based estimation of both the item parameters and the examinee variables under the fusion model. Examinees are classified dichotomously into mastery and non-mastery categories for the assigned attributes. Given this information, it is possible to identify the attributes with which a given student needs additional help. Results from the first portion indicate that the fusion model is indeed an appropriate approach to cognitive diagnosis in a real large-scale assessment. The second part focuses on applying CAT-based methodology, and in particular item selection, to the diagnostic testing process. The result is a dynamic test that is sensitive to individual response patterns while the examinee is taking the test. This combination of computer adaptive testing with diagnostic testing will contribute to the research field by enhancing the results that students and their parents and teachers receive from educational measurement. Results from the second portion of this dissertation concern item selection. Selecting items based on both the overall score and the diagnostic attribute pattern is comparable to selecting items based solely on the overall score. It is also better than selecting items based solely on the diagnostic attribute pattern.

Table of Contents
Abstract
List of Figures
List of Tables
CHAPTER ONE: INTRODUCTION
CHAPTER TWO: LITERATURE REVIEW
    Traditional IRT-Based Testing
    Diagnostic Assessment Models
    Fischer's LLTM
    Tatsuoka's Rule Space Method
    The Unified Model
    The Fusion Model
    Parameters
    Parameter Estimation
    MCMC Estimation
    The Metropolis-Hastings Algorithm
    Gibbs Sampling
    Computerized Adaptive Testing
    Item Selection
    The Shadow Test
    Shannon Entropy
    Kullback-Leibler Information
CHAPTER THREE: METHODOLOGY
    PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework
    PART TWO: Adaptively Enhancing the Assessment Process
    Data and Parameters
    Research Design
    Condition 1: Theta-based Item Selection
    Condition 2: Alpha-based Item Selection
    Condition 3: Theta- and Alpha-based Item Selection
    Comparative Evaluation
CHAPTER FOUR: RESULTS
    PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework
    PART TWO: Adaptively Enhancing the Assessment Process
    Single Score Estimation
    Attribute Mastery Estimation
    Item Exposure
    Overall Performance
CHAPTER FIVE: DISCUSSION
    PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework
    PART TWO: Adaptively Enhancing the Assessment Process
    Educational Implications Future Directions of Research
APPENDIX A: List of Attributes Measured by Each Test
APPENDIX B: Plots of Estimates versus True Values of Theta for Probabilities based on the 3PL Model
APPENDIX C: Conditional Bias Plots of Theta Estimates for Probabilities based on the 3PL Model
APPENDIX D: Item Exposure Frequencies for Simulations Based on 3PL Model Probabilities
APPENDIX E: Item Exposure Frequencies for Simulations Based on Fusion Model Probabilities
REFERENCES
VITA

List of Figures
FIGURE 1: Two knowledge states in the two-dimensional rule space
FIGURE 2: Nine items measuring four attributes
FIGURE 3: A tree representing eight knowledge states
FIGURE 4: The iterative process of the shadow test approach
FIGURE 5: The branch-and-bound method
FIGURE 6: Using item response patterns to obtain item and person parameters
FIGURE 7: Computerized adaptive testing simulation process for condition 3
FIGURE 8: Visual representation of the three conditions
FIGURE 9: Visual representation of the research design
FIGURE 10: Correct Attribute Mastery Classification for the Math Blueprint Q-matrix using Response Probabilities based on the 3PL Model
FIGURE 11: Correct Attribute Mastery Classification for the Math Intuitive Q-matrix using Response Probabilities based on the 3PL Model
FIGURE 12: Correct Attribute Mastery Classification for the Reading Blueprint Q-matrix using Response Probabilities based on the 3PL Model
FIGURE 13: Correct Attribute Mastery Classification for the Reading Intuitive Q-matrix using Response Probabilities based on the 3PL Model
FIGURE 14: Correct Attribute Mastery Classification for the Math Blueprint Q-matrix using Response Probabilities based on the Fusion Model
FIGURE 15: Correct Attribute Mastery Classification for the Math Intuitive Q-matrix using Response Probabilities based on the Fusion Model
FIGURE 16: Correct Attribute Mastery Classification for the Reading Blueprint Q-matrix using Response Probabilities based on the Fusion Model
FIGURE 17: Correct Attribute Mastery Classification for the Reading Intuitive Q-matrix using Response Probabilities based on the Fusion Model

List of Tables
TABLE 1: Number of attributes in each Q-matrix
TABLE 2: Average number of attributes per item
TABLE 3: Average number of items per attribute
TABLE 4: Means and standard deviations of π* estimates
TABLE 5: Means and standard deviations of r* estimates
TABLE 6: Means and standard deviations of proportions of examinees obtaining mastery status
TABLE 7: Proportions of flagged examinees
TABLE 8: Means and standard deviations of proportions of estimates between 0.4 and 0.6
TABLE 9: Number of non-convergent cases within each condition when response probabilities are based on the 3PL model
TABLE 10: Number of non-convergent cases within each condition when response probabilities are based on the 3PL model
TABLE 11: Correlations between the true and estimated values for response probabilities based on the 3PL model
TABLE 12: Correlations between the true and estimated values for response probabilities based on the fusion model
TABLE 13: Bias statistics describing the estimated values for response probabilities based on the 3PL model
TABLE 14: Bias statistics describing the estimated values for response probabilities based on the fusion model
TABLE 15: Root Mean Square Error of the estimated values for response probabilities based on the 3PL model
TABLE 16: Root Mean Square Error of the estimated values for response probabilities based on the fusion model
TABLE 17: The math test's attribute mastery hit rates for response probabilities based on the 3PL model
TABLE 18: The reading test's attribute mastery hit rates for response probabilities based on the 3PL model
TABLE 19: The math test's attribute mastery hit rates for response probabilities based on the fusion model
TABLE 20: The reading test's attribute mastery hit rates for response probabilities based on the fusion model
TABLE 21: Frequencies and proportions of item exposure for the math test for response probabilities based on the 3PL model
TABLE 22: Frequencies and proportions of item exposure for the reading test for response probabilities based on the 3PL model
TABLE 23: Frequencies and proportions of item exposure for the math test for response probabilities based on the fusion model
TABLE 24: Frequencies and proportions of item exposure for the reading test for response probabilities based on the fusion model

CHAPTER ONE: INTRODUCTION

Typically, large-scale standardized assessments provide a single summary score to reflect the overall performance level of the examinee in a certain content area.
The utility of large-scale standardized assessment would be enhanced if the assessment also provided students and their teachers with useful diagnostic information in addition to the single overall score. Currently, smaller-scale assessments, such as teacher-made tests, are the means of providing such feedback to students throughout the school year. Little concern is expressed about the considerable classroom time taken by these formative teacher-made tests, because they are viewed as integral parts of instruction. Conversely, educators view standardized testing of any kind as lost instruction time (Linn, 1990). Standardized test scores have several advantages over teacher-made test scores. They allow for the comparison of individuals across various educational settings, they are more reliable, and they are objective and equitable (Linn, 1990). The advantage of teacher-made tests, on the other hand, is that they provide very specific information to the students regarding their strengths and weaknesses in the tested material. Large-scale standardized testing would be even more beneficial if it could also serve the educational process in a role beyond evaluation, such as reporting diagnostic feedback, while maintaining these existing advantages. Students could then use this information to target improvement in areas where they are deficient.

A new approach to educational research has begun to effloresce in order to provide the best of both worlds. This research area applies cognitive diagnosis to the assessment process. It aims to provide helpful information to parents, teachers, and students, which can be used to direct additional instruction and study to the areas the individual student needs most. This information concerns the fundamental elements, or building blocks, of the content area, which are referred to as attributes.
An attribute is not the same as a content area; rather, the combination of these elements or attributes represents the content domain of interest. This form of diagnostic assessment is an appropriate approach to formative assessment. Instead of a single score, it gives every examinee specific information about each measured attribute or content element. An ideal assessment would meet the meticulous psychometric standards of current large-scale assessments. It would also provide specific diagnostic information regarding the individual examinee's educational needs. In fact, large-scale state assessments are now required to provide this additional diagnostic information; the No Child Left Behind Act of 2001 mandates that such feedback be provided to parents, teachers, and students (U.S. House of Representatives, 2001). However, constructing diagnostic assessments from scratch is expensive and impractical. A more affordable solution is to incorporate diagnostic measurement into existing assessments that state and local governments are already administering to public school students. Thus, to combine the benefits of diagnostic testing with the current assessment situation, cognitively diagnostic approaches would need to be applied to an existing test.

Diagnostic assessment is a very advantageous approach to measurement. In traditional testing, different students may get the same score for different reasons (Tatsuoka, M. M. and Tatsuoka, K. K., 1989). In diagnostic testing, some of these differences can be discovered and shared with the examinee and his/her teacher. Diagnostic assessment allows the testing process to serve an instructional purpose in addition to the traditional purposes of assessment (Linn, 1990), and it can be used to integrate instruction and assessment (Campione and Brown, 1990).
Furthermore, diagnostic testing offers a means of selecting instructional material according to an individual's needs (Embretson, 1990). Traditional tests can accomplish assessment goals, such as a ranked comparison of examinees or grade assignments based on certain criteria. However, they do not provide individualized information to teachers or test-takers regarding specific content in the domain of interest (Chipman, Nichols, and Brennan, 1995). Traditional assessment determines what an individual has learned, but not what s/he has the capacity to learn (Embretson, 1990). Diagnostic assessment can be used to identify individuals who are likely to experience difficulties in a given content domain. It can also help provide specific information regarding the kinds of help an individual needs. Furthermore, cognitive diagnosis can be used to gauge an individual's readiness to move on to higher levels of understanding and skill in the given content domain (Gott, 1990, p. 174).

Current approaches to cognitive diagnosis focus solely on estimating the knowledge state, or attribute vector, of the examinees. This dissertation proposes to combine the estimation of item response theory (IRT)-based individual ability levels (θ) with an emphasis on the diagnostic feedback provided by individual attribute vectors (α). This bridges the current standard in testing technology with a new area of research aimed at helping students benefit from the testing process through diagnostic feedback. This research is two-fold. The first goal is to analyze the results of an existing large-scale assessment within a cognitively diagnostic framework, in order to demonstrate that this is possible. The second part deals with incorporating computer adaptive testing technology into the cognitively diagnostic assessment process.
The goal of the second part of this research is not only to simultaneously measure individuals' knowledge states and conventional unidimensional IRT ability levels, but also to do so efficiently. This research will apply the advantages of computerized adaptive testing to the new measurement area of cognitive diagnosis. The goal of computerized adaptive testing is to tailor a test to each individual examinee by allowing the test to home in on the examinee's ability level interactively. Accordingly, examinees are spared from answering many items that are not representative of their abilities. To accomplish this goal, this research study relies on the technology of computer adaptive testing. The aim of the second part of this research is to combine these advantages with the helpful feedback provided by cognitively diagnostic assessment, and so enhance the existing testing process. This dissertation proposes a customized diagnostic testing procedure that reports two results to each examinee, instead of just one or the other. These are a conventional unidimensional ability estimate and the examinee's attribute mastery status. The key idea of this work is to use the shadow test technique to optimize the estimation of the traditional IRT-based ability level, θ. Then, for each examinee, the item in this shadow test that is optimal for the cognitive attribute vector, α, is selected. Each of the two major parts of this research project aims to contribute to the field of cognitive diagnosis. But first, the founding concepts of this research must be elucidated.

CHAPTER TWO: LITERATURE REVIEW

Traditional IRT-Based Testing

Item response theory (IRT) is a common foundation for wide-scale testing. IRT is based on the idea of test homogeneity (Loevinger, 1947) and logistic statistical modeling (Birnbaum, 1968). It uses these probabilistic models to describe the relationship between item response patterns and underlying parameters.
IRT uses the item as the unit of measure, rather than the entire test. As a result, ability scores are on the same scale even when different examinees are administered different items (Wainer and Mislevy, 2000). As outlined in Rogers, Swaminathan and Hambleton's (1991) text, two main axioms are assumed when employing IRT:
(1) The performance of an individual on a set of test items can be explained by an underlying construct, latent trait, or set thereof. In educational testing, the trait is the individual's ability level, which accounts for the pattern of correct and incorrect responses to the test items.
(2) The relationship between this performance and the underlying trait or set of traits can be represented by a monotonically increasing function. That is to say, as the level of the ability or trait of interest increases, so does the probability of a response reflecting this increase (in this context, a correct response) (Rogers, Swaminathan and Hambleton, 1991).

A plethora of IRT probability models are available, and three of the most common will be briefly discussed. Each of these three models maps out a probabilistic association between the items and the ability level of examinee j, denoted as θ_j. These three models are referred to as the one-, two-, and three-parameter logistic models, respectively. The parameter included in the one-parameter logistic model is the item difficulty, denoted as b_i for each item i. This parameter reflects the level of the ability trait needed to obtain a correct response to each item. The probability of obtaining a correct response to item i given a specific ability level θ is shown in Equation 1:

$$P_i(Y_i = 1 \mid \theta) = \frac{e^{(\theta - b_i)}}{1 + e^{(\theta - b_i)}} \qquad (1)$$

for i = 1, 2, ..., n, where n is the total number of items and Y_i denotes the response to item i (Rogers, Swaminathan and Hambleton, 1991). The one-parameter logistic model is also referred to as the Rasch model.
Next, the two-parameter logistic model also involves the item difficulty parameter, b_i, but includes another item parameter that describes item discrimination, denoted as a_i. Item discrimination reflects an item's facility in discriminating between examinees of differing ability levels. The value of the item discrimination parameter is proportional to the slope of the probability function at the location of b_i on the ability axis (Rogers, Swaminathan and Hambleton, 1991). The probability of obtaining a correct response to item i given an ability level θ under the two-parameter logistic model is shown in Equation 2:

$$P_i(Y_i = 1 \mid \theta) = \frac{e^{D a_i (\theta - b_i)}}{1 + e^{D a_i (\theta - b_i)}} \qquad (2)$$

for i = 1, 2, ..., n, where D is a scaling constant, and n and Y_i hold the same meaning as in the previous model (Rogers, Swaminathan and Hambleton, 1991). Finally, the three-parameter logistic model (3PL) includes both the difficulty parameter b_i and the discrimination parameter a_i. It also adds a third item parameter, called the pseudo-chance level and denoted as c_i (Rogers, Swaminathan and Hambleton, 1991). The pseudo-chance level allows the lower asymptote of the probability function to be greater than zero; that is to say, examinees have some nonzero probability of responding correctly to the item regardless of ability level. The probability of obtaining a correct response to item i given a level of ability θ under the 3PL model is shown in Equation 3:

$$P_i(Y_i = 1 \mid \theta) = c_i + (1 - c_i)\,\frac{e^{D a_i (\theta - b_i)}}{1 + e^{D a_i (\theta - b_i)}} \qquad (3)$$

for i = 1, 2, ..., n, with all variables defined as previously noted (Rogers, Swaminathan and Hambleton, 1991). Notice the similarities between these models, and in particular, how each builds on the previous one. In fact, the 3PL model reduces to the two-parameter logistic model when c_i is set to zero. It reduces further to the one-parameter logistic model when a_i is also set equal to one and D is taken as one, as in Equation 1.
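Because Equations 1 through 3 differ only in which item parameters are free, they can be sketched as a single Python function. This is a minimal illustration, not code from this study. The function name `p_correct` and its default arguments are assumptions. The default D = 1 reproduces Equation 1 exactly; in practice D is often set to 1.7 so that the logistic curve approximates the normal ogive. The form used, 1 / (1 + e^(-x)), is algebraically identical to e^x / (1 + e^x).

```python
import math

def p_correct(theta, b, a=1.0, c=0.0, D=1.0):
    """Probability of a correct response under the logistic IRT family.

    Hypothetical helper for illustration:
      - defaults (a=1, c=0, D=1): one-parameter (Rasch) model, Equation 1
      - a and D supplied:          two-parameter model, Equation 2
      - a, D, and c supplied:      three-parameter (3PL) model, Equation 3
    """
    # Logistic term e^{D a (theta - b)} / (1 + e^{D a (theta - b)}),
    # written in the equivalent form 1 / (1 + e^{-D a (theta - b)}).
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    # The pseudo-chance level c raises the lower asymptote from 0 to c.
    return c + (1.0 - c) * logistic
```

For example, `p_correct(0.0, 0.0)` returns 0.5, the probability for an examinee whose ability equals the item difficulty. By contrast, `p_correct(0.0, 0.0, c=0.25)` returns 0.625, which shows how the pseudo-chance level lifts the curve.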
The next portion of this work will discuss a different approach to testing called diagnostic assessment. This will eventually lead to a discussion of how to combine these long-established IRT approaches with new diagnostic assessment techniques.

Diagnostic Assessment Models

The focus of cognitive diagnosis is to provide individual feedback to examinees regarding each of the attributes measured by the assessment. An attribute is defined as "a task, subtask, cognitive process, or skill somehow involved in the measure" (Tatsuoka, 1995, p. 330). But measuring individuals with respect to the attributes is not the only requirement of a good cognitive diagnosis model. A model must also allow the items to be examined in the context of the cognitive diagnosis; otherwise, the results from the assessment cannot be standardized or understood in a larger testing framework (Hartz, 2002). In fact, Hartz, Roussos, and Stout (2002) describe three desirable characteristics of an ideal cognitive diagnosis model as the ability to (1) allow the attributes to be appraised with respect to individual examinees, (2) allow the relationship between the items and the attributes to be evaluated, and (3) statistically estimate the parameters involved in the model. There are at least fourteen models for diagnostic assessment, most of which fall under two major branches. The first branch consists of models based on Fischer's (1973) Linear Logistic Test Model (LLTM). The second branch consists of models that use Tatsuoka and Tatsuoka's (1982) Rule Space methodology as a foundation. Each model has its strengths and weaknesses. A handful of the models will be described, followed by a justification of the model chosen for this research study.

Fischer's LLTM

One of the first cognitive diagnosis models was the LLTM, developed by Fischer in 1973.
The LLTM expands the basic Rasch model to take into account the cognitive operations required for correct item responses. To do this, the Rasch item difficulty is partitioned into discrete cognitive attribute-based difficulties. Hence, the Rasch item difficulty (denoted as β_i) equals the weighted sum of these attribute-based difficulties, as shown in Equation 4:

$$\beta_i = \sum_k f_{ik}\,\eta_k + c \qquad (4)$$

where f_ik is the weight of factor k in item i, η_k is the difficulty parameter (or "effect") of factor k across the entire exam, and c is a normalizing constant (Fischer, 1973). The weight f_ik indicates the extent to which item i requires factor k. (Please note that this concept is analogous to a Q-matrix entry, which is discussed in further detail below.) Substituting this for the item difficulty parameter in the Rasch model yields the LLTM, as presented in Equation 5:

$$P(X_{ij} = 1 \mid \theta_j) = \frac{e^{\theta_j - \left(\sum_k f_{ik}\eta_k + c\right)}}{1 + e^{\theta_j - \left(\sum_k f_{ik}\eta_k + c\right)}} \qquad (5)$$

where X_ij equals one when examinee j responds correctly to item i and equals zero otherwise. The key idea that makes the LLTM applicable to diagnostic assessment concerns how item difficulty is built up. The difficulty of an item is composed of the combined influences of the basic cognitive operations, or factors, necessary for correctly solving it. These operations can be thought of as cognitive building blocks or attributes. The person parameter, however, remains a single unidimensional ability parameter (θ_j), without any attribute-specific estimate for individual examinees. The LLTM lays the foundation for exploring the cognitively diagnostic relationship between items and their underlying attributes. However, it does not incorporate a measure to identify the presence or absence of such attributes in individual examinees.

Tatsuoka's Rule Space Methodology

The rule space methodology was developed by Kikumi Tatsuoka and her associates (1982) and comprises two parts.
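Equations 4 and 5 can be illustrated with a short Python sketch. The function names are hypothetical, and the factor weights and effects in the example are made-up numbers. The weights f_ik and effects η_k follow the notation of Equation 4.

```python
import math

def lltm_difficulty(f_i, eta, c=0.0):
    """Equation 4: item difficulty beta_i as a weighted sum of factor effects.

    f_i: weights f_ik of each factor k in item i
    eta: factor difficulties (effects) eta_k, one per factor
    c:   normalizing constant
    """
    if len(f_i) != len(eta):
        raise ValueError("need exactly one weight per factor")
    return sum(f * e for f, e in zip(f_i, eta)) + c

def lltm_p_correct(theta_j, f_i, eta, c=0.0):
    """Equation 5: Rasch probability with the LLTM difficulty substituted for b_i."""
    x = theta_j - lltm_difficulty(f_i, eta, c)
    return math.exp(x) / (1.0 + math.exp(x))
```

Take weights f_i = [1, 0, 2] and effects η = [0.5, -0.2, 1.0]. The difficulty is 1(0.5) + 0(-0.2) + 2(1.0) = 2.5, so `lltm_p_correct(2.5, [1, 0, 2], [0.5, -0.2, 1.0])` gives 0.5. The weight of 2 on the third factor exceeds one, a value that a Q-matrix entry cannot take.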
The first part involves determining the relationship between the items on a test and the attributes that they measure. Each examinee may or may not hold a mastery-level understanding of each attribute and, in fact, may hold a mastery-level understanding of any combination thereof. The combination of attributes that an individual examinee has and has not mastered is depicted in an attribute vector, which is also referred to as a knowledge state. Which items measure which attributes is described in a Q-matrix. (A Q-matrix is sometimes referred to as an incidence matrix, but not in this text.) The Q-matrix is a [K x n] matrix of ones and zeros, where K is the number of attributes to be measured and n is the number of items on the exam. For a given element of the Q-matrix in the kth row and the ith column, a one indicates that item i does indeed measure attribute k and a zero indicates that it does not. For example, consider the following [3 x 4] Q-matrix:

$$Q = \begin{array}{c|cccc} & i_1 & i_2 & i_3 & i_4 \\ \hline A_1 & 0 & 1 & 0 & 1 \\ A_2 & 1 & 0 & 0 & 1 \\ A_3 & 1 & 0 & 1 & 0 \end{array}$$

The first item measures the second and third attributes, while the second item measures the first attribute only. The third item measures only the third attribute, while the fourth item measures the first two attributes. Consulting experts in the measured content domain is a good approach to constructing the Q-matrix, because they can judge whether an item measures a particular attribute. Other possible approaches to constructing Q-matrices include borrowing from the test blueprint or intuitively evaluating each item to infer which attributes are being assessed. Each element in the Q-matrix is denoted as $q_{ik}$, where the subscripts i and k denote the item and attribute of interest, respectively. This $q_{ik}$ representation mirrors the $f_{ik}$ parameter in Fischer's LLTM (symbolizing the weight of factor k in item i, as discussed earlier in this chapter).
They are not exactly the same, however, because Fischer's $f_{ik}$ weight can take on values greater than unity, while the Q-matrix entry $q_{ik}$ cannot. They are equivalent when all of the $f_{ik}$ weights are dichotomous.

Next, the information provided by the Q-matrix needs to be translated into a form that can be compared with individual observed item response patterns. An observed item response pattern is a vector of ones and zeros that represents how an individual performed on a test. For example, a response pattern of [0 1 0 1 1 1] indicates that the individual responded incorrectly to the first and third items on the test but answered each of the other four items correctly. This comparison between the Q-matrix and individual observed item response patterns is accomplished by establishing a series of ideal response patterns. An ideal response pattern is the response pattern that would be obtained from a particular hypothetical combination of mastery and non-mastery of the attributes (Tatsuoka, 1995). Notice that the word "ideal" does not indicate a perfect response pattern, but rather "suggests perfect fit with an underlying theory" (Tatsuoka, 1995, p. 339). These ideal response patterns are ascertained systematically by using rules. A rule is defined as "a description of a set of procedures or operations that one can use in solving a problem in some well-defined procedural domain" (Tatsuoka & Tatsuoka, 1987, p. 194). Rules are determined through logical task analysis (Tatsuoka, 1990) and deterministic methods used in artificial intelligence (Tatsuoka and Tatsuoka, 1987). A computer program is used to generate the possible ideal response patterns that would be obtained from the application of a variety of rules. Both correct rules and erroneous rules exist for any given assessment. The consistent application of all of the correct rules would result in correct answers for the entire test.
The consistent application of some correct rules together with some incorrect rules would result in a specific response pattern containing both ones and zeros. If a student consistently applies a specific combination of rules to all of the items in a test, then his/her response pattern would exactly match the ideal response pattern produced by the computer program for that combination of rules (Tatsuoka, 1995). This provides a finite number of ideal response patterns with which the observed response patterns can be compared.

In order for an examinee to get an item right, s/he must possess all of the attributes that the item measures. This is analogous to an electrical circuit, in which a current can only flow when all switches are closed. In this analogy, a closed switch symbolizes a mastered attribute and an electrical current represents a correct response to an item. A correct response to an item can only be obtained by an examinee if all attributes involved in that item are mastered (Tatsuoka and Tatsuoka, 1997; Tatsuoka, 1995). Boolean algebra can be used to explain the cognitive requirements for item response patterns (Birenbaum and Tatsuoka, 1993). Boolean algebra is a mathematical discipline dealing with sets or collections of objects, such as attributes, and is commonly used in the analysis of circuits. (For a detailed description of Boolean algebra, please see Whitesitt, 1995.) Furthermore, one specific feature of Boolean algebra, called Boolean descriptive functions, can be used to map the relationship between the attributes and item response patterns (Tatsuoka, 1995). Consequently, the attribute vector (knowledge state) corresponding to an individual's observed response pattern can be determined from the set of erroneous rules used to generate the matching ideal response pattern.
For instance, if an individual taking a math test knows and follows all of the applicable rules correctly except that s/he does not know how to borrow in subtraction, then s/he would get every item right except those that involve borrowing. Thus, his/her attribute vector would include a value of unity for all of the attributes except the one(s) that involve borrowing, which would be represented in the attribute vector by a zero. The diagnostic feedback would then explain that this individual needs more instruction in borrowing in subtraction.

Unfortunately, this scenario does not fully reflect the reality of a testing situation. When an examinee applies an erroneous rule, s/he most likely does not apply that rule consistently over the entire test (Birenbaum and Tatsuoka, 1993). This inconsistency results in deviations of the observed response pattern from the ideal response pattern. Moreover, careless errors, uncertainty, fatigue, and/or a temporary lapse in thinking can produce further deviations of an observed response pattern from the ideal response patterns. Tatsuoka (1990) states, "Even if a student possesses some systematic error [i.e., is following an erroneous rule], it is very rare to have the response pattern perfectly matched with the patterns theoretically generated by its algorithm" (p. 462). In fact, the total number of possible response patterns is an exponential function of the number of items: for n items, $2^n$ possible response patterns exist. Fortunately, Boolean algebra can be used to reduce this overwhelming number of possible item response patterns to a more computationally manageable quantity. These random errors made by examinees, referred to as slips, can be thought of statistically in terms of the probability of a slip occurring, or even the probability of any given number of slips occurring.
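The conjunctive ("all switches closed") rule, applied to the [3 x 4] Q-matrix example given earlier, can be sketched in a few lines of Python. Assuming no slips, each of the $2^K$ attribute vectors maps to exactly one ideal response pattern; the sketch enumerates them and compares their number with the $2^n$ possible observed patterns. It is a simplified illustration of the idea, not the rule-generation program used in the rule space methodology, and the function name `ideal_response_pattern` is our own.

```python
from itertools import product
import numpy as np

# Example Q-matrix from the text: rows are attributes A1..A3, columns are items i1..i4
Q = np.array([[0, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 1, 0]])

def ideal_response_pattern(alpha, Q):
    """Conjunctive rule, no slips: item i is correct iff every attribute with q_ik = 1 is mastered."""
    alpha = np.asarray(alpha)
    # missing[k, i] is True when item i requires attribute k but the examinee lacks it
    missing = (Q == 1) & (alpha[:, None] == 0)
    return tuple((~missing.any(axis=0)).astype(int).tolist())

# One ideal response pattern for each of the 2^K attribute vectors (knowledge states)
patterns = {alpha: ideal_response_pattern(alpha, Q)
            for alpha in product((0, 1), repeat=Q.shape[0])}

for alpha, pattern in patterns.items():
    print(alpha, "->", pattern)
n_items = Q.shape[1]
print(2 ** n_items, "possible observed patterns,",
      len(set(patterns.values())), "distinct ideal patterns")
```

Because every item that uses A2 also requires another attribute, mastering A2 alone yields the same all-zero pattern as mastering nothing, so the eight attribute vectors produce only seven distinct ideal patterns, against sixteen possible observed patterns for four items.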
Tatsuoka and Tatsuoka (1987) derive a theoretical distribution of the slips and call it the bug distribution. The bug distribution gives the probability of making up to S slips: for each possible placement of s slips, it multiplies the probabilities of a slip occurring on the items where one did occur by the probabilities of a slip not occurring on the items where one did not occur, and it then sums these products over all such combinations and over s = 0, ..., S. This probability for a given rule R is presented in Equation (6):

$$\sum_{s=0}^{S} \; \sum_{\sum_i u_i = s} \; \prod_{i=1}^{n} p_i^{u_i} (1 - p_i)^{1 - u_i} \tag{6}$$

with a mean of

$$\mu_R = \sum_{i=1}^{t} p_i + \sum_{i=t+1}^{n} q_i$$

and a variance of

$$\sigma_R^2 = \sum_{i=1}^{n} p_i q_i$$

where the number of slips, denoted as s, ranges from zero to S; $u_i$ equals unity when a slip occurs on item i and zero when it does not; $q_i = 1 - p_i$; n is the number of items; and t denotes the total score (Tatsuoka and Tatsuoka, 1987; Tatsuoka, 1995).

Conveniently, response patterns that result from inconsistencies in rule application cluster around the corresponding ideal response patterns. However, the response variance that makes up these clusters complicates the classification of individual examinees, because classification can no longer be achieved by exactly matching observed response vectors with ideal response vectors. Due to this variability, a method is needed for classifying individuals into knowledge states (i.e., identifying their attribute vectors) when their response patterns do not exactly reproduce an ideal response pattern. To address this, the second part of the rule space methodology constructs an organized space for classifying the examinees' response patterns into the knowledge state categories established in the first part of the methodology.
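Equation (6) can be evaluated by brute force for a short test, which makes its structure explicit: enumerate every slip-indicator vector u, keep those with at most S slips, and add up the products of $p_i$ and $1 - p_i$. Because enumeration grows as $2^n$, the sketch also includes a standard dynamic-programming recursion that gives the same value in polynomial time. Here $p_i$ is treated as the slip probability for item i, following the verbal description of the bug distribution; the function names and the numeric values are hypothetical.

```python
from itertools import product
from math import prod

def prob_at_most_S_slips_bruteforce(p, S):
    """Equation (6) by enumeration over all slip vectors u (feasible only for small n)."""
    total = 0.0
    for u in product((0, 1), repeat=len(p)):
        if sum(u) <= S:  # covers s = 0..S slips
            total += prod(pi if ui else 1.0 - pi for pi, ui in zip(p, u))
    return total

def prob_at_most_S_slips(p, S):
    """Same quantity via a recursion over items: O(n^2) instead of O(2^n)."""
    dist = [1.0]  # dist[s] = P(exactly s slips among the items processed so far)
    for pi in p:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - pi)  # no slip on this item
            new[s + 1] += mass * pi      # slip on this item
        dist = new
    return sum(dist[: S + 1])

slip_probs = [0.10, 0.20, 0.05, 0.15]  # hypothetical p_i for a four-item test
for S in range(len(slip_probs) + 1):
    print(S, round(prob_at_most_S_slips(slip_probs, S), 6))
```

Both functions compute the same cumulative probability; the recursion is the practical choice once n grows beyond a couple of dozen items.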
Once all possible knowledge states are generated from the Q-matrix, item response theory is used to construct the classification space for assigning each examinee to one of the predetermined knowledge states. Unfortunately, the attribute variables are unobservable constructs and cannot be represented in such a space directly. Instead, item response theory functions are used in combination with a new parameter, $\zeta$ (zeta), developed by Tatsuoka (1984) to measure the atypicality of response patterns. Values of $\zeta$ are calculated as the standardized product of two residual matrices (where residuals are calculated as the difference between an observed and an expected value, and standardization is achieved by dividing the product by the standard deviation of its distribution). This parameter and the item response theory person parameter $\theta$, which symbolizes ability level, can serve as the axes of a Cartesian space in which the knowledge states for each of the ideal response patterns can be graphically represented (Tatsuoka, 1990), as demonstrated by Figure 1.

[Figure 1: Two knowledge states, A and B, in the two-dimensional rule space, with $\theta$ on the horizontal axis and $\zeta$ on the vertical axis.]

Notice that knowledge state A is farther left on the $\theta$ (ability) scale than knowledge state B. This means that knowledge state B requires a higher level of ability to acquire than knowledge state A. Also notice that knowledge state A is much closer to the $\theta$ axis (where the value of $\zeta$, the atypicality index, is zero), which means that this knowledge state occurs more frequently than B. Conversely, knowledge state B is farther away from the $\theta$ axis (i.e., the magnitude of $\zeta$ is greater), which means that it is a more atypical knowledge state. Likewise, the various observed response patterns can be represented in terms of each pattern's estimated $\theta$ and $\zeta$ levels. Furthermore, this Cartesian space can be used to determine which ideal response pattern an observed response pattern is closest to (Birenbaum and Tatsuoka, 1993).
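Finding the closest knowledge state requires a distance in the $(\theta, \zeta)$ rule space, and one common choice is the Mahalanobis distance. The minimal sketch below assumes that each knowledge state is summarized by a centroid and a covariance matrix in that space; it computes the squared Mahalanobis distance from an examinee's estimated $(\theta, \zeta)$ point to each state and picks the nearest one. The two states, their parameters, and the observed point are hypothetical, and a full Bayes decision rule would additionally weight the states by their prior probabilities.

```python
import numpy as np

def mahalanobis_sq(x, centroid, cov):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float)
    # solve() avoids forming the explicit inverse of the covariance matrix
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))

def classify(x, states):
    """Return the name of the nearest knowledge state and all squared distances."""
    d2 = {name: mahalanobis_sq(x, mu, cov) for name, (mu, cov) in states.items()}
    return min(d2, key=d2.get), d2

# Hypothetical knowledge states: (centroid in (theta, zeta), covariance matrix)
states = {
    "A": ([-1.0, 0.2], [[0.25, 0.0], [0.0, 0.25]]),
    "B": ([1.0, 1.5], [[0.25, 0.0], [0.0, 0.25]]),
}
observed = [0.8, 1.2]  # estimated (theta, zeta) for one examinee
best, d2 = classify(observed, states)
print(best, d2)
```

With these values the examinee is assigned to knowledge state B, whose squared distance (0.52) is far smaller than that to A (16.96).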
Closeness between ideal and observed item response patterns can be approximated by a distance measure such as the Mahalanobis distance, which describes the distance between an observation and the centroid of all observations while taking their covariance into account (Stevens, 1996). Bayes decision rules may also be used to minimize misclassifications (Birenbaum and Tatsuoka, 1993). Bayes decision rules use probabilities to determine which category is most likely when uncertainty is present. The categories are then used to determine an individual's attribute mastery pattern. "Once this classification has been carried out, one can indicate with a specified probability level which attributes a given examinee is likely to have mastered" (Birenbaum and Tatsuoka, 1993, p. 258). Attribute mastery patterns are represented as a vector of zeros and ones, where a one signifies mastery of a given attribute and a zero signifies non-mastery. An attribute vector containing K elements (one per attribute) is estimated for each individual and denoted as $\hat{\alpha}_j$. For example, if a test measures three attributes, the estimated attribute vector for a particular examinee may be $\hat{\alpha}_j$ = [0 1 1], meaning s/he has mastered the second and third attributes but not the first. Attribute vectors can be used to provide helpful diagnostic information that specifies the area(s) in which each examinee needs help. For this example examinee, the diagnostic information would indicate that the student needs to work on the educational construct corresponding to the first attribute.

The rule space methodology allows for graphical representation of more concepts than just the rule space (as depicted in Figure 1). Other important features of this model can be illustrated visually, such as probability curves, influence diagrams, and knowledge state trees. An influence diagram displays which items measure which attributes and which attributes influence other attributes.
In Figure 2, the influence diagram represents the item-attribute relationships described by the Q-matrix as well as the relationships among the attributes.

[Figure 2: Nine items measuring four attributes (A1, A2, A3, A4).]

In this figure, an arrow indicates that an item measures a certain attribute, with the arrowhead pointing toward the item. In this example, four items (numbers 3, 5, 6, and 9) measure two attributes each, while the remaining items measure one attribute each. The influence of attribute 1 on attribute 3 is also represented by an arrow, indicating that attribute 1 is a prerequisite for attribute 3.

The rule space methodology can also be used to construct a knowledge state tree. A knowledge state tree is a very important graphical representation because it portrays the incremental relationships between the knowledge states. This is particularly useful because it shows how to progress from one knowledge state to a more mastered knowledge state (Tatsuoka and Tatsuoka, 1997). For example, see Figure 3.

[Figure 3: A tree representing eight knowledge states, ordered vertically along the $\theta$ scale from $\theta = -3$ (bottom) through $\theta = 0$ to $\theta = +3$ (top): Mastery State; Cannot do A3; Cannot do A4; Cannot do A3 & A4; Cannot do A2 & A4; Cannot do A1, A3 & A4; Cannot do A2, A3 & A4; Cannot do A1, A2, A3 & A4. The links between states are labeled with Mahalanobis distances.]

In traditional testing, higher scores would be assigned to individuals in the knowledge states that are higher in the figure, and knowledge states lower in the figure represent lower traditional scores. Notice that the knowledge states Cannot do A3 and Cannot do A4 are next to each other. In a traditional testing situation, individuals in both of these knowledge states would receive the same score. In contrast, the rule space method allows these examinees to be given more specific information about their abilities. Obviously, it is possible for an examinee to lack more than one attribute. This tree is useful because it shows the order in which the non-mastered attributes need to be learned.
For instance, if a student cannot do attributes 1, 3, or 4 (i.e., his/her attribute vector is [0, 1, 0, 0]), then the diagram dictates that s/he should next move to the knowledge state Cannot do A3 & A4. Therefore, the student should learn attribute 1 next. Then there is a choice to move to either the state Cannot do A3 or the state Cannot do A4. In such a situation, the next appropriate knowledge state is the one with the shortest Mahalanobis distance (Tatsuoka and Tatsuoka, 1997). In this case, the next knowledge state to achieve would be Cannot do A3, with a Mahalanobis distance of 3.4, so the student needs to learn attribute 4. Then, once attribute 3 is mastered, the student will reach the knowledge state of total mastery. Tatsuoka and Tatsuoka (1997) suggest programming this method of determining an appropriate path for instruction in the form of a computer adaptive tutor for remedial instruction. This is revolutionary because it combines the rule space method of diagnostic assessment with computer adaptive assessment to provide immediate feedback and immediate individualized instruction in the area(s) the examinee needs most.

The rule space methodology is revolutionary and advantageous in many ways, but there is always room for improvement. One drawback is that the rule space methodology does not provide a way to evaluate the relationships between the items and the attributes. Therefore, it is preferable to use a model that has the benefits of the rule space methodology but also incorporates a method for evaluating this relationship, to determine how well the items in the assessment measure the attributes at hand.

The Unified Model

DiBello, Stout, and Roussos (1995) based their new model, named the unified model, on the rule space method. They attempted to improve on one of the underlying ideas of the rule space approach. In the unified model, the source of random error is broken down into four different types of systematic error.
In the rule space model, all of these would be considered random slips. DiBello, Stout, and Roussos examined the possible sources of random error and categorized them into four groups. Hence, while there is only one type of random error in the rule space model (slips), there are four sources of aberrant response variance in the unified model. First, strategy selection is a source of response variation, because an examinee may answer an item using a different strategy than the one assigned in the Q-matrix. Second, completeness of the Q-matrix is considered an important issue: an item may measure an attribute that is not listed in the Q-matrix. For example, a math word problem includes a verbal component; if the Q-matrix does not contain a verbal attribute, then the Q-matrix would be considered incomplete. Third, the unified model takes positivity into account. Positivity addresses inconsistencies that arise when students who do not possess a certain attribute happen to respond correctly to an item that measures it, or when students who do possess a certain attribute do not apply it correctly and respond incorrectly to an item measuring that attribute. Positivity takes on a high value when individuals who possess an attribute use it correctly while students who lack the attribute miss the items that measure it. The less this is the case, the lower the value of positivity. Lastly, a category remains for random errors that are not caused by any of the other three issues. These are called slips and include mental glitches such as finding the correct solution to a problem and then bubbling in a different multiple-choice option. Notice that the term "slips" is used more broadly in the rule space approach than in the unified model.
DiBello, Stout, and Roussos (1995) introduce a new parameter, called the latent residual ability, for dealing with the issues of strategy choice and incompleteness of the Q-matrix (confusingly, this is denoted as $\theta_j$, but it is different from the IRT ability level $\theta$). The unified model is the first to include such a parameter. The latent ability space consists of $\alpha_Q$, the part addressed in the Q-matrix, and $\alpha_b$, the remaining latent ability not included in $\alpha_Q$. The parameter $\theta_j$ is intended to measure the underlying construct $\alpha_b$, while $\alpha_Q$ is measured by the parameter $\alpha_j$. One might ask: why not simply add more attributes to the Q-matrix to account for these issues? More attributes mean more attribute parameters. While additional parameters may allow for finer distinctions and alleviate certain classification problems caused by strategy choice and incompleteness of the Q-matrix, these added parameters would most likely complicate the measurement process more than they would benefit it. More parameters require a more complex estimation procedure. Also, an increase in the number of attribute parameters to be estimated requires an increase in the number of items on the test to obtain acceptable reliability (DiBello, Stout, and Roussos, 1995), which may not be practical when so many educators already feel that test administration is too long. For the sake of parsimony, including additional parameters is only advantageous when there is a real measurement interest in assessing mastery/non-mastery of those attributes. Hence, accounting for these issues in a model without adding more attribute parameters is optimal, and this is what DiBello, Stout, and Roussos (1995) have accomplished.
The unified model is illustrated in Equation (7):

$$P(X_{ij} = 1 \mid \alpha_j, \theta_j) = d_i \prod_{k=1}^{K} \pi_{ik}^{\alpha_{jk} q_{ik}} \, r_{ik}^{(1-\alpha_{jk}) q_{ik}} \, P_{c_i}(\theta_j) + (1 - d_i)\, P_{b_i}(\theta_j) \tag{7}$$

where $\alpha_{jk}$ denotes examinee j's mastery of attribute k, with a one indicating mastery and a zero denoting non-mastery. Also, $q_{ik}$ is the Q-matrix entry for item i and attribute k, $\theta_j$ is the latent residual ability, and $P(\theta_j)$ is the Rasch model with the item difficulty parameter specified by the subscript of P. The parameter $\pi_{ik}$ is the probability that person j will correctly apply attribute k to item i given that person j does indeed possess attribute k; mathematically, $\pi_{ik} = P(Y_{ijk} = 1 \mid \alpha_{jk} = 1)$, with $Y_{ijk}$ equaling unity when the attribute is correctly applied. Correspondingly, $r_{ik} = P(Y_{ijk} = 1 \mid \alpha_{jk} = 0)$ is the probability of correctly applying attribute k to item i when the attribute has not been mastered. Lastly, $d_i$ is the probability of selecting the Q-based strategy over all other possible strategies.

The unified model includes a large number of parameters to deal with a plethora of psychometric elements. Having such an extensive set of parameters in the model is both a blessing and a curse: a precarious balance must be struck between improving accuracy through the inclusion of more parameters and maintaining the statistical identifiability of the many parameters in the model. Jiang (1996) demonstrated that, in fact, the $2K_i + 3$ item parameters contained in the unified model (where $K_i$ is the number of attributes measured by item i) are too many to be uniquely identified. This model is named the unified model because it uses a deterministic approach to estimating knowledge state classification, and yet it also takes random errors into account; hence, it unifies deterministic and stochastic approaches. The unified model is advantageous in that it addresses both the need to assess examinees with respect to underlying attributes and the need to examine the relationship between the items and the attributes, rather than just one or the other.
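To make the structure of Equation (7) concrete, the following minimal Python sketch evaluates the unified model's item response function for one examinee and one item. Following the description above, $P_{c_i}$ and $P_{b_i}$ are taken to be Rasch curves with difficulties $c_i$ and $b_i$; the plain logistic form without a scaling constant is an assumption of this sketch, and all parameter values are hypothetical.

```python
import math

def rasch(theta, difficulty):
    """Rasch probability of a correct response (assumed logistic form, no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def unified_irf(alpha, theta, q_row, pi_ik, r_ik, c, b, d):
    """Equation (7) for one item.

    alpha : 0/1 mastery indicators alpha_jk for the examinee
    q_row : 0/1 Q-matrix entries q_ik for this item
    pi_ik : P(correct application | attribute mastered)
    r_ik  : P(correct application | attribute not mastered)
    d     : probability of choosing the Q-based strategy
    """
    q_strategy = 1.0
    for a, q, p, r in zip(alpha, q_row, pi_ik, r_ik):
        # exponent a*q selects pi_ik, (1-a)*q selects r_ik; attributes with q = 0 contribute 1
        q_strategy *= p ** (a * q) * r ** ((1 - a) * q)
    return d * q_strategy * rasch(theta, c) + (1 - d) * rasch(theta, b)

# Hypothetical item measuring both attributes; the examinee has mastered only the first.
print(unified_irf(alpha=[1, 0], theta=0.0, q_row=[1, 1],
                  pi_ik=[0.9, 0.8], r_ik=[0.3, 0.4], c=0.0, b=1.0, d=0.7))
```

Setting d to one recovers the purely Q-based term, and setting it to zero leaves only the alternative-strategy curve $P_{b_i}$.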
Other advantages of the unified model include the innovative use of the latent residual ability $\theta_j$ to help avoid the problems associated with incorporating too many latent classes into the assessment process (a suitable alternative to adding superfluous attribute parameters) and the model's ability to handle examinees' use of multiple solution strategies (DiBello, Stout, and Roussos, 1995). In order to apply the unified model in diagnostic assessment, the item parameters must be estimated; that is, the model must be calibrated. However, the model lacks practicality in this sense because its parameters are not uniquely statistically estimable (Jiang, 1996).

The Fusion Model

The fusion model was based on the unified model (Hartz, Roussos, and Stout, 2002), which, in turn, was based on Tatsuoka's rule space methodology (DiBello, Stout, and Roussos, 1995). The fusion model retains the advantages of the unified model while reducing the number of parameters involved so that they are statistically identifiable. The unified model has $2K_i + 3$ parameters for each item, while the fusion model has only $K_i + 2$: one Q-based item difficulty, one completeness index, and one discrimination parameter for each of the $K_i$ attributes the item measures. Fischer's LLTM does not estimate individuals' mastery levels for each attribute. The rule space model does not evaluate the relationship between the items and the attributes. The unified model satisfies both of these needs but does not have statistically estimable parameters. The fusion model simplifies the unified model so that the parameters can be estimated. The fusion model was selected as the cognitive diagnosis model of choice for this study because it includes all three features described by Hartz et al. (2002) as crucial for a successful cognitive diagnosis model: (1) the estimation of examinees' attribute mastery levels, (2) the ability to relate the items to the attributes, and (3) statistical identifiability of the model's parameters.
The item response function for the fusion model is shown in Equation (8), as described by Hartz et al. (2002):

$$P(X_{ij} = 1 \mid \alpha_j, \eta_j) = \pi^*_i \prod_{k=1}^{K} \left(r^*_{ik}\right)^{(1-\alpha_{jk})\,q_{ik}} P_{c_i}(\eta_j) \tag{8}$$

where

$P_{c_i}(\eta_j)$ = the Rasch model with difficulty parameter $c_i$,
$\pi^*_i = \prod_{k=1}^{K} P(Y_{ijk} = 1 \mid \alpha_{jk} = 1)^{q_{ik}}$,
$r^*_{ik} = \dfrac{P(Y_{ijk} = 1 \mid \alpha_{jk} = 0)}{P(Y_{ijk} = 1 \mid \alpha_{jk} = 1)}$,
$Y_{ijk}$ = 1 when examinee j correctly applies attribute k to item i, and 0 otherwise,
$\alpha_{jk}$ = 1 when examinee j has mastered attribute k, and 0 otherwise,
$c_i$ = the amount the item response function relies on $\eta_j$ after accounting for the attribute assignments in the Q-matrix.

Also, the attribute vector for individual j is denoted as $\alpha_j$, and $\eta_j$ is the residual ability parameter, which deals with content measured by the test that is not included in the Q-matrix.

Parameters

The fusion model introduces many new parameters not present in any other model; therefore, a brief explanation of these parameters is appropriate. First the three item parameters will be described, and then the two person parameters will be explained. Last, the attribute parameter will be discussed. The parameter $\pi^*_i$ equals the probability of correctly applying all attributes required by item i given that the individual possesses all of the required attributes for item i. As a result, $\pi^*_i$ reflects the difficulty of item i and is referred to as the Q-based item difficulty parameter. It governs how likely a person is to answer the item correctly even when s/he possesses all of the involved attributes. Just as with the item difficulty (proportion correct) of classical test theory, the values of $\pi^*_i$ must remain between zero and unity, and a high value indicates easiness rather than difficulty. Next, the parameter $r^*_{ik}$ represents the ratio of the probability of obtaining a correct response when the examinee does not have the required attribute to the probability of responding correctly to the item when the required attribute is possessed by the examinee.
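The sketch below evaluates Equation (8) and shows how $\pi^*_i$ and $r^*_{ik}$ follow from unified-model probabilities $\pi_{ik} = P(Y_{ijk}=1 \mid \alpha_{jk}=1)$ and $r_{ik} = P(Y_{ijk}=1 \mid \alpha_{jk}=0)$, using the definitions just given. One assumption needs flagging: because a high $c_i$ is described as meaning little reliance on $\eta_j$, the sketch writes $P_{c_i}(\eta_j)$ as a logistic curve in $\eta_j + c_i$ (equivalently, a Rasch curve with difficulty $-c_i$); this sign convention is our assumption, not a quotation of Hartz et al. (2002). The function names and all numbers are hypothetical.

```python
import math

def rasch(eta, difficulty):
    """Rasch probability of a correct response (logistic, no scaling constant assumed)."""
    return 1.0 / (1.0 + math.exp(-(eta - difficulty)))

def fusion_parameters(q_row, pi_ik, r_ik):
    """pi*_i = prod_k pi_ik^{q_ik};  r*_ik = r_ik / pi_ik."""
    pi_star = math.prod(p ** q for p, q in zip(pi_ik, q_row))
    r_star = [r / p for r, p in zip(r_ik, pi_ik)]
    return pi_star, r_star

def fusion_irf(alpha, eta, q_row, pi_star, r_star, c):
    """Equation (8): pi*_i * prod_k r*_ik^{(1 - alpha_jk) q_ik} * P_c(eta)."""
    penalty = 1.0
    for a, q, r in zip(alpha, q_row, r_star):
        penalty *= r ** ((1 - a) * q)  # only required, non-mastered attributes are penalized
    # Assumption: c_i enters as eta + c_i, i.e. a Rasch curve with difficulty -c_i
    return pi_star * penalty * rasch(eta, -c)

# Hypothetical item requiring attributes 1 and 3 (of three)
q_row = [1, 0, 1]
pi_star, r_star = fusion_parameters(q_row, pi_ik=[0.9, 0.8, 0.95], r_ik=[0.3, 0.5, 0.2])
p = fusion_irf(alpha=[1, 1, 0], eta=0.0, q_row=q_row, pi_star=pi_star, r_star=r_star, c=2.0)

# With d_i = 1, the unified-model product for this examinee would be pi_i1 * r_i3 = 0.9 * 0.2
print(round(pi_star, 4), round(p, 4), round(0.9 * 0.2 * rasch(0.0, -2.0), 4))
```

When an examinee has mastered every required attribute, the penalty product equals one and the probability reduces to $\pi^*_i P_{c_i}(\eta_j)$; each required but unmastered attribute multiplies it by $r^*_{ik}$, which reproduces the Q-based term of Equation (7) with $d_i = 1$.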
The parameter $r^*_{ik}$ is therefore considered the discrimination parameter of item i for attribute k. This parameter is described as the penalty for lacking attribute k. A high $r^*_{ik}$ value signifies that attribute k is not important in producing a correct response to item i (Hartz, 2002; Hartz et al., 2002). Because this parameter is a ratio of a smaller probability to a larger one, its values also remain between zero and unity. Third, the parameter $c_i$ deals with the item response function's reliance on the residual ability parameter $\eta_j$, which measures the examinee on constructs that are beyond the scope of the Q-matrix. Therefore, the $c_i$ parameter is the completeness index for item i. This parameter's value remains between zero and three, with a high value meaning the Q-matrix is complete and therefore a correct response to the item does not rely heavily on $\eta_j$. In sum, a good item would have a high $\pi^*_i$, a high $c_i$, and a low $r^*_{ik}$. Each item has one $\pi^*_i$ parameter, one $c_i$ parameter, and an $r^*_{ik}$ parameter for each attribute that the item measures.

The fusion model estimates two types of parameters for the examinees. The first is a vector, denoted as $\alpha_j$, which identifies the attributes that have been estimated as mastered by examinee j. This is the same as $\alpha_j$ from the unified model. The values in the attribute vector $\alpha_j$ are continuous, and individual mastery/non-mastery status is defined by dichotomizing these values: attribute mastery estimates $\alpha_{jk}$ greater than or equal to 0.5 are assigned the status of mastery for examinee j on attribute k, and the remaining values are assigned the status of non-mastery for that attribute. The other person parameter in the fusion model is the residual ability parameter $\eta_j$. (This is the same parameter as $\theta_j$ in the unified model, presented in Equation 7, but the notation $\eta_j$ has been adopted to clarify that it is not the same person parameter, $\theta_j$, that is present in the 3PL model, presented in Equation 3.)
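The dichotomization rule for the attribute mastery estimates is simple to apply once continuous estimates are available. In the minimal sketch below, the estimates for three examinees on three attributes are invented for illustration, and the function name `mastery_status` is hypothetical; each value at or above 0.5 becomes mastery (1) and each value below it becomes non-mastery (0).

```python
def mastery_status(alpha_hat, cutoff=0.5):
    """Dichotomize continuous attribute mastery estimates: >= cutoff -> 1 (mastery), else 0."""
    return [[1 if value >= cutoff else 0 for value in row] for row in alpha_hat]

# Hypothetical continuous estimates alpha_jk (rows = examinees, columns = attributes)
alpha_hat = [
    [0.92, 0.50, 0.13],
    [0.31, 0.77, 0.49],
    [0.66, 0.05, 0.88],
]
status = mastery_status(alpha_hat)
for j, row in enumerate(status, start=1):
    print(f"examinee {j}: {row}")
```

Note that a value of exactly 0.5 counts as mastery under the rule as stated, while a value such as 0.49 does not.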
The residual ability parameter $\eta_j$ measures the leftover ability construct required by the test but not included in the Q-matrix. It can be thought of as "a measure of higher mental processes" (Samejima, 1995, p. 391), a measure for dealing with multiple solution strategies, or merely "a nuisance parameter" (DiBello, Stout, and Roussos, 1995, p. 370). The inclusion of this parameter is important because it acknowledges the fact that Q-matrices are not complete representations of what is required by an exam (Hartz, 2002; DiBello, Stout, and Roussos, 1995).

The fusion model also includes one parameter for each attribute measured in the assessment. This parameter is denoted as $p_k$ and is a sample estimate of the proportion of the population that has mastered attribute k. It can be used to determine how well each attribute is being assessed by examining the standard error of this estimate. In fact, this is one of the steps in the stepwise reduction process discussed later in the first part of Chapter Three. A software program called Arpeggio was developed by Stout et al. (2002) to analyze item responses with the fusion model. A manual for this software program is available in Hartz, Roussos, and Stout (2002).

Parameter Estimation

Like any model, the fusion model requires an estimation procedure for the parameters involved. The fusion model has several different parameters that must be estimated simultaneously and therefore requires a powerful procedure for calculating them. The Markov Chain Monte Carlo (MCMC) method is a versatile estimation procedure that has become an important tool in applied statistics (Tierney, 1997). MCMC is used for parameter estimation in the fusion model because it can elegantly handle the otherwise arduous, perhaps even impossible, task of simultaneously estimating the item parameters ($\pi^*_i$, $c_i$, and the $r^*_{ik}$) as well as the K entries in each examinee's attribute vector.

MCMC Estimation.
A Markov chain is a sequence of states in which the probability of the next state depends solely on the current state. In MCMC estimation, such chains are used within a Bayesian framework. The field of Bayesian statistics is based on Bayes' theorem and uses a prior distribution, which incorporates known information about the parameters of interest, to estimate the posterior distribution, "the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data" (Gelman et al., 1995, p. 3). Bayesian techniques are an advantageous approach to inferential statistical analyses in two important ways. First, the prior distribution makes use of known information about the relationships between the variables involved, while the data provide information about unknown characteristics of those relationships. For instance, positive correlations exist between examinee parameters, so it would be wasteful to ignore this information during estimation (Hartz, 2002). A Bayesian framework allows for this sort of flexibility in the prior distribution of parameters (Hartz, 2002). Second, the prior distributions do not unduly influence the posterior distributions (Hartz, 2002). In sum, a Bayesian framework allows the incorporation of important information already known about the item parameter distributions, but it is not adversely affected when such information is not available. (For more information regarding Bayesian analytic methods, please see Gelman et al., 1995.)

To understand how MCMC estimates information about future observations or states, it is easiest to first consider a simple case involving a dichotomous random variable. At any given time, this variable could take on a value of zero or unity. A Markov chain follows such a variable across successive states. Consider the probability of reaching a second state given the initial state.
These probabilities can be collected in a transition matrix whose numbers of rows and columns both equal the number of possible values. For the two-valued variable, the transition matrix from state one to state two is

$$P_{s_1 \to s_2} = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix}$$

where
p_00 = the probability that the variable takes the value zero in the second state, given that it took the value zero in the first state;
p_01 = the probability of the value one in the second state, given the value zero in the first state;
p_10 = the probability of the value zero in the second state, given the value one in the first state;
p_11 = the probability of the value one in the second state, given the value one in the first state.

Each row therefore sums to one. Whatever value the variable takes in the first state, this matrix gives the probabilities of the transition to state two. Matrix algebra then gives the probabilities of obtaining a zero or a one in the second state: multiply the row vector of initial-state probabilities by the transition matrix, as illustrated in Equation (9).

$$\begin{bmatrix} p_{s_2=0} & p_{s_2=1} \end{bmatrix} = \begin{bmatrix} p_{s_1=0} & p_{s_1=1} \end{bmatrix} \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix} = \begin{bmatrix} p_{00}\,p_{s_1=0} + p_{10}\,p_{s_1=1} & p_{01}\,p_{s_1=0} + p_{11}\,p_{s_1=1} \end{bmatrix} \qquad (9)$$

Next, consider state three. The probabilities of observing a zero or a one in the third state are found in the same way, using the square of the transition matrix:

$$\begin{bmatrix} p_{s_3=0} & p_{s_3=1} \end{bmatrix} = \begin{bmatrix} p_{s_1=0} & p_{s_1=1} \end{bmatrix} \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix}^{2} = \begin{bmatrix} (p_{00}^2 + p_{01}p_{10})\,p_{s_1=0} + (p_{10}p_{00} + p_{11}p_{10})\,p_{s_1=1} & (p_{00}p_{01} + p_{01}p_{11})\,p_{s_1=0} + (p_{10}p_{01} + p_{11}^2)\,p_{s_1=1} \end{bmatrix}$$

Likewise, the probability vector for state four, or even state 194, can be obtained by raising the transition matrix to successively higher powers and multiplying the initial-state probability vector by the result.
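The repeated multiplication described above can be checked numerically. The Python sketch below uses a hypothetical transition matrix; the values 0.9, 0.1, 0.5, and 0.5 are illustrative only and are not taken from this study. It follows the row convention of Equation (9), in which row a, column b holds the probability of moving from value a to value b. The sketch propagates two very different initial vectors forward. By state 194 both give the same probabilities, which illustrates how the dependence on the initial state fades.

```python
# Hypothetical two-state transition matrix (illustrative values only).
# Row a, column b holds p_ab = P(next value = b | current value = a),
# so every row sums to one, matching Equation (9).
P = [[0.9, 0.1],
     [0.5, 0.5]]


def row_times_matrix(row, matrix):
    """Multiply a row vector by a square matrix (both plain lists)."""
    size = len(matrix)
    return [sum(row[a] * matrix[a][b] for a in range(size)) for b in range(size)]


def state_probabilities(initial, matrix, n_transitions):
    """Probability vector after n_transitions steps: initial times matrix^n."""
    probs = list(initial)
    for _ in range(n_transitions):
        probs = row_times_matrix(probs, matrix)
    return probs


if __name__ == "__main__":
    for state in (2, 3, 5, 194):
        from_zero = state_probabilities([1.0, 0.0], P, state - 1)
        from_one = state_probabilities([0.0, 1.0], P, state - 1)
        print(f"state {state}: start at 0 -> {from_zero}, start at 1 -> {from_one}")
```

For this particular matrix, the long-run probabilities are 5/6 and 1/6 because 0.1 × 5/6 equals 0.5 × 1/6. Every row of a high power of P approaches [5/6, 1/6].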
Consequently, as the number of iterations increases, the calculation becomes more complex. More importantly, the dependence on the initial state decreases, and the resulting probabilities depend more on the transition matrix (Smith, 2003). As the chain grows longer, the powers of the transition matrix eventually converge to a single matrix in which every row is identical. The values in that row form the chain's stationary distribution, and MCMC methods construct the transitions so that this stationary distribution is the posterior distribution of interest. To simulate the chain, the transition probabilities for the current state are used, along with a random number generator, to determine the next state. A random number between zero and one is drawn from a uniform distribution and compared with these probabilities to select the next state. There is an array of algorithms for this process, and further details are presented in the following section.

The idea extends to variables that take polytomous values rather than only zeros and ones. MCMC estimation can even be extended to continuous rather than discrete variables. In the continuous case, a transition kernel, which is a function rather than a matrix, is used instead of a transition matrix (Smith, 2003). The Markov chain part of the name reflects the property that a sufficiently long Markov chain converges to this single limiting matrix. The Monte Carlo part describes how the successive values of the chain are randomly drawn from the distribution of possible states, which is represented by the rows of the matrix. For more information on MCMC methods and MCMC estimation, see Smith (2003).

The Metropolis-Hastings Algorithm.

A variety of algorithms for implementing MCMC estimation are available. The oldest and most widely used is the Metropolis-Hastings algorithm (Tierney, 1997).
The Metropolis-Hastings (M-H) algorithm was conceived by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller in 1953 and was generalized by Hastings in 1970 (Chib and Greenberg, 1995). As mentioned earlier, the transition probabilities are combined with a random number generator to determine the state of the next iteration in the MCMC chain. The M-H algorithm does not automatically accept the state produced by that process. Instead, it uses the transition kernel (or matrix, in the discrete case) to produce a candidate for the next state. This added precaution is needed because the process may move from one state to another more often than in the reverse direction (i.e., the transition from state A to state B occurs more often than the transition from state B to state A). This lack of reversibility must be taken into account (Chib and Greenberg, 1995). A comparison is therefore made to reduce the moves from state A to state B. Without this correction, the MCMC process would be lopsided and would not satisfy the required condition of reversibility. This comparative step can be thought of as a filter that always accepts moves from state B to state A but accepts moves from state A to state B only with a certain (non-zero) probability (Chib and Greenberg, 1995).

Gibbs Sampling.

Gibbs sampling, a special case of the M-H algorithm, was introduced to the statistical community by Gelfand and Smith in 1990 as an approach for fitting statistical models (Gelfand, 1997). When multiple parameters must be estimated, Gibbs sampling simplifies the sampling of candidates by focusing on one component at a time.
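The M-H accept/reject filter and the one-component-at-a-time updating used by Gibbs sampling can be combined in a short sketch. The target below is a hypothetical standard bivariate normal with correlation `rho`, standing in for a posterior distribution. Each coordinate in turn receives a random-walk proposal that is accepted with probability min(1, target ratio). Because the normal proposal is symmetric, the Hastings correction cancels. This illustrates the general technique under these assumptions; it is not the sampler implemented in Arpeggio.

```python
import math
import random


def log_target(x, y, rho):
    """Log density, up to a constant, of a standard bivariate normal with
    correlation rho (a stand-in for a posterior distribution)."""
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def metropolis_step(current, log_density, rng, step):
    """One random-walk M-H update of a single component.

    The normal proposal is symmetric, so the candidate is accepted with
    probability min(1, density ratio).
    """
    candidate = current + rng.gauss(0.0, step)
    log_ratio = log_density(candidate) - log_density(current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return candidate
    return current


def componentwise_sampler(n_iter=40000, rho=0.5, step=1.0, seed=1):
    """Update the two components one at a time, each with an M-H step."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        x = metropolis_step(x, lambda v: log_target(v, y, rho), rng, step)  # y held fixed
        y = metropolis_step(y, lambda v: log_target(x, v, rho), rng, step)  # new x held fixed
        draws.append((x, y))
    return draws


def summarize(draws, burn_in=2000):
    """Means, variances, and correlation of the draws kept after burn-in."""
    kept = draws[burn_in:]
    n = len(kept)
    mx = sum(d[0] for d in kept) / n
    my = sum(d[1] for d in kept) / n
    vx = sum((d[0] - mx) ** 2 for d in kept) / n
    vy = sum((d[1] - my) ** 2 for d in kept) / n
    cov = sum((d[0] - mx) * (d[1] - my) for d in kept) / n
    return mx, my, vx, vy, cov / math.sqrt(vx * vy)
```

The summary of a run should come out near means of 0, variances of 1, and a correlation of `rho`. Replacing each M-H step with a direct draw from the full conditional distribution would give pure Gibbs sampling, in which every candidate is accepted. Here the full conditional is normal, with mean `rho` times the other coordinate and variance 1 - `rho`².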
In short, the process updates one block of parameters at a time. It accepts a candidate value for the first block (for example, the item parameters) with the applicable probability while holding the other parameters fixed. It then obtains a candidate for the next block (for example, the examinee parameters) and accepts it with the applicable probability in the same way. The advantage of Gibbs sampling is that it takes a divide-and-conquer approach to estimating the item and examinee parameters simultaneously (Patz and Junker, 1997, p. 7). This has been a brief primer on the estimation procedure used to calculate the fusion model parameters. For a more comprehensive description of the statistical techniques, and of the background of the M-H algorithm and Gibbs sampling in MCMC estimation, see Chib and Greenberg (1995).

Computerized Adaptive Testing

Large-scale assessment must include items able to gauge the abilities of a broad range of examinees. Unfortunately, this bores high-ability examinees with items that are too easy and frustrates lower-ability examinees with items that are too hard. This is not an efficient approach to individualized measurement. Computerized adaptive testing (CAT) avoids this problem by homing in on each examinee's ability estimate. It administers items that are informative at his or her level and omits items that would not help the estimation process. CAT retains the advantages of group administration while adopting the advantages of individualized assessment (Wainer, 2000). Other advantages of CAT include test security, test-takers working efficiently and at their own pace, the absence of physical answer sheets, and the possibility of immediate feedback. They also include the ease of administering experimental items and the variety of item formats available in the computerized interface (Wainer, 2000).
The history of adaptive testing predates the ubiquity of the personal computer. In the early 1970s, new approaches such as flexilevel testing (Lord, 1971a) and multi-stage testing (Lord, 1971b) were explored. Both aimed to direct a test's questions toward the ability level of the individual examinee and to avoid superfluous items. Lord's research in this area is often credited with pioneering the field of adaptive testing (van der Linden and Pashley, 2000). With the arrival of inexpensive computing power, personal computers could be programmed to administer a test the way an individual examiner would, that is, by directing the line of questioning toward the examinee's true ability level. CAT has since spread throughout the testing arena in this country. Large-scale tests that have implemented a computerized adaptive version include the Computerized Placement Tests, the Graduate Record Exam, the Armed Services Vocational Aptitude Battery, and licensure exams in the medical industry (Meijer and Nering, 1999).

The aim of CAT is to construct an optimal test for each examinee (Meijer and Nering, 1999). As a result, different examinees respond to different sets of items. To estimate ability levels consistently across examinees who receive different items, CAT relies on item response theory (IRT), discussed at the beginning of Chapter Two. CAT algorithms consist of four steps: (1) selecting an initial item, (2) estimating the examinee's ability level after the administration of each item, (3) selecting each subsequent item, and (4) ending the testing process (Reckase, 1989; Thissen and Mislevy, 2000). CAT therefore has a multitude of specialized foci, including item bank development, item selection procedures, and ability estimation procedures. These in turn raise additional issues such as test security and reliability (Meijer and Nering, 1999).
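The four CAT steps listed above can be sketched as a simple fixed-length loop. Everything in the sketch is an illustrative assumption rather than a procedure from this study:

- a hypothetical bank of two-parameter logistic (2PL) items, each given as an (a, b) pair;
- simulated responses;
- an expected a posteriori (EAP) ability estimate computed on a grid with a standard normal prior;
- maximum Fisher information item selection;
- a fixed test length as the stopping rule.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL Fisher information, a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses):
    """Expected a posteriori ability under a N(0, 1) prior, on a grid."""
    grid = [g / 10.0 for g in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for (a, b), x in responses:
            p = p_correct(theta, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def run_cat(bank, true_theta, test_length=10, seed=0):
    """Fixed-length CAT with responses simulated from true_theta."""
    rng = random.Random(seed)
    theta_hat = 0.0  # starting value: the first item is chosen at the prior mean
    available = list(range(len(bank)))
    responses = []
    while len(responses) < test_length:  # (4) stop after a fixed number of items
        # (1) and (3): administer the remaining item with the most information
        best = max(available, key=lambda i: item_information(theta_hat, *bank[i]))
        available.remove(best)
        a, b = bank[best]
        x = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append(((a, b), x))
        theta_hat = eap_estimate(responses)  # (2) re-estimate after each item
    return theta_hat, responses
```

Other starting rules, estimators, and stopping rules (for example, a target standard error) can be swapped into the same loop.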
Of these foci, this dissertation emphasizes an approach to optimized item selection.

Item Selection

An important element of CAT administration is determining which item should be presented next, given the current estimate of the examinee's ability level. Item selection matters because it allows item administration to correspond adaptively to the examinee's ability estimate (Meijer and Nering, 1999). The two most common procedures for choosing the next item are maximum information and maximum expected precision (Thissen and Mislevy, 2000). The maximum information approach selects the item that maximizes the Fisher information function at the given ability estimate $\hat{\theta}_j$, presented in Equation (10):

$$I_i(\hat{\theta}_j) = \frac{[P_i'(\hat{\theta}_j)]^2}{P_i(\hat{\theta}_j)\,[1 - P_i(\hat{\theta}_j)]} \qquad (10)$$

where $P_i(\hat{\theta}_j)$ is the probability of a correct response by examinee j on item i given the current ability estimate $\hat{\theta}_j$, and $P_i'(\hat{\theta}_j)$ is its first derivative. The method of maximum expected precision is a Bayesian approach that seeks to minimize the variance of the posterior distribution. Other possible item selection procedures include Owen's (1975) Bayesian item selection, Veerkamp and Berger's (1997) weighted information criteria, and Davey and Parshall's (1995) posterior weighted information. Other variations of the maximum information procedure use Kullback-Leibler (K-L) information rather than Fisher information (see Chang and Ying, 1996; Eggen, 1999; Xu et al., 2003). More details on K-L information are provided below. Item selection in this research uses the maximum information approach with both Fisher information and K-L information, as well as a third information measure, Shannon Entropy.
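Equation (10) can be evaluated for any differentiable item response function. As an illustration, the sketch below assumes a three-parameter logistic (3PL) item without the 1.7 scaling constant; the item model is an assumption and is not restated in this section. The sketch computes P'(θ) by a central finite difference and compares the result with the known 3PL closed form. It then applies the maximum information rule to a small hypothetical bank of (a, b, c) items.

```python
import math


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (no 1.7 scaling constant assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def info_equation_10(theta, a, b, c, h=1e-5):
    """Equation (10): [P'(theta)]^2 / (P(theta) [1 - P(theta)]).

    P'(theta) is approximated with a central finite difference of width 2h.
    """
    p = p_3pl(theta, a, b, c)
    dp = (p_3pl(theta + h, a, b, c) - p_3pl(theta - h, a, b, c)) / (2.0 * h)
    return dp * dp / (p * (1.0 - p))


def info_3pl_closed_form(theta, a, b, c):
    """Known 3PL result: a^2 * (1 - P) / P * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return a * a * (1.0 - p) / p * ((p - c) / (1.0 - c)) ** 2


def select_max_info(theta_hat, bank, administered=()):
    """Index of the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_equation_10(theta_hat, *bank[i]))
```

With c = 0 the closed form reduces to the familiar 2PL result, a² P(1 − P).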
It is important to use an item selection method that provides the most information about the given examinee. Even so, item selection does not depend solely on which item is best in terms of precision of ability estimation. In theory, the next item selected is the one that can best aid ability estimation. In practice, other considerations must also be taken into account, such as content balancing and item exposure control (Wainer and Mislevy, 2000). In high-stakes CAT, test construction involves three competing goals:
(1) select items that measure the examinee's ability as quickly, efficiently, and accurately as possible;
(2) ensure that each administration of the test measures the same combination of content domains and item formats; and
(3) maintain test security by controlling the rate of each item's exposure across all administrations (Davey and Parshall, 1995; Stocking and Lewis, 2000).
Generally, satisfying one of these goals comes at the expense of the other two (Stocking and Lewis, 2000).

Continuously administering computerized adaptive tests allows individual examinees to share what they remember about the test items with peers who will take the test in a later administration. This compromises the integrity of the item pool (Way, 1998). Compromised items no longer measure all examinees' abilities equitably. As a result, it is appropriate to administer items that are sub-optimal for a given ability estimate when the optimal item has been selected so frequently that it may be compromised. Precision of measurement is therefore sometimes sacrificed for the sake of item security. In the end, however, the goal is to obtain an accurate estimate of the examinee's ability with a finite number of items, and in a short enough time to minimize fatigue.
Methods for controlling item exposure rates have been developed by McBride and Martin (1983), Sympson and Hetter (1985), Kingsbury and Zara (1989), Davey and Parshall (1995), Stocking and Lewis (1998), and Chang and Ying (1999), to name a few. There are three main types of approaches to controlling item exposure: randomization, conditional, and stratification. Randomization incorporates a random element into the item selection process to reduce the exposure rate of otherwise frequently selected items. An example is McBride and Martin's 5-4-3-2-1 procedure (1983), which proceeds as follows:
(1) The CAT program identifies the five best items for the current ability estimate and randomly selects one of them to administer.
(2) The four best items for the updated ability estimate are identified, and one of them is randomly selected.
(3) The three best items for the new estimate are identified, and one is randomly selected.
(4) The next item is selected at random from the two best items.
(5) For the rest of the exam, the single best item is administered.
This is considered a randomized procedure because the item is chosen at random from a group of plausible items.

Another type of item exposure control is the conditional approach, an example of which is the Sympson and Hetter procedure (1985). It is considered conditional because a randomly generated number is compared with a calculated parameter, and the item is administered only if the random number is less than that parameter. Item administration is thus conditional on the value of the random number and the comparison parameter. Unlike other types of item exposure control, this method allows the test administrator to set a maximum acceptable item exposure rate.
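The randomized and conditional approaches above can be sketched as two small selection helpers. Both are simplifications under stated assumptions:

- items are ranked by a caller-supplied information score;
- the 5-4-3-2-1 helper takes the item's position in the test, counting from zero;
- the Sympson-Hetter helper assumes each item already has an exposure-control parameter `k[i]`, obtained in practice from earlier simulations that are not shown;
- the fallback used when every candidate is rejected is an assumption;
- details of the original procedure, such as how rejected items are handled for the rest of the test, are omitted.

```python
import random


def select_5_4_3_2_1(scores, position, available, rng):
    """McBride and Martin's 5-4-3-2-1 rule (sketch).

    scores[i] is item i's information at the current ability estimate.
    position counts administered items from zero: the first item is drawn
    at random from the 5 best, the second from the 4 best, and so on;
    from the fifth item onward the single best item is used.
    """
    group_size = max(5 - position, 1)
    ranked = sorted(available, key=lambda i: scores[i], reverse=True)
    return rng.choice(ranked[:group_size])


def select_sympson_hetter(scores, available, k, rng):
    """Sympson-Hetter style conditional selection (sketch).

    Items are considered from most to least informative; item i is
    administered only if a uniform random number is less than k[i].
    """
    ranked = sorted(available, key=lambda i: scores[i], reverse=True)
    for i in ranked:
        if rng.random() < k[i]:
            return i
    return ranked[0]  # assumption: fall back to the best item if all are rejected
```

Setting every `k[i]` to 1 reduces the conditional helper to plain maximum-information selection. Lowering `k[i]` for a popular item caps how often it is given when it is the top-ranked choice.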
The third approach to item exposure control comprises stratification procedures, such as the a-stratified procedure developed by Chang and Ying (1999). In this technique, the items are stratified by the item discrimination parameter, a, so that items with similar discrimination values are grouped in the same stratum. Items with lower discrimination values are administered earlier in the exam, and items with higher discrimination values are administered later.

Clearly, item exposure control is important in CAT. Accurately estimating individuals' ability levels while maintaining item exposure control and content balancing are therefore significant goals in CAT administration. An item selection procedure is needed that can manage all of these issues simultaneously. Fortunately, the shadow test approach can handle this challenging endeavor: it takes into account item exposure control, content balancing, and a plethora of other constraints in the item selection process.

The Shadow Test

The idea of shadow testing was proposed by van der Linden and Reese in 1998. Shadow testing is a method of test assembly that uses linear programming (LP) to incorporate constraints into the assembly process. It is an iterative process in which an optimal shadow test is formed before the administration of each item in an examination. A shadow test is not administered in its entirety to the examinee; rather, a new shadow test is constructed before each item is administered. Each shadow test has the same number of items as the entire test, and the last shadow test is the administered test in its entirety.
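The iterative shadow test idea can be sketched with a brute-force assembler standing in for the linear programming solver. The sketch is a hypothetical illustration and makes the following assumptions:

- a tiny 2PL bank in which each item is an (a, b, content area) triple;
- a single content-balancing constraint (an exact number of items per content area);
- total Fisher information at the current estimate as the objective;
- a caller-supplied function for simulated responses and another for updating the ability estimate.

Each shadow test is forced to contain every item already administered, so the final shadow test is the administered test. Exhaustive enumeration only works for very small banks.

```python
import itertools
import math


def info_2pl(theta, a, b):
    """2PL Fisher information a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble_shadow_test(bank, theta, length, per_area, administered):
    """Most informative full-length test at theta that contains every
    administered item and has exactly per_area[area] items per content area.

    Brute-force enumeration stands in for an integer programming solver.
    Returns None if no feasible test exists.
    """
    free = [i for i in range(len(bank)) if i not in administered]
    best, best_info = None, -math.inf
    for extra in itertools.combinations(free, length - len(administered)):
        test = list(administered) + list(extra)
        counts = {}
        for i in test:
            area = bank[i][2]
            counts[area] = counts.get(area, 0) + 1
        if counts != per_area:
            continue  # violates the content-balancing constraint
        total = sum(info_2pl(theta, bank[i][0], bank[i][1]) for i in test)
        if total > best_info:
            best, best_info = test, total
    return best


def shadow_test_cat(bank, length, per_area, respond, update_theta, theta=0.0):
    """Administer `length` items, rebuilding a shadow test before each one."""
    administered, responses = [], []
    for _ in range(length):
        shadow = assemble_shadow_test(bank, theta, length, per_area, administered)
        # Give the most informative shadow-test item not yet administered;
        # the other new items are simply returned to the pool.
        nxt = max((i for i in shadow if i not in administered),
                  key=lambda i: info_2pl(theta, bank[i][0], bank[i][1]))
        administered.append(nxt)
        responses.append(respond(nxt))
        theta = update_theta(administered, responses)
    return administered, responses, theta
```

Because each shadow test is feasible and contains all earlier items, the next assembly always has at least one feasible solution, and the administered test inherits the content constraint.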
A shadow test must be optimal at the current ability estimate while complying with all of the specified constraints. Any item selected from the shadow test therefore retains these properties and is a good item for the current estimate (van der Linden and Chang, 2003). Each shadow test must also contain every item already administered in the overall test. The best item on the shadow test that has not yet been administered is selected as the next item, and the unused items are returned to the pool. The response to this item is then used in formulating the next shadow test. The iterative process for a four-item test is illustrated in Figure 4.

Shadow test 1: item*, item*, item*, item*; administer item 1
Shadow test 2: item 1, item*, item*, item*; administer item 2
Shadow test 3: item 1, item 2, item*, item*; administer item 3
Shadow test 4: item 1, item 2, item 3, item*; administer item 4
Figure 4: The iterative process of the shadow test approach. The notation item* represents an item selected for the shadow test that has not already been administered; the best of these items is selected as the next item given to the examinee.

At each step, a full-length shadow test is constructed that satisfies three important requirements. First, it meets all of the constraints; second, it includes all previously administered items; and third, it has maximum information at the current ability estimate (van der Linden, 2000b). "The last shadow test is the actual adaptive test and always meets all constraints" (van der Linden, 2000b, p. 33).

Applying this approach yields two major advantages. First, the items actually administered in the adaptive test are certain to satisfy the constraints, because each shadow test meets these specifications.
Second, the ability estimate converges optimally to the true ability value. This holds because each shadow test is assembled to be optimal at the current estimate, and each selected item is the optimal one from that shadow test (van der Linden and Chang, 2003).

A shadow test is constructed from an objective function and a series of constraints. The objective function frequently involves maximizing the Fisher information function (van der Linden and Reese, 1998; van der Linden, 2000b; van der Linden and Chang, 2003). Test-makers have a wide variety of constraints from which to choose (for an extended list, see van der Linden and Reese, 1998; van der Linden, 2000a; or van der Linden, 2000b). There are three main types of constraints: (1) those dealing with categorical item characteristics, (2) those concerning quantitative features of the items, and (3) those for inter-item dependencies (van der Linden, 2000a; Veldkamp and van der Linden, 2000). The test's administrators may include any combination of these in the list of constraints. Some typical constraints are described next. For a fixed-length test, the total number of items is a necessary constraint. For diagnostic assessment, the number of items measuring each attribute is also an appropriate constraint. The manual for the Arpeggio software (Stout et al., 2002) requires each attribute to be measured by at least three items, so this requirement must be specified mathematically in the list of constraints (Hartz, Roussos, and Stout, 2002). Another important constraint deals with content balancing. Each examinee should receive the same number of items in each content area (Green et al., 1984), and this can easily be expressed as a mathematical constraint. Item exposure control is another constraint that benefits the test construction process.
A constraint dealing with exposure control can be as simple as limiting the frequency of an item's administration (van der Linden and Reese, 1998) or as involved as using the a-stratified approach (van der Linden and Chang, 2003). Theoretically, any aspect of a nonadaptive test can be incorporated into the shadow test procedure as long as it can be represented mathematically by a constraint (van der Linden, 2000b). The mathematical constraints used in this study are listed in Chapter Three. For more information on possible constraints, see Stocking and Swanson (1993) or van der Linden and Reese (1998).

A shadow test must be optimal with respect to the objective function and must also obey the specified constraints. Formulating such a test requires linear programming, specifically integer programming. A software program called CPLEX (ILOG, 2003) is used to solve the linear programming problem of finding the best solution to the objective function under the given constraints. Typically, such a solution is found with the branch-and-bound method (van der Linden, 2000b). (Because CPLEX is a proprietary program whose source code is not publicly available, there is no way to verify that this is indeed how CPLEX finds its solutions.)

The branch-and-bound method is most easily understood in graphical form. Figure 5 illustrates it for the simple case of selecting two items from four. A candidate selection is written in parentheses as (item one, item two, item three, item four), where a one indicates that the item is selected and a zero indicates that it is not. Because two of the four items are to be selected, each final selection contains two ones and two zeros. An asterisk indicates that no decision has yet been made about that item.
Start (no items selected yet): (*, *, *, *)
Select item 1? no: (0, *, *, *); yes: (1, *, *, *)
Select item 2? (0, 0, *, *), (0, 1, *, *), (1, 0, *, *), (1, 1, *, *)
Select item 3? (0, 0, 0, *) [bounded], (0, 0, 1, *), (0, 1, 0, *), (0, 1, 1, *), (1, 0, 0, *), (1, 0, 1, *), (1, 1, 0, *), (1, 1, 1, *) [bounded]
Complete selections: (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0), (1, 1, 0, 0)
Figure 5: The branch-and-bound method.

The first decision is whether to include the first item, which creates two branches. Each of these branches splits into two more branches according to whether the second item is included. For example, (0, 1, *, *) indicates that the first item is not selected, the second item is selected, and no decision has yet been made about the remaining two items. This branching continues until all of the items have been considered. When a branch contains a set of items that cannot satisfy the specified constraints, that branch is no longer explored; that is, it is bounded and not allowed to grow any further. For instance, the branches (0, 0, 0, *) and (1, 1, 1, *) were both bounded because they could not yield a total of exactly two selected items.

These types of problems are often called knapsack problems because they resemble selecting objects to put into a knapsack for a trip. The packer wants the best possible combination of things to put in the knapsack, most likely choosing just a few objects from each of several categories. In this analogy, the knapsack is the shadow test, and all of the possible objects are contained in the item bank. The content categories of the items correspond to the different categories of objects to be included.
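The branching and bounding in Figure 5 can be sketched as a recursive search. This is a hypothetical illustration and not a description of how CPLEX works internally. Items are decided one at a time in the order used in Figure 5, and a branch is bounded as soon as it can no longer reach exactly the required number of selected items. As an additional assumption beyond the figure, a branch is also cut when even its most optimistic total information cannot beat the best complete selection found so far. The objective is the sum of per-item information values, which are supplied as hypothetical numbers.

```python
def branch_and_bound(info, n_select):
    """Select exactly n_select items maximizing total information.

    info[i] is a (hypothetical) information value for item i at the current
    ability estimate. Items are decided in order, as in Figure 5.
    Returns (best_total, selection) where selection is a tuple of 0s and 1s.
    """
    n_items = len(info)
    best = {"total": float("-inf"), "choice": None}

    def upper_bound(depth, chosen, total):
        # Optimistic value: fill the remaining slots with the largest values left.
        remaining = sorted(info[depth:], reverse=True)
        return total + sum(remaining[: n_select - chosen])

    def explore(depth, choice, chosen, total):
        # Bound (feasibility): too many items, or too few items left to reach n_select.
        if chosen > n_select or chosen + (n_items - depth) < n_select:
            return
        # Bound (objective): this branch cannot beat the best selection found so far.
        if upper_bound(depth, chosen, total) <= best["total"]:
            return
        if depth == n_items:
            best["total"], best["choice"] = total, tuple(choice)
            return
        # Branch on item `depth`: select it (1) or skip it (0).
        explore(depth + 1, choice + [1], chosen + 1, total + info[depth])
        explore(depth + 1, choice + [0], chosen, total)

    explore(0, [], 0, 0.0)
    return best["total"], best["choice"]
```

The feasibility check alone reproduces the bounding of (0, 0, 0, *) and (1, 1, 1, *) in Figure 5. The objective bound only prunes branches that could not change the final answer.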
Any branch that does not satisfy all of the requirements in the list of constraints is abandoned. Once all of the item combinations that obey the constraints are determined, the objective function is used to select the best combination. For example, suppose the objective function is to maximize Fisher information. Then Fisher information is calculated for every combination of items that obeys all of the constraints, and the combination with the greatest Fisher information is selected. This winning combination becomes the shadow test. For more information about linear programming or the branch-and-bound method, see Bertsimas and Tsitsiklis (1997) or Hawkins (1988).

Once a shadow test is assembled, the best item with respect to the attribute vector estimate is selected from it as the next item administered to the examinee. Two possible strategies, Shannon Entropy and Kullback-Leibler information, can be employed for this selection, as described in Xu, Chang, and Douglas (2003).

Shannon Entropy.

Shannon Entropy was introduced in 1948 as a measure of uncertainty from a probabilistic standpoint (Harris, 1988). It indicates a random variable's uncertainty or disorder and is a nonnegative concave function of the random variable's probability distribution. In the context of this paper, the goal is to minimize Shannon Entropy, because less uncertainty is more desirable. Shannon Entropy is defined in Equation (11):

$$Sh(\pi) = \sum_{i=1}^{K} \pi_i \log \frac{1}{\pi_i} \qquad (11)$$

where $\pi_i$ is the probability that the random variable of interest, call it Y, takes a particular value $y_i$, and $\pi$ is the probability vector containing the $\pi_i$ for all possible values $y_i$ (Xu et al., 2003).
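Equation (11) translates directly into code. The sketch below assumes natural logarithms; the base only rescales the value. It follows the usual convention that terms with π_i = 0 contribute zero. The three example distributions are hypothetical and show that less uncertainty gives a smaller value.

```python
import math


def shannon_entropy(probs):
    """Equation (11): Sh(pi) = sum_i pi_i * log(1 / pi_i).

    Natural logarithms are used; terms with pi_i = 0 contribute zero.
    """
    return sum(p * math.log(1.0 / p) for p in probs if p > 0.0)


# Less uncertainty gives a smaller value.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # log(4), the maximum for four values
peaked = shannon_entropy([0.85, 0.05, 0.05, 0.05])
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # no uncertainty at all
print(uniform, peaked, certain)
```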
In diagnostic assessment, where the goal is to estimate attribute vectors, Shannon Entropy becomes Equation (12), as described by Xu et al. (2003):

$$\begin{aligned} Sh(\pi_n, X_i) &= \sum_{x=0}^{1} E_n(\pi_n \mid X_i = x)\, P[X_i = x \mid \pi_{n-1}] \\ &= \sum_{x=0}^{1} E_n(\pi_n \mid X_i = x) \sum_{c=1}^{2^M} P_i^{x}(\alpha_c)\,[1 - P_i(\alpha_c)]^{1-x}\, \pi_{n-1}(\alpha_c) \end{aligned} \qquad (12)$$

where
- $X_i$ is an item in the bank;
- $\alpha_c$ is a possible candidate attribute vector, one of the $2^M$ attribute patterns;
- $P_i(\alpha_c)$ is the probability that an examinee with attribute vector $\alpha_c$ answers item i correctly;
- $\pi_{n-1}$ is the posterior probability distribution of the candidate patterns after n-1 items have been administered;
- $E_n(\pi_n \mid X_i = x)$ is the Shannon Entropy of the updated posterior $\pi_n$ given the response $X_i = x$.

Kullback-Leibler Information.

Kullback-Leibler (K-L) information was introduced in 1951 as a distance measure between probability distributions (Kullback, 1988). It is used as a measure of discrimination between a true distribution and another distribution (Kullback, 1988). More recently, K-L information has been used as a measure of global information for item selection in IRT (Chang and Ying, 1996). It has also served as an index in the item selection process in diagnostic assessment (Xu et al., 2003). K-L information for continuous probability distributions f and g is defined in Equation (13):

$$K(f, g) = \int \log\left[\frac{f(x)}{g(x)}\right] f(x)\, dx \qquad (13)$$

In cognitive diagnosis, K-L information serves as an item selection criterion. Because the variables are discrete, the integral becomes a sum over the possible responses, which is then taken across all possible attribute patterns:

$$K_i(\hat{\alpha}) = \sum_{c=1}^{2^M} \sum_{x=0}^{1} \log\left[\frac{P(X_i = x \mid \hat{\alpha})}{P(X_i = x \mid \alpha_c)}\right] P(X_i = x \mid \hat{\alpha}) \qquad (14)$$

where $\hat{\alpha}$ is the current estimate of the attribute vector and $\alpha_c$ is a possible candidate attribute vector (Xu et al., 2003). The result is an information index for every remaining item i. It relates the current attribute vector estimate to the possible attribute vector estimates that could result from administering item i. The item with the largest value of $K_i(\hat{\alpha})$ is then selected as the next item.
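Equations (12) and (14) can be sketched for a toy case with two attributes, which gives four candidate patterns ordered (0,0), (0,1), (1,0), (1,1). The sketch makes the following assumptions:

- the correct-response probabilities P_i(α_c) in `bank` are hypothetical; in this study they would come from the fitted fusion model;
- natural logarithms are used;
- the current estimate α̂ is passed as the index of a candidate pattern;
- all probabilities lie strictly between 0 and 1.

The sketch selects the next item both by minimum expected entropy and by maximum K-L index.

```python
import math


def shannon_entropy(probs):
    """Equation (11) with natural logarithms; zero-probability terms add nothing."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0.0)


def response_prob(p_correct, x):
    """P(X_i = x | alpha_c) for a dichotomous item: P^x (1 - P)^(1 - x)."""
    return p_correct if x == 1 else 1.0 - p_correct


def expected_entropy(item_probs, prior):
    """Equation (12): expected Shannon entropy of the updated posterior.

    item_probs[c] is P_i(alpha_c) for candidate pattern c, and prior[c] is
    pi_{n-1}(alpha_c).
    """
    total = 0.0
    for x in (0, 1):
        joint = [response_prob(p, x) * w for p, w in zip(item_probs, prior)]
        marginal = sum(joint)  # P[X_i = x | pi_{n-1}]
        if marginal == 0.0:
            continue
        posterior = [j / marginal for j in joint]  # pi_n given X_i = x
        total += shannon_entropy(posterior) * marginal
    return total


def kl_index(item_probs, alpha_hat):
    """Equation (14): K_i(alpha_hat), where alpha_hat indexes the current
    attribute-pattern estimate."""
    p_hat = item_probs[alpha_hat]
    return sum(
        math.log(response_prob(p_hat, x) / response_prob(p_c, x)) * response_prob(p_hat, x)
        for p_c in item_probs
        for x in (0, 1)
    )


bank = {"A": [0.2, 0.2, 0.9, 0.9],   # hypothetical item measuring attribute 1
        "B": [0.2, 0.9, 0.2, 0.9]}   # hypothetical item measuring attribute 2
prior = [0.05, 0.05, 0.45, 0.45]     # attribute 1 already looks mastered
next_by_entropy = min(bank, key=lambda i: expected_entropy(bank[i], prior))
next_by_kl = max(bank, key=lambda i: kl_index(bank[i], alpha_hat=2))
print(next_by_entropy, next_by_kl)
```

In this toy case both criteria prefer item B, which measures the attribute that is still uncertain.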
K-L information is beneficial in cognitively diagnostic assessment because it lends itself easily to the categorical case, unlike alternative forms of information such as Fisher information. Recall from Equation (10) that Fisher information requires a derivative, which does not exist for a discrete random variable.

The main concern with a computationally intensive selection procedure such as K-L information is an increase in computation time. Because examinees wait for the item selection procedure during administration, it is imperative that the procedure run quickly. Cheng and Liou (2000) compared item selection procedures that maximize Fisher information and K-L information. They noted that K-L information took longer than Fisher information-based algorithms, especially as the size of the item bank increased. Although slower than Fisher information-based algorithms, K-L item selection was still quite fast. In Cheng and Liou's (2000) study, it took a quarter of a second on a Pentium II 266 MHz PC. Eggen (1999) and Chen, Ankenmann, and Chang (2000) also compared K-L information with Fisher information for item selection. They found that K-L information performed as well as Fisher information, if not better, for fewer than ten items; for more than ten items, the two methods produced the same results. In this study, the minimization of Shannon Entropy and the maximization of K-L information are each used to select the best item from the shadow test with respect to the attribute vector. The procedure of this study is explained in the following chapter.

CHAPTER THREE: METHODOLOGY

The research design comprises two major sections.
The first part applies the fusion model to examinees' results on an existing exam. The aim is to provide diagnostic feedback from an exam that would otherwise provide none; the rationale for and advantages of this have been explained previously. The second major section incorporates computerized adaptive testing technology to assess examinees adaptively from a diagnostic standpoint. This section compares three procedures for administering a computerized adaptive diagnostic assessment. One of these procedures uses shadow testing to select items based on optimal estimates of both the attributes and the conventional IRT ability parameter simultaneously.

PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework

This study applies the fusion model to a pair of assessments required by the Texas Education Agency (TEA). The first dataset of interest contains results from the third grade Texas Assessment of Academic Skills (TAAS) administered in the spring of 2002. The second dataset is from the spring 2003 administration of the eleventh grade Texas Assessment of Knowledge and Skills (TAKS). (Please note that the TAKS was administered under unmotivated testing conditions.) The third grade test contains a reading section and a math section. The eleventh grade test contains a math portion and a Language Arts portion. For consistency in nomenclature across the two grade levels' tests, this dissertation refers to the Language Arts portion as a reading portion, although it is actually a Language Arts assessment. Each of these sections is analyzed separately.

TEA examines psychometric issues such as reliability and validity very carefully to ensure a sound assessment. Most of the Kuder-Richardson Formula 20 (KR-20) reliability coefficients lie in the high 0.80s to low 0.90s (Texas Education Agency, 2002).
Evidence of content-, construct-, and criterion-related validity was also explored. More detailed information regarding reliability and validity, including specific estimates and methods, as well as many other test characteristics, is included in the Texas Student Assessment Program Technical Digest for each academic year (Texas Education Agency, 2002).

The first step is to obtain a simple random sample of two thousand examinees from the approximately three hundred thousand students who took each exam. The next requirement for a diagnostic analysis is a Q-matrix representing the attributes measured by the items. Three Q-matrices were developed for each portion of the eleventh grade TAKS and the third grade TAAS. One Q-matrix is based on the test blueprint provided by TEA, the second is based on the attribute assignments provided by content experts at TEA, and the third is based on the author's intuitive evaluation of the test items. These three approaches to Q-matrix construction reflect different methods a psychometrician might adopt to conduct a cognitively diagnostic analysis. The number of attributes in each Q-matrix is presented in Table 1.

Table 1: Number of Attributes in each Q-Matrix

Test      Q-matrix    Grade 3   Grade 11
Math      Blueprint      11        10
          Enhanced       12        16
          Intuitive      11        10
Reading   Blueprint       6         3
          Enhanced        6         8
          Intuitive       7         5

The third grade test has 44 items in the math portion and 36 items in the reading portion. The eleventh grade test contains 60 items in the math portion and 28 items in the reading portion. The analyses are performed using the fusion model software program Arpeggio (Hartz et al., 2002). This procedure involves three major steps. First, the Arpeggio program is executed. This program estimates the fusion model parameters from the item response data using four thousand burn-in cycles in each of four MCMC chains.
Next, the estimates of the mastery proportions (the parameters denoted $p_k$) for each of the measured attributes are examined. These must be estimated with an acceptable level of confidence because they serve as the cut-off values for the attribute mastery assignments. Once these parameters are determined to be estimable within an acceptable range (the ninety-five percent confidence interval width must be no greater than three quarters of the parameter's possible range), a stepwise reduction procedure is used to refine the Q-matrix. This procedure examines the upper bound of the confidence interval for the attribute-based item discrimination parameter $r^*_{ik}$. If this value is too high (i.e., above 0.9), the corresponding Q-matrix entry is removed, on the rationale that a high $r^*_{ik}$ indicates that the designated attribute is not important for obtaining a correct response. (Recall that a low $r^*_{ik}$ is desirable.) If any of the fusion model item parameters are determined not to be estimable, additional steps are carried out, including a procedure designed to deal with this issue. More information on the stepwise reduction algorithm may be found in Hartz et al. (2002). Lastly, additional analyses associated with the Arpeggio software are used to evaluate model fit. This additional software flags examinees with a low probability of obtaining mastery status who have nevertheless been assigned mastery status, as well as examinees with a high probability of obtaining mastery status who have been assigned non-mastery status. Ideally, the number of flagged examinees would be small. More specific details on using the Arpeggio program and the additional programs for evaluating fit can be found in Hartz (2002) or Hartz et al. (2002).
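The two decision rules above can be illustrated with a short Python sketch. It is not Arpeggio's implementation: it applies a single pass of the reduction rule to a Q-matrix given the upper confidence bounds of $r^*_{ik}$, whereas the actual stepwise procedure re-estimates the model as entries are removed. The function names are hypothetical, and treating the possible range of $p_k$ as [0, 1] is an assumption.

```python
from typing import List, Sequence

R_STAR_UPPER_LIMIT = 0.9  # drop q_ik when the upper CI bound of r*_ik exceeds this
MAX_CI_FRACTION = 0.75    # p_k CI width may not exceed 3/4 of the parameter's range


def pk_estimable(ci_lower: float, ci_upper: float,
                 range_low: float = 0.0, range_high: float = 1.0) -> bool:
    """True if the 95% CI for p_k is no wider than 3/4 of its possible range."""
    return (ci_upper - ci_lower) <= MAX_CI_FRACTION * (range_high - range_low)


def reduce_q_matrix(q: Sequence[Sequence[int]],
                    r_star_upper: Sequence[Sequence[float]]) -> List[List[int]]:
    """One pass of the reduction rule (no re-estimation between removals).

    q[i][k] is 1 if item i is coded as measuring attribute k; r_star_upper[i][k]
    is the upper 95% CI bound of r*_ik (ignored where q[i][k] == 0).
    """
    reduced = []
    for q_row, r_row in zip(q, r_star_upper):
        reduced.append([
            0 if (q_ik == 1 and r_up > R_STAR_UPPER_LIMIT) else q_ik
            for q_ik, r_up in zip(q_row, r_row)
        ])
    return reduced
```

For example, `reduce_q_matrix([[1, 1], [0, 1]], [[0.95, 0.4], [0.99, 0.92]])` keeps only the entry whose upper bound (0.4) stays below 0.9.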
The results of the analyses may then be examined to determine whether the fusion model is appropriate for these data. The effectiveness of the model is judged by examining the proportion of examinees who are accurately classified. This portion of the study's design compares the effects of using different Q-matrices in the cognitively diagnostic analysis. This is an important issue because Q-matrix development, a fundamental step in diagnostic assessment, is subjective. The use of a variety of Q-matrices also addresses the fact that a Q-matrix with simple structure (i.e., each item measures only one attribute) may yield different results from the same analysis based on a Q-matrix with complex structure (i.e., items may measure multiple attributes). The two tests chosen for the analysis exhibit varying levels of complexity, which covers a range of realistic circumstances. This portion of the research provides information on applying a cognitive diagnosis model to an existing mandatory standardized assessment. Possible future research directions include applying the rule space methodology in such an analysis or cross-validating the examinees' resulting attribute vectors, perhaps by interviewing a portion of the sampled students or their teachers.

PART TWO: Adaptively Enhancing the Assessment Process

This part of the study explores a method for selecting items in a computerized adaptive assessment based on a cognitive diagnosis framework. It uses a simulation of a computerized adaptive assessment. Computer simulation is an important tool in measurement research because it allows researchers to simulate the testing process with a set of known data.
The description of this part of the research first explains how the data and item parameters are obtained, and then describes the three cognitive diagnosis-based approaches to item selection.

Data and Parameters

This study is based on real responses to items administered by the Texas Education Agency. The item response patterns of a simple random sample of two thousand examinees from each of three administrations of the third grade TAAS (years 2000 through 2002) are analyzed using BILOG-MG to obtain the three-parameter logistic model (3PL) item and ability parameters. This is done separately for the math and reading portions of the test. The response patterns are also analyzed using the Arpeggio program (Stout et al., 2002) to calibrate the fusion model parameters, including item, person, and attribute parameters. Each combination of content portion and Q-matrix type is analyzed as a separate condition, resulting in 2 × 3 = 6 conditions. These analyses yield six sets of values describing mastery levels of the measured attributes, as well as a traditional IRT ability estimate for each examinee in math and reading. Note that there are only two sets of 3PL item parameters (one for reading and one for math) but six sets of fusion model parameters, because the Q-matrix does not enter into a 3PL analysis. Arpeggio estimates the item parameters and the $\boldsymbol{\alpha}_j$ values by MCMC estimation, using four Markov chains of length four thousand each with one thousand burn-in cycles. All other details of the analysis follow the default settings described in the software's manual (Hartz et al., 2002). The mastery/non-mastery assignments for each attribute for the two thousand examinees are determined by dichotomizing each examinee's $\boldsymbol{\alpha}_j$ values. A value of 0.5 is recommended as the cut-off criterion for determining mastery/non-mastery status.
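The dichotomization step can be sketched as below, assuming the Arpeggio output has already been reduced to one mastery estimate per examinee and attribute. The per-attribute cut-off vector accommodates either the 0.5 criterion or attribute-specific values such as $p_k$; the helper names are hypothetical. The second function computes the proportion of tenuous estimates between 0.4 and 0.6, the quantity summarized in Chapter Four, with inclusive bounds as an assumption.

```python
from typing import List, Sequence


def classify_mastery(estimates: Sequence[Sequence[float]],
                     cutoffs: Sequence[float]) -> List[List[int]]:
    """Return 1 (mastery) where an estimate is >= its attribute's cut-off, else 0."""
    return [[1 if value >= cut else 0 for value, cut in zip(row, cutoffs)]
            for row in estimates]


def proportion_tenuous(estimates: Sequence[Sequence[float]], k: int,
                       low: float = 0.4, high: float = 0.6) -> float:
    """Share of examinees whose estimate for attribute k lies in [low, high]."""
    column = [row[k] for row in estimates]
    return sum(1 for value in column if low <= value <= high) / len(column)
```

With `est = [[0.92, 0.45], [0.30, 0.55], [0.50, 0.10]]` and cut-offs `[0.5, 0.5]`, the first examinee masters only attribute 1, and two of the three estimates for attribute 2 fall in the tenuous band.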
An examinee's mastery level for a specific attribute takes the value one (indicating that the examinee has mastered the attribute) if $\alpha_{jk}$ is greater than or equal to the $p_k$ value for the attribute, and zero (indicating non-mastery) otherwise. The Arpeggio analysis also revises the Q-matrix by means of the stepwise reduction procedure described in Part One of this chapter. The values of the ability parameter obtained through the BILOG-MG calibration are treated as the known, or true, values and are denoted $\theta_0$. The final estimates of the ability parameter will be compared with these values. Likewise, the dichotomous attribute mastery parameters obtained from the Arpeggio analysis are considered the known, or true, attribute mastery patterns and are denoted $\boldsymbol{\alpha}_0$. The attributes measured by the tests are listed in Appendix A.

Recall from Chapter Two that CAT simulations require an item bank from which to select items. The item parameters from the three administrations (from both the 3PL and the fusion model) will be replicated three times to build a larger item bank for the CAT simulation. This results in an item bank of 396 items for the math portion (44 items × 3 administrations × 3) and 324 items for the reading portion (36 items × 3 administrations × 3). Once the 3PL item parameters and revised Q-matrix entries are obtained for each item, the CAT simulation may begin. Figure 6 outlines how the item parameters are obtained. The correctness of each item response within the CAT administration is determined by comparing the probability of a correct response with a random number between zero and one. For the first and third conditions, the response probability is obtained from the 3PL model (Equation 3), and for the second and third conditions, it is based on the fusion model (Equation 8). The third condition is the only one that uses both models to determine the correctness of the simulees' item responses (as depicted later in Figure 9).
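The response-generation step can be sketched in Python as follows. `p_3pl` uses the usual 3PL form with scaling constant D = 1.7 (an assumption, since Equation 3 is not reproduced here), and `p_fusion` follows the reparameterized unified model structure of the fusion model, with the $\theta$ component $P_{c_i}(\theta)$ written as a Rasch-type logistic curve with difficulty $-c_i$; that form and its scaling are also assumptions rather than a transcription of Equation 8. A response is scored correct unless a uniform draw exceeds the model probability, matching the scoring rule described for the simulation.

```python
import math
import random
from typing import Sequence


def p_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """3PL probability of a correct response (D = 1.7 scaling is assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_fusion(alpha: Sequence[int], q_row: Sequence[int], pi_star: float,
             r_star: Sequence[float], c_i: float, theta: float,
             D: float = 1.7) -> float:
    """Fusion model probability: pi* * prod_k r*_ik^(q_ik (1 - alpha_k)) * P_c(theta).

    P_c is taken to be a logistic curve with difficulty -c_i (assumed form).
    """
    penalty = 1.0
    for a_k, q_k, r_k in zip(alpha, q_row, r_star):
        if q_k == 1 and a_k == 0:  # required attribute not mastered
            penalty *= r_k
    p_c = 1.0 / (1.0 + math.exp(-D * (theta + c_i)))
    return pi_star * penalty * p_c


def simulate_response(prob: float, rng: random.Random) -> int:
    """Score 0 if a U(0, 1) draw exceeds the model probability, otherwise 1."""
    return 0 if rng.random() > prob else 1
```

For instance, `p_3pl(0.0, 1.0, 0.0, 0.2)` gives 0.6, and `p_fusion([1, 0], [1, 1], 0.9, [0.5, 0.5], 0.0, 0.0)` gives 0.225 because the unmastered second attribute applies one $r^*$ penalty.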
[Figure 6: Using item response patterns to obtain item and person parameters. Flowchart: response patterns and Q-matrix entries obtained from TEA are analyzed with BILOG-MG, yielding 3PL item and person parameters, and with Arpeggio, yielding fusion model item and person parameters and revised Q-matrix entries; these feed the computer adaptive test simulation.]

The study's design for the simulation portion includes three conditions for comparison. Each condition reflects a different approach to item selection in adaptive testing: one focuses on IRT-based ability levels only, another on attribute mastery vectors only, and the third on both. The results are then compared with the known values of the examinees' ability levels and attribute vectors, obtained from the observed response data as described above, to evaluate the accuracy of the three conditions.

Research Design

The study's design includes three conditions reflecting three possible item selection methods. One selects items based only on the traditional unidimensional IRT-based ability estimate, another only on the cognitive diagnosis-based attribute vector estimate, and the last on both. The first condition implements the conventional method of focusing solely on $\theta$ during item selection. The second condition, focusing solely on $\boldsymbol{\alpha}$, follows the approach outlined in Xu et al. (2003). The third condition uses the estimates of both $\theta$ and $\boldsymbol{\alpha}$ to select the best item to administer next. For all three conditions, the goal is to estimate both the traditional IRT ability ($\theta$) and the diagnostic attribute mastery levels ($\boldsymbol{\alpha}$) for each examinee; the conditions differ only in the item selection procedure of the adaptive assessment. The third condition is the heart of this study.
Before the administration of each item, it first constructs a shadow test that is optimized for the current ability estimate (as outlined by van der Linden and Reese, 1998). Then the best item for measuring the attribute vector is selected from the shadow test based on the current $\hat{\boldsymbol{\alpha}}$ estimate, using Shannon Entropy or Kullback-Leibler information as outlined in Xu et al. (2003). This is the only condition involving the shadow test approach. The first two conditions reflect methods that have already been implemented in previous research. The third method is the unique contribution of this study because it focuses on both traditional IRT ability estimation and cognitively diagnostic attribute information.

Condition 1: Theta-based Item Selection. The first condition uses the conventional method of item selection, which focuses solely on the $\theta$ estimate. This condition includes two sub-conditions that take different approaches to item selection. In one sub-condition, the next item administered is the item with maximum Fisher information at the current estimate $\hat{\theta}$. In the other sub-condition, the item with maximum K-L information is selected as the next item. (Note that maximizing K-L information for the continuous variable $\theta$ requires an interval of integration, denoted $\delta_n$. This interval is calculated as $\delta_n = 3/\sqrt{n}$, where $n$ is the number of items administered in the test. See Chang and Ying (1996) for more detailed information on integration over this $\delta_n$ interval.) In addition, the content balancing procedure proposed by Kingsbury and Zara (1989) will be implemented in this condition. Once all items have been administered and the final ability estimate is obtained, the individual attribute vectors are estimated by maximum likelihood.

Condition 2: Alpha-based Item Selection.
The second condition takes the cognitive attribute vector into account in item selection. In this condition, the item selected is the best item for the current estimate of the examinee's attribute mastery vector. To do so, this condition follows the approach outlined in Xu et al. (2003), which uses Shannon Entropy or Kullback-Leibler information to determine the best item for a given $\hat{\boldsymbol{\alpha}}$ estimate. This study includes both item selection methods, minimizing Shannon Entropy and maximizing K-L information, as sub-conditions of condition 2. After all items have been administered and the final estimate of the attribute vector has been obtained, the ability estimates are calculated from the individual response patterns by maximum likelihood.

Condition 3: Theta- and Alpha-based Item Selection. Condition 3 is the part of the project that uses the shadow testing approach. The aim of this condition is to use $\hat{\theta}$ and $\hat{\boldsymbol{\alpha}}$ estimates simultaneously to select items in a computerized adaptive test administration. This condition has two sequential stages. First, a shadow test is constructed that is optimized with respect to the current estimate of $\theta$. This ensures that whichever item is selected for administration is optimal for the examinee's current estimate of $\theta$. Note the reliance on the current estimate of $\theta$ in this procedure. Before the first shadow test can be constructed, initial estimates of $\theta$ and $\boldsymbol{\alpha}$ are required. The mean value is generally used as the initial estimate; this is reasonable because a given examinee's true ability level is more likely to lie near the mean than in the remote extremes of the possible ability values (Thissen and Mislevy, 2000).
So the mean of the population's ability levels is a suitable initial approximation for an examinee's ability estimate (Thissen and Mislevy, 2000), and likewise the mean of the attribute vector estimates might seem a sensible initial estimate of an examinee's attribute mastery vector. However, the mean of a set of dichotomous variables is not very meaningful, since the estimates of the attribute mastery levels must also be dichotomous. Therefore, a more fitting initial estimate of $\boldsymbol{\alpha}$ is the most frequently occurring attribute mastery pattern, in other words, the mode.

Once the initial estimates of $\theta$ and $\boldsymbol{\alpha}$ have been determined, a shadow test is assembled that is optimal at the mean value of $\theta$. Naturally, this initial shadow test is the same for all examinees. As previously mentioned, a shadow test is constructed from an objective function and a list of constraints by means of the branch-and-bound method, using the software program CPLEX (ILOG, 2003). The objective function for forming the shadow test follows the conventional maximization of Fisher information:

$$\max \sum_{i=1}^{I} I_i(\hat{\theta})\, x_i,$$

where $x_i \in \{0, 1\}$ indicates whether item $i$ is included in the shadow test. The applicable constraints are listed below.

$$\sum_{i=1}^{I} x_i = n,$$

where $I$ is the number of items in the pool and $n$ is the total test length. This regulates the test length.

$$\sum_{i=1}^{I} q_{ia}\, x_i \le \text{Const}_a \quad \text{or} \quad \sum_{i=1}^{I} q_{ia}\, x_i \ge \text{Const}_a, \qquad a = 1, \dots, A,$$

where $A$ is the number of attributes. This ensures that a specified number of items measure each attribute.

$$\sum_{i \in V_g} x_i = n_g, \qquad g = 1, \dots, G,$$

where $V_g$ is the set of items that belong to content category $g$ (e.g., geometry or reading comprehension) and $G$ is the number of content categories. This ensures that a certain number of items are administered in each content category.

$$\sum_{i \in c_{k-1}} x_i = k - 1,$$

where $c_{k-1}$ is the set of $k-1$ items already administered. This ensures that the shadow test includes all previously administered items.
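To make the integer program concrete, the Python sketch below solves a toy version by exhaustive enumeration: it checks every n-item subset of a tiny pool against the test-length, attribute (using the ≥ form only), content, and previously-administered constraints, and keeps the feasible subset with the largest total Fisher information. This is a stand-in for CPLEX's branch-and-bound solver, exact but only practical for very small pools; all names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set


def assemble_shadow_test(info: Sequence[float], q: Sequence[Sequence[int]],
                         content: Sequence[int], n: int,
                         min_per_attribute: Sequence[int],
                         n_per_content: Dict[int, int],
                         administered: Set[int]) -> Optional[List[int]]:
    """Return the feasible n-item set with maximum total information, or None.

    info[i] is I_i(theta_hat); q[i][a] the Q-matrix entry; content[i] the
    content category of item i. Exhaustive search replaces branch-and-bound.
    """
    best, best_value = None, float("-inf")
    for subset in combinations(range(len(info)), n):  # sum_i x_i = n
        chosen = set(subset)
        if not administered <= chosen:  # previously given items must stay in
            continue
        if any(sum(q[i][a] for i in subset) < m  # attribute coverage (>= form)
               for a, m in enumerate(min_per_attribute)):
            continue
        if any(sum(1 for i in subset if content[i] == g) != n_g  # content counts
               for g, n_g in n_per_content.items()):
            continue
        value = sum(info[i] for i in subset)  # objective: total Fisher information
        if value > best_value:
            best, best_value = sorted(subset), value
    return best
```

With five toy items whose information values are `[1.0, 0.9, 0.8, 0.3, 0.2]`, the sketch picks the most informative items that still satisfy every constraint, and returns `None` when no subset is feasible.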
Other possible constraints could address item type or the time required to answer each item. Recall that the goal of a shadow test is not only to include a set of items that are optimal for a given ability estimate, but also to ensure these items obey content balancing and any other constraints the test administrator requires. After this information is entered into CPLEX and the analysis is complete, an output file lists the items to be included in the shadow test. The purpose of constructing a shadow test in this manner is to obtain a set of items that are all good items for the current estimate of $\theta$ and that also obey the assigned constraints. Therefore, whichever item is selected, it is a good item from the viewpoint of traditional unidimensional IRT measurement.

The second stage of the process selects the next item to be administered from this shadow test. This is the stage that brings a cognitive diagnosis component into item selection. Selecting an item from the shadow test is based on appraising each item's contribution to the estimate of the examinee's $\boldsymbol{\alpha}$ vector. The additional information regarding the cognitive attribute vector can be considered supplemental to the conventional method of looking solely at the unidimensional IRT ability level, because all items in the shadow test are already optimal with regard to the estimate $\hat{\theta}$. So any diagnostic information provided by the estimation of the attribute vector may be considered a bonus. Applicable methods for selecting items from the shadow test based on diagnostic information include minimizing Shannon Entropy and maximizing K-L information. Just as in condition 2, each of these methods is considered a sub-condition within condition 3.
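The second stage can be sketched for a discrete attribute space as below, following the general form of the two criteria in Xu et al. (2003): the K-L index sums, over all candidate patterns $\boldsymbol{\alpha}_c$, the divergence between the response distributions at $\hat{\boldsymbol{\alpha}}$ and at $\boldsymbol{\alpha}_c$, and the Shannon Entropy criterion is the expected entropy of the updated posterior over patterns. The caller supplies a hypothetical function `prob_correct(item, alpha)` (for example, a fusion model response probability) and a posterior over patterns; the names, the posterior input, and the assumption that probabilities lie strictly between 0 and 1 are illustrative, not taken from the study's programs.

```python
import math
from typing import Callable, Dict, Iterable, Sequence, Set, Tuple

Pattern = Tuple[int, ...]
# Hypothetical caller-supplied model: P(X_item = 1 | alpha), assumed strictly in (0, 1).
ProbFn = Callable[[int, Pattern], float]


def kl_index(item: int, alpha_hat: Pattern, patterns: Iterable[Pattern],
             prob_correct: ProbFn) -> float:
    """Sum over candidate patterns of KL(P(X | alpha_hat) || P(X | alpha_c))."""
    p_hat = prob_correct(item, alpha_hat)
    total = 0.0
    for alpha_c in patterns:
        p_c = prob_correct(item, alpha_c)
        total += p_hat * math.log(p_hat / p_c)
        total += (1 - p_hat) * math.log((1 - p_hat) / (1 - p_c))
    return total


def entropy(dist: Dict[Pattern, float]) -> float:
    """Shannon entropy (natural log) of a distribution over patterns."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def expected_entropy(item: int, posterior: Dict[Pattern, float],
                     prob_correct: ProbFn) -> float:
    """Expected entropy of the posterior after observing the response to item."""
    expected = 0.0
    for x in (0, 1):
        joint = {}
        for alpha, weight in posterior.items():
            p = prob_correct(item, alpha)
            joint[alpha] = weight * (p if x == 1 else 1 - p)
        p_x = sum(joint.values())  # predictive probability of response x
        updated = {alpha: v / p_x for alpha, v in joint.items()}
        expected += p_x * entropy(updated)
    return expected


def select_from_shadow_test(shadow_test: Sequence[int], administered: Set[int],
                            criterion: str, alpha_hat: Pattern,
                            posterior: Dict[Pattern, float],
                            prob_correct: ProbFn) -> int:
    """Pick the next item among shadow-test items not yet administered."""
    free = [i for i in shadow_test if i not in administered]
    if criterion == "kl":  # maximize K-L information
        return max(free, key=lambda i: kl_index(i, alpha_hat, posterior, prob_correct))
    if criterion == "she":  # minimize expected Shannon entropy
        return min(free, key=lambda i: expected_entropy(i, posterior, prob_correct))
    raise ValueError(f"unknown criterion: {criterion}")
```

Ties are broken in favor of the earlier item in the shadow test, since Python's `max` and `min` return the first optimum they encounter.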
A computer program will be written to calculate the Shannon Entropy of the items selected by CPLEX for the shadow test and to determine the smallest value among them. Likewise, another program will calculate the K-L information for the set of items selected by CPLEX and single out the greatest value among them. The item selected by this program after the construction of each shadow test is the item administered next. The items in the shadow test that are not selected are returned to the item pool. Once an item is selected, the correctness of the response is simulated by comparing the probability of a correct response with a number drawn randomly from the uniform distribution between zero and one. The probability is calculated from the 3PL model given the examinee's ability level. If the random number is greater than the probability of a correct response, the item is scored as incorrect; otherwise the response is scored as correct. New estimates $\hat{\theta}_j$ and $\hat{\boldsymbol{\alpha}}_j$ for examinee $j$ are then calculated from the responses to all previously administered items, using maximum likelihood estimation for each. This cycle of administering selected items and updating the estimates of $\theta$ and $\boldsymbol{\alpha}$ from the simulated responses is repeated until the desired test length is reached. The computer adaptive testing simulation procedure for condition 3 is illustrated in Figure 7. Unlike the common simulation practice of determining each examinee's response to every item in the bank beforehand, responses here are generated as items are administered; the individual response patterns are saved and available for future research.
[Figure 7: Computerized adaptive testing simulation process for condition 3. Flowchart: construct a shadow test optimal at the current estimate $\hat{\theta}_j$; select the item from the shadow test that is best for the current estimate $\hat{\boldsymbol{\alpha}}_j$; administer the selected item and determine from the examinee's known parameter values whether the response is correct; update $\hat{\theta}_j$ and $\hat{\boldsymbol{\alpha}}_j$; repeat until all items have been administered.]

Upon completion of the simulation study, the final estimates $\hat{\theta}$ and $\hat{\boldsymbol{\alpha}}$ are compared with the known values $\theta_0$ and $\boldsymbol{\alpha}_0$ obtained from the initial calibration. This comparison is then used to evaluate the various methods and conditions. The conditions of this study are depicted graphically in Figure 8. For all three conditions, the fusion model item parameters are estimated beforehand by MCMC using Arpeggio, and the 3PL item parameters are estimated beforehand by MLE using BILOG-MG, as described in the Data and Parameters section above. For all three conditions, McBride and Martin's (1983) 5-4-3-2-1 procedure, discussed in the previous chapter, will be used for item exposure control.

[Figure 8: Visual representation of the three conditions. Condition 1: select items based on traditional IRT $\hat{\theta}_j$ estimates only, then estimate $\boldsymbol{\alpha}_j$ afterwards by MLE (1a: maximize K-L information; 1b: maximize Fisher information). Condition 2: select items based on cognitive $\hat{\boldsymbol{\alpha}}_j$ estimates only, then estimate $\theta_j$ afterwards by MLE (2a: maximize K-L information; 2b: minimize Shannon Entropy). Condition 3: construct a shadow test optimal for the current $\hat{\theta}_j$ and select the best item from it for the current $\hat{\boldsymbol{\alpha}}_j$, both estimated by MLE (3a: maximize K-L information; 3b: minimize Shannon Entropy).]

Comparative Evaluation

Results of the three conditions are to be evaluated in terms of the accuracy of both the attribute mastery level estimates and the traditional IRT ability estimates.
The attribute vectors are evaluated by comparing the final estimates $\hat{\boldsymbol{\alpha}}_j$ with the attribute mastery levels from the original Arpeggio analysis of the real data, $\boldsymbol{\alpha}_{j0}$, for each examinee $j$. Similarly, the final IRT ability estimates are compared with the corresponding true values of the ability parameter, $\theta_0$, obtained from the BILOG-MG maximum likelihood estimation (MLE) of the original dataset. If a given condition works well, the final estimates $\hat{\theta}$ and $\hat{\boldsymbol{\alpha}}$ should match the corresponding known parameters $\theta_0$ and $\boldsymbol{\alpha}_0$, respectively. Note that the true values $\theta_0$ and $\boldsymbol{\alpha}_0$ are unrelated to the results from the first part of this study. The estimation of $\theta$ is evaluated by calculating the correlation between the known values $\theta_0$ and the estimated values $\hat{\theta}$. Bias and root mean squared error (RMSE) are also examined. Conditional bias plots and plots of the true values versus the estimated values are also included. With regard to attribute mastery estimation, the final estimates $\hat{\boldsymbol{\alpha}}_j$ are compared with the attribute vectors from the Arpeggio analysis of the original real data by examining the hit rate for each attribute as well as the hit rate for each examinee's entire attribute pattern. The item selection procedure(s) with the highest correlations and hit rates will be considered superior to the remaining procedures. If the approach(es) with the highest correlation between the known and final estimated values of the traditional IRT ability parameter do not also have the best hit rates for the cognitive attribute vectors, the methods will be evaluated with respect to each criterion individually. Like cells from conditions 1 and 3 are compared with each other, as are like cells from conditions 2 and 3.
In other words, results from a cell whose response strings are generated from the 3PL model are compared only with results from another condition whose response patterns are also based on the 3PL model. Likewise, results from a cell whose response patterns are generated from the fusion model are compared only with other results produced from data generated by the fusion model. These distinctions are illustrated in Figure 9, where green indicates that the correctness of examinee $j$'s response to item $i$ is generated from the 3PL model and blue indicates that it is generated from the fusion model. In this diagram, blue cells are compared only with other blue cells and not with green cells, and vice versa.

[Figure 9: Visual representation of the research design. Condition 1 ($\theta$-based selection; 1a: K-L information, 1b: Fisher information), condition 2 ($\boldsymbol{\alpha}$-based selection; 2a: K-L information, 2b: Shannon Entropy), and condition 3 ($\theta$- and $\boldsymbol{\alpha}$-based selection; 3a: K-L information, 3b: Shannon Entropy), each crossed with the Blueprint and Intuitive Q-matrices and the math and reading tests; green cells denote 3PL-generated responses and blue cells fusion model-generated responses.]

This results in 32 cells, each involving 3,000 simulees. Cells based on the 3PL model for response pattern generation will be compared only with other 3PL-based cells, and likewise for cells based on the fusion model.

It is expected that condition 3, which selects items based on both the traditional IRT ability and the fusion model cognitive attributes, will yield the best estimates of both types of parameters. Condition 1 is expected to yield good estimates of $\theta$ but sub-optimal estimates of $\boldsymbol{\alpha}$, while the reverse is expected for condition 2. The results will be presented in tabular form in subsequent chapters upon completion of the simulation.
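Condition 1's selection rules and the exposure control used in all conditions can be sketched together in Python. `fisher_info` is the standard 3PL item information, `kl_info` integrates the item's K-L information over $[\hat{\theta} - \delta_n, \hat{\theta} + \delta_n]$ with $\delta_n = 3/\sqrt{n}$ using a simple midpoint rule, and `select_item` applies McBride and Martin's 5-4-3-2-1 idea, drawing the k-th item at random from the top max(1, 6 − k) candidates. The scaling constant D = 1.7, the grid size, and all function names are assumptions, and the Kingsbury and Zara content balancing step is omitted.

```python
import math
import random
from typing import Sequence, Set, Tuple

Item = Tuple[float, float, float]  # hypothetical 3PL parameters (a, b, c)
D = 1.7                            # logistic scaling constant (assumed)


def p3(theta: float, item: Item) -> float:
    a, b, c = item
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def fisher_info(theta: float, item: Item) -> float:
    """Standard 3PL item information at theta."""
    a, b, c = item
    p = p3(theta, item)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def kl_info(theta_hat: float, item: Item, n: int, grid: int = 200) -> float:
    """K-L information integrated over theta_hat +/- 3/sqrt(n) (midpoint rule)."""
    delta = 3.0 / math.sqrt(max(n, 1))
    width = 2 * delta / grid
    p0 = p3(theta_hat, item)
    total = 0.0
    for g in range(grid):
        theta = theta_hat - delta + (g + 0.5) * width
        p = p3(theta, item)
        total += (p0 * math.log(p0 / p)
                  + (1 - p0) * math.log((1 - p0) / (1 - p))) * width
    return total


def select_item(theta_hat: float, bank: Sequence[Item], administered: Set[int],
                position: int, rule: str, n: int, rng: random.Random) -> int:
    """Rank unused items by the chosen index, then apply 5-4-3-2-1 exposure control.

    position is 1 for the first item of the test, 2 for the second, and so on;
    the item is drawn at random from the top max(1, 6 - position) candidates.
    """
    if rule == "fisher":
        score = lambda i: fisher_info(theta_hat, bank[i])
    elif rule == "kl":
        score = lambda i: kl_info(theta_hat, bank[i], n)
    else:
        raise ValueError(f"unknown rule: {rule}")
    available = [i for i in range(len(bank)) if i not in administered]
    ranked = sorted(available, key=score, reverse=True)
    window = max(1, 6 - position)
    return rng.choice(ranked[:window])
```

From the fifth item onward the window shrinks to one, so selection becomes deterministic; earlier items are randomized to limit the exposure of the most informative items.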
CHAPTER FOUR: RESULTS

As discussed in Chapter Three, this study comprises two major parts. The first is the analysis of a large-scale standardized test with the fusion model. This analysis is designed to establish confidence in the model's ability to assess examinees in a cognitive diagnosis framework. The fusion model must assess examinees' attribute mastery well before a computer adaptive test administration can be used to enhance the assessment process. The first portion of this study therefore examines the fusion model's success as a cognitively diagnostic assessment. The second part brings computer adaptive testing technology into cognitive diagnosis. The results from each of these two parts are presented below.

PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework

An analysis based on the fusion model was conducted across varying grade levels, content areas, and Q-matrix types to ensure that the fusion model is effective in a variety of testing situations. The Q-matrix types also varied in the complexity of their structure. Simple structure refers to a Q-matrix in which each item measures only one attribute. If an item measures more than one attribute, the Q-matrix is referred to as having complex structure. The average number of attributes measured by each item, by test and Q-matrix type, is presented in Table 2. All four blueprint-based Q-matrices and the intuitive Q-matrix for the third grade reading test have simple structure. The remaining Q-matrices all have complex structure.

Table 2: Average number of attributes per item, by test and Q-matrix type

Test      Q-matrix    Grade 3   Grade 11
Math      Blueprint     1         1
          Enhanced      2.34      1.32
          Intuitive     1.91      1.35
Reading   Blueprint     1         1
          Enhanced      1.39      1.79
          Intuitive     1         1.04

Another point of interest in a cognitive diagnosis-based analysis is the number of items measuring each attribute.
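Both summaries can be computed directly from a 0/1 Q-matrix, as in the Python sketch below (function names are illustrative). The two averages are linked: items × attributes per item = attributes × items per attribute, since both products equal the number of 1s in the Q-matrix. For example, the eleventh grade reading intuitive Q-matrix has 28 items, 1.04 attributes per item, and 5 attributes, giving 28 × 1.04 / 5 ≈ 5.8 items per attribute, as reported in Table 3.

```python
from typing import Sequence


def attributes_per_item(q: Sequence[Sequence[int]]) -> float:
    """Average number of attributes coded for each item (row sums of Q)."""
    return sum(sum(row) for row in q) / len(q)


def items_per_attribute(q: Sequence[Sequence[int]]) -> float:
    """Average number of items measuring each attribute (column sums of Q)."""
    n_attributes = len(q[0])
    counts = [sum(row[k] for row in q) for k in range(n_attributes)]
    return sum(counts) / n_attributes


def has_simple_structure(q: Sequence[Sequence[int]]) -> bool:
    """True if every item measures exactly one attribute."""
    return all(sum(row) == 1 for row in q)
```

A blueprint-style Q-matrix with 44 items spread evenly over 11 attributes, for instance, has simple structure and 44 / 11 = 4 items per attribute.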
It is more psychometrically sound to have more items measuring a given attribute than fewer. The average number of items measuring each attribute is presented in Table 3.

Table 3: Average number of items per attribute, by test and Q-matrix type

Test      Q-matrix    Grade 3   Grade 11
Math      Blueprint     4         6
          Enhanced      8.5       4.94
          Intuitive     7.55      8.2
Reading   Blueprint     6         9.33
          Enhanced      8.5       6.25
          Intuitive     5.14      5.8

A benefit of the fusion model is that, in addition to evaluating examinees with respect to specific attributes, it includes parameters for evaluating how well the items measure the attributes. To determine how well an item measures an attribute, the parameters $\pi^*$ and $r^*$ can be examined. The means and standard deviations of the $\pi^*$ and $r^*$ estimates are presented in Tables 4 and 5, respectively.

Table 4: Means (and standard deviations) of $\pi^*$ estimates across tests and Q-matrix types

Test      Q-matrix    Grade 3          Grade 11
Math      Blueprint   0.857 (0.028)    0.756 (0.037)
          Enhanced    0.861 (0.027)    0.763 (0.037)
          Intuitive   0.863 (0.021)    0.761 (0.041)
Reading   Blueprint   0.908 (0.021)    0.833 (0.020)
          Enhanced    0.908 (0.015)    0.844 (0.031)
          Intuitive   0.910 (0.025)    0.838 (0.022)

Table 5: Means (and standard deviations) of $r^*$ estimates across tests and Q-matrix types

Test      Q-matrix    Grade 3          Grade 11
Math      Blueprint   0.5327 (0.047)   0.488 (0.033)
          Enhanced    0.647 (0.063)    0.551 (0.044)
          Intuitive   0.645 (0.055)    0.576 (0.045)
Reading   Blueprint   0.592 (0.051)    0.535 (0.046)
          Enhanced    0.656 (0.051)    0.616 (0.065)
          Intuitive   0.595 (0.055)    0.540 (0.049)

Low values of $r^*$ and high values of $\pi^*$ indicate a good item. These parameters can be used to evaluate each item's performance in the assessment, and poorly performing items can be removed from the analysis. Items with low $\pi^*$ values may be too difficult and can be removed or reconsidered for inclusion in the test, and items with high $r^*$ values for a certain attribute may not really measure that attribute.
Values of $r^*$ should remain below 0.9, and the confidence intervals of $p_k$ should be narrower than 0.525 (Hartz et al., 2002). The estimated values of these parameters were all within the acceptable ranges given in the Arpeggio software manual (Hartz et al., 2002). The main output of a fusion model analysis advocated in this work is the cognitively diagnostic information provided by each examinee's estimated attribute mastery pattern. Summary statistics for the proportion of examinees obtaining mastery status, averaged across the measured attributes, are presented in Table 6.

Table 6: Means (and standard deviations) of proportions of examinees obtaining mastery status

Test      Q-matrix    Grade 3          Grade 11
Math      Blueprint   0.726 (0.057)    0.397 (0.032)
          Enhanced    0.826 (0.066)    0.405 (0.033)
          Intuitive   0.772 (0.077)    0.442 (0.093)
Reading   Blueprint   0.800 (0.014)    0.729 (0.024)
          Enhanced    0.817 (0.017)    0.803 (0.071)
          Intuitive   0.791 (0.019)    0.737 (0.035)

These values can be used to gauge the relative difficulty of the tests. Specifically, the lower average proportion of examinees obtaining mastery status (roughly 0.40 to 0.44) shows that the math portion of the eleventh grade test is more difficult than the other tests. The individual values for each attribute could also be used by test administrators to determine which attributes are more difficult than others or whether the difficulty level of the attributes is acceptable. The Arpeggio package includes additional software for flagging examinees with a low probability of obtaining mastery status who were nevertheless assigned mastery status, and examinees with a high probability of obtaining mastery status who were not. Ideally, the proportion of flagged examinees would be small (where "small" can be defined at the test administrator's discretion). Table 7 shows the proportion of flagged examinees for each analysis.

Table 7: Proportions of flagged examinees
Q-matrix    Grade 3 Math   Grade 3 Reading   Grade 11 Math   Grade 11 Reading
Blueprint   0.0510         0.0630            0.0875          0.0550
Enhanced    0.0735         0.0455            0.0530          0.0340
Intuitive   0.0585         0.0540            0.0775          0.0400

In addition, the proportion of examinees with attribute mastery estimates close to the cutoff value of 0.5 (for instance, between 0.4 and 0.6) can be examined to determine which specific attributes are tenuously measured. If an examinee's estimated probability of mastery for a specific attribute is close to 0.5, it is difficult to be sure whether this person should be assigned mastery status for that attribute. A minimal number of such occurrences is therefore preferred. The means and standard deviations of these proportions are presented in Table 8.

Table 8: Means (and standard deviations) of proportions of estimates between 0.4 and 0.6.

Test      Q-matrix    Grade 3          Grade 11
Math      Blueprint   0.056 (0.016)    0.052 (0.010)
Math      Enhanced    0.068 (0.021)    0.073 (0.018)
Math      Intuitive   0.071 (0.018)    0.088 (0.052)
Reading   Blueprint   0.038 (0.006)    0.038 (0.002)
Reading   Enhanced    0.043 (0.014)    0.060 (0.016)
Reading   Intuitive   0.039 (0.009)    0.049 (0.005)

For all twelve conditions, the proportion of flagged examinees remained below ten percent. The mean proportion of examinees with attribute mastery estimates close to the cutoff value of 0.5 was also below 0.10 for each of the twelve conditions. The analyses covered a variety of grade levels, content areas, and Q-matrix construction techniques. All of them successfully provided the desired diagnostic information for most examinees, as well as additional evaluative information about the items. In sum, applying the fusion model to an existing state-mandated exam successfully provided the desired cognitively diagnostic feedback about individual examinees in a variety of testing situations.
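The summaries in Tables 6 and 8 can be sketched as follows. The sketch assumes the input is a matrix of estimated posterior probabilities of mastery (examinees by attributes) and that mastery is assigned when the probability is at least 0.5. It also assumes the 0.4 to 0.6 band is inclusive at both ends and that standard deviations are sample standard deviations across attributes. These tie-handling and band conventions are assumptions, not details taken from Arpeggio. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def mastery_summaries(
    posteriors: Sequence[Sequence[float]],
    cutoff: float = 0.5,
    band: Tuple[float, float] = (0.4, 0.6),
) -> Tuple[List[float], List[float]]:
    """Per-attribute proportions of (a) examinees classified as masters and
    (b) examinees whose mastery probability falls inside the uncertain band.

    posteriors[j][k] is examinee j's estimated probability of mastering attribute k.
    """
    n_examinees = len(posteriors)
    n_attributes = len(posteriors[0])
    lo, hi = band
    masters, uncertain = [], []
    for k in range(n_attributes):
        column = [row[k] for row in posteriors]
        masters.append(sum(p >= cutoff for p in column) / n_examinees)
        uncertain.append(sum(lo <= p <= hi for p in column) / n_examinees)
    return masters, uncertain


def mean_and_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across attributes."""
    return statistics.mean(values), statistics.stdev(values)
```

For example, four examinees with posteriors [[0.9, 0.45], [0.2, 0.55], [0.7, 0.95], [0.5, 0.1]] give mastery proportions of 0.75 and 0.5 and uncertain-band proportions of 0.25 and 0.5 for the two attributes.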
It therefore seems appropriate to extend such an analysis to a computerized adaptive testing (CAT) situation to improve the assessment process.

PART TWO: Adaptively Enhancing the Assessment Process

The evaluation of the information obtained from the second part of this dissertation involves three areas of examination. First, the various methods should accurately estimate the single score, θ. Second, the methods should also accurately estimate the attribute mastery patterns; an acceptable method would estimate both the θ values and the attribute mastery patterns well. Third, the item exposure rates of the various methods were examined, because item exposure control is an important test security issue in computerized adaptive testing. The following sections deal with each of these issues.

Single Score Estimation

First, the θ estimates from the different conditions are of particular interest. They are compared with the true θ values to determine how well each method succeeds in accurately estimating the overall single score. The comparison between each true θ and its corresponding estimate is accomplished by examining the correlation coefficient, the root mean square error, and the bias statistics.

Approaches where the probability of obtaining a correct response is based on the 3PL model are grouped together. Likewise, approaches where the probabilities of obtaining a correct response are based on the fusion model are grouped together. It would be invalid to compare results across these two models. The way the response patterns are generated is a fundamental aspect of the CAT simulation, so differences due to model choice would be confounded with differences due to the various methods and conditions. Non-convergent cases are removed from these analyses. The numbers of non-convergent cases for the conditions where response probabilities are based on the 3PL model are presented in Table 9.
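The three θ recovery statistics can be sketched as follows, under stated assumptions. Bias is taken as the mean of (estimate minus true value) and the root mean square error as the square root of the mean squared difference. The correlation is the Pearson correlation. Non-convergent cases are recognized by the sentinel values -4 and 4 that the estimation process assigns to them. The sign convention for bias and the function name are assumptions, not details confirmed by the study.

```python
import math
from typing import Sequence, Tuple

# Sentinel values assigned by the estimation routine to non-convergent cases.
NONCONVERGENT = (-4.0, 4.0)


def theta_recovery(true_theta: Sequence[float], est_theta: Sequence[float],
                   sentinels: Tuple[float, ...] = NONCONVERGENT):
    """Return (correlation, bias, rmse, n_removed) after dropping non-convergent cases."""
    pairs = [(t, e) for t, e in zip(true_theta, est_theta) if e not in sentinels]
    n = len(pairs)
    n_removed = len(true_theta) - n
    # Bias: mean signed error (estimate minus true value).
    bias = sum(e - t for t, e in pairs) / n
    # Root mean square error.
    rmse = math.sqrt(sum((e - t) ** 2 for t, e in pairs) / n)
    # Pearson correlation between true and estimated values.
    mean_t = sum(t for t, _ in pairs) / n
    mean_e = sum(e for _, e in pairs) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in pairs)
    ss_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    ss_e = sum((e - mean_e) ** 2 for _, e in pairs)
    corr = cov / math.sqrt(ss_t * ss_e)
    return corr, bias, rmse, n_removed
```

Dropping the sentinel cases before computing the statistics matters: a single estimate pinned at 4 would otherwise inflate both the bias and the root mean square error.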
Table 10 shows the number of non-convergent cases for conditions where response probabilities are based on the fusion model. Non-convergence is identified by the θ estimate being assigned a value of -4 or 4. Convergent estimates take on values with several decimal places, whereas the default setting of the estimation process assigns non-convergent cases a value of exactly -4 or 4, so the two are easily distinguished.

Table 9: Number of non-convergent cases within each condition when response probabilities are based on the 3PL model.
θ-Based Item Selection (Fisher, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
20 25 13 25
θ- & α-Based Item Selection (Shannon, K-L):
11 15 13 17 41 44 39 37 37 48 40 32

Table 10: Number of non-convergent cases within each condition when response probabilities are based on the fusion model.
θ-Based Item Selection (Fisher, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
22 13 16 19
θ- & α-Based Item Selection (Shannon, K-L):
0 0 0 1 12 25 25 74 0 8 0 8

Correlation coefficients between the θ estimates and their corresponding true values, for all of the item selection methods within each condition, are presented in Table 11 for response probabilities based on the 3PL model and in Table 12 for response probabilities based on the fusion model. Bias values are presented in Table 13 for probabilities based on the 3PL model and in Table 14 for those based on the fusion model. Root mean square error values are presented in Table 15 for probabilities based on the 3PL model and in Table 16 for probabilities based on the fusion model.

Table 11: Correlations between the true and estimated θ values for response probabilities based on the 3PL model.
θ-Based Item Selection (Fisher, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
0.965 0.967 0.966 0.967 0.969 0.975 0.971 0.975
θ- & α-Based Item Selection (Shannon, K-L):
0.950 0.952 0.954 0.950 0.960 0.951 0.956 0.956

Table 12: Correlations between the true and estimated θ values for response probabilities based on the fusion model.
α-Based Item Selection (Shannon, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
0.768 0.795 0.782 0.749 0.786 0.817 0.790 0.812
θ- & α-Based Item Selection (Shannon, K-L):
0.718 0.203 0.763 0.222 0.752 0.227 0.762 0.230

Table 13: Bias statistics describing the estimated θ values for response probabilities based on the 3PL model.
θ-Based Item Selection (Fisher, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
-0.031 -0.019 -0.036 -0.008 -0.022 -0.017 -0.023 -0.011
θ- & α-Based Item Selection (Shannon, K-L):
-0.062 -0.049 -0.057 -0.060 -0.047 -0.055 -0.050 -0.054

Table 14: Bias statistics describing the estimated θ values for response probabilities based on the fusion model.
α-Based Item Selection (Shannon, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
0.103 0.077 0.164 0.069 0.093 0.078 0.073 0.097
θ- & α-Based Item Selection (Shannon, K-L):
0.160 0.674 0.144 0.113 0.061 -0.190 0.081 -0.426

Table 15: Root mean square error of the estimated θ values for response probabilities based on the 3PL model.
θ-Based Item Selection (Fisher, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
0.296 0.295 0.298 0.293 0.282 0.262 0.265 0.258
θ- & α-Based Item Selection (Shannon, K-L):
0.335 0.324 0.324 0.334 0.302 0.329 0.318 0.317

Table 16: Root mean square error of the estimated θ values for response probabilities based on the fusion model.
α-Based Item Selection (Shannon, K-L); Math: Blueprint Q-matrix, Intuitive Q-matrix; Reading: Blueprint Q-matrix, Intuitive Q-matrix:
0.692 0.689 0.685 0.748 0.674 0.658 0.666 0.667
θ- & α-Based Item Selection (Shannon, K-L):
0.748 1.351 0.681 1.174 0.696 1.364 0.687 1.345

In Table 11, all the correlations are above 0.95. The values are similar across the condition where item selection is based on both θ and α and the condition where item selection is based solely on θ. Correlations are typically lower for the reading test than for the math test. Similarly, the bias values in Table 13 are all small and similar across the two conditions, but are smaller for the math test than for the reading test. The root mean square errors also tend to be greater for the reading test. In general, then, the methods perform more poorly on the reading test than on the math test. This indicates that the reading test is not as good at measuring a single overall reading score as the math test is at measuring a single overall math score. In Table 15, the root mean square error is lower for the math test in the condition where item selection is based solely on θ, but is lower for the condition where item selection is based on both θ and α in the reading test. Overall, the condition where item selection is based solely on θ and the condition where it is based on both θ and α seem to perform comparably at measuring the single score θ accurately when the 3PL model is used to calculate response probabilities.

Now the results of the methods using the fusion model to calculate response probabilities are examined. As expected, conditions where the probabilities of obtaining a correct response are based on the fusion model do not estimate the single score θ very well. This is intuitive because θ is not a parameter appearing in the probability function for the item responses.
Hence, the correlation coefficients are lower, and the root mean square error and bias values are greater, than desired. What is more interesting for this study is a comparison between the different methods within this fusion-model-based approach. Tables 12, 14, and 16 show higher correlations, lower bias, and lower root mean square error values, respectively, for the condition basing item selection on both θ and α than for the condition basing item selection solely on α. The former therefore performs better for estimating single scores. An item selection method that takes both the single score values and the attribute mastery patterns into account thus yields better single score estimates than one that takes only the attribute mastery patterns into account. To evaluate which method or methods are best overall, however, the accuracy of the attribute mastery classifications must also be considered.

Plots of the θ estimates against the true θ values are presented in Appendix B for methods using the 3PL model to calculate response probabilities. Conditional bias plots for these methods are presented in Appendix C.

One side note is worth pointing out here. The true versus estimated θ plots for conditions where the response probabilities are based on the fusion model tend to form an S-shape, especially when compared with the corresponding plots in Appendix B. This shape is unexpected and suggests one of two explanations. Either the methods that take attribute mastery status into account in item selection do not measure extreme values of the single score well (perhaps because items that are good at measuring attribute mastery do not have extreme 3PL difficulty parameters), or omitting θ when calculating response probabilities has a negative effect on θ estimation. Out of curiosity, an additional condition was analyzed to try to home in on possible causes of the S-shape in the true versus estimated plots.
This condition involves randomly selecting the next item to be administered. The S-shape was present when the item responses were determined by the fusion model and absent when they were determined by the 3PL model. When the item responses were based on the 3PL model, the plot looked more like those in Appendix B, clustering around a straight line. This rules out the notion that items selected on the basis of attribute mastery status are poor at measuring extreme values of θ, because items selected at random produced a similar pattern. Rather, it supports the explanation that estimating θ is a tenuous undertaking when θ is not actually taken into account in generating the item responses. In other words, if the accuracy of measuring a construct is to be evaluated in a simulation, that construct should appear in the probability function that determines correct and incorrect responses.

These S-shaped plots are omitted from this dissertation. At this time, the relationship between θ and the fusion model is not sufficiently understood to discuss the causes or significance of the true versus estimated relationship when the underlying probability function is the fusion model, which does not include the θ parameter at all. The purpose of this paragraph is to emphasize the importance of the relationship between θ and the fusion model's parameters. That relationship is discussed further in the future research directions portion of the following chapter.

Attribute Mastery Estimation

Optimally, an assessment approach will accurately estimate mastery of each attribute as well as each examinee's entire attribute pattern. To evaluate attribute mastery estimation, the correct classification rates for each measured attribute and for the entire attribute pattern are presented in Tables 17 through 20.
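The hit rates reported in Tables 17 through 20 can be computed as in the following minimal sketch. It assumes the true and estimated mastery patterns are available as 0/1 vectors for each simulated examinee. A whole-pattern hit requires every attribute to be classified correctly, which is why whole-pattern rates are much lower than per-attribute rates. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def hit_rates(true_patterns: Sequence[Sequence[int]],
              est_patterns: Sequence[Sequence[int]]) -> Tuple[List[float], float, float]:
    """Correct classification (hit) rates.

    Returns (per-attribute hit rates, mean of those rates, whole-pattern hit rate).
    """
    n = len(true_patterns)
    n_attributes = len(true_patterns[0])
    per_attribute = [
        sum(t[k] == e[k] for t, e in zip(true_patterns, est_patterns)) / n
        for k in range(n_attributes)
    ]
    mean_rate = sum(per_attribute) / n_attributes
    # A whole-pattern hit requires every attribute to be classified correctly.
    whole = sum(tuple(t) == tuple(e) for t, e in zip(true_patterns, est_patterns)) / n
    return per_attribute, mean_rate, whole
```

For example, with true patterns (1,0,1), (0,0,1), (1,1,1), (0,1,0) and estimates (1,0,1), (0,1,1), (1,1,0), (0,1,0), the per-attribute rates are 1.0, 0.75, and 0.75. Only half of the whole patterns are recovered exactly.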
Table 17 presents these correct classification rates, or hit rates, for each method using the 3PL model to determine the response probabilities for the math test. Table 18 holds the same information for the reading test. Tables 19 and 20 present the attribute correct classification rates for the fusion-model-based probabilities for the math test and the reading test, respectively. Recall that a list of the attributes measured by each Q-matrix for each subject portion is presented in Appendix A.

Table 17: The math test's attribute mastery hit rates for response probabilities based on the 3PL model.

Blueprint Q-matrix:
                 θ-Based Item Selection    θ- & α-Based Item Selection
Attribute        Fisher    K-L             Shannon    K-L
1                0.797     0.792           0.789      0.795
2                0.712     0.696           0.715      0.703
3                0.686     0.681           0.678      0.710
4                0.710     0.725           0.718      0.703
5                0.783     0.796           0.780      0.794
6                0.833     0.827           0.833      0.816
7                0.808     0.810           0.805      0.816
8                0.814     0.817           0.808      0.818
9                0.557     0.564           0.585      0.556
10               0.807     0.796           0.825      0.823
11               0.746     0.748           0.763      0.770
Mean 1-11        0.750     0.750           0.754      0.755
Whole Pattern    0.169     0.162           0.169      0.176

Intuitive Q-matrix:
                 θ-Based Item Selection    θ- & α-Based Item Selection
Attribute        Fisher    K-L             Shannon    K-L
1                0.835     0.838           0.863      0.838
2                0.813     0.823           0.804      0.746
3                0.826     0.824           0.834      0.833
4                0.765     0.765           0.706      0.720
5                0.795     0.806           0.772      0.779
6                0.801     0.764           0.797      0.804
7                0.741     0.749           0.711      0.720
8                0.804     0.809           0.778      0.757
9                0.800     0.813           0.848      0.828
10               0.622     0.598           0.672      0.659
11               0.847     0.860           0.790      0.776
12               0.825     0.822           0.767      0.758
13               0.685     0.664           0.646      0.622
Mean 1-13        0.782     0.780           0.768      0.757
Whole Pattern    0.220     0.211           0.206      0.193

Table 18: The reading test's attribute mastery hit rates for response probabilities based on the 3PL model.
Blueprint Q-matrix:
                 θ-Based Item Selection    θ- & α-Based Item Selection
Attribute        Fisher    K-L             Shannon    K-L
1                0.830     0.830           0.787      0.781
2                0.847     0.850           0.868      0.863
3                0.795     0.779           0.797      0.802
4                0.863     0.861           0.856      0.864
5                0.815     0.824           0.829      0.823
6                0.827     0.834           0.843      0.861
Mean 1-6         0.829     0.830           0.830      0.832
Whole Pattern    0.586     0.590           0.583      0.580

Intuitive Q-matrix:
                 θ-Based Item Selection    θ- & α-Based Item Selection
Attribute        Fisher    K-L             Shannon    K-L
1                0.758     0.758           0.756      0.754
2                0.814     0.812           0.799      0.808
3                0.751     0.760           0.753      0.723
4                0.753     0.744           0.767      0.754
5                0.787     0.787           0.784      0.792
6                0.766     0.767           0.762      0.769
7                0.792     0.788           0.804      0.803
Mean 1-7         0.774     0.774           0.775      0.772
Whole Pattern    0.468     0.465           0.465      0.443

Table 19: The math test's attribute mastery hit rates for response probabilities based on the fusion model.

Blueprint Q-matrix:
                 α-Based Item Selection    θ- & α-Based Item Selection
Attribute        Shannon   K-L             Shannon    K-L
1                0.233     0.205           0.797      0.801
2                0.849     0.871           0.800      0.801
3                0.847     0.222           0.688      0.703
4                0.887     0.847           0.781      0.797
5                0.274     0.853           0.882      0.878
6                0.356     0.877           0.890      0.876
7                0.894     0.904           0.879      0.898
8                0.816     0.907           0.883      0.885
9                0.882     0.368           0.643      0.554
10               0.939     0.997           0.937      0.939
11               0.907     0.955           0.813      0.835
Mean 1-11        0.717     0.728           0.817      0.815
Whole Pattern    0.040     0.007           0.160      0.170

Intuitive Q-matrix:
                 α-Based Item Selection    θ- & α-Based Item Selection
Attribute        Shannon   K-L             Shannon    K-L
1                0.905     0.242           0.882      0.855
2                0.802     0.159           0.786      0.736
3                0.925     0.876           0.896      0.876
4                0.645     0.222           0.772      0.760
5                0.873     0.916           0.871      0.872
6                0.941     0.896           0.933      0.926
7                0.837     0.261           0.726      0.746
8                0.440     0.339           0.819      0.797
9                0.910     0.978           0.904      0.913
10               0.868     0.307           0.756      0.710
11               0.734     0.260           0.846      0.832
12               0.900     0.279           0.851      0.856
13               0.858     0.265           0.705      0.660
Mean 1-13        0.818     0.461           0.827      0.811
Whole Pattern    0.090     0.029           0.157      0.141

Table 20: The reading test's attribute mastery hit rates for response probabilities based on the fusion model.
Blueprint Q-matrix:
                 α-Based Item Selection    θ- & α-Based Item Selection
Attribute        Shannon   K-L             Shannon    K-L
1                0.899     0.833           0.799      0.803
2                0.898     0.923           0.884      0.899
3                0.890     0.822           0.855      0.844
4                0.858     0.868           0.889      0.906
5                0.911     0.807           0.869      0.867
6                0.882     0.928           0.929      0.922
Mean 1-6         0.890     0.863           0.871      0.874
Whole Pattern    0.677     0.686           0.637      0.640

Intuitive Q-matrix:
                 α-Based Item Selection    θ- & α-Based Item Selection
Attribute        Shannon   K-L             Shannon    K-L
1                0.929     0.967           0.924      0.941
2                0.882     0.881           0.883      0.887
3                0.888     0.898           0.896      0.889
4                0.908     0.866           0.880      0.859
5                0.900     0.867           0.867      0.868
6                0.919     0.891           0.901      0.896
7                0.893     0.862           0.893      0.885
Mean 1-7         0.903     0.890           0.892      0.889
Whole Pattern    0.711     0.722           0.699      0.708

This information may be compared more easily across the various approaches through graphical representation. The correct classification rates of the attribute mastery estimates are illustrated in Figures 10 through 17.

[Plot: correct classification rate (0 to 1) by attribute (1 to 11, and All) for Condition 1 Fisher, Condition 1 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 10: Correct Attribute Mastery Classification for the Math Blueprint Q-matrix using Response Probabilities based on the 3PL Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 13, and All) for Condition 1 Fisher, Condition 1 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 11: Correct Attribute Mastery Classification for the Math Intuitive Q-matrix using Response Probabilities based on the 3PL Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 6, and All) for Condition 1 Fisher, Condition 1 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 12: Correct Attribute Mastery Classification for the Reading Blueprint Q-matrix using Response Probabilities based on the 3PL Model.
[Plot: correct classification rate (0 to 1) by attribute (1 to 7, and All) for Condition 1 Fisher, Condition 1 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 13: Correct Attribute Mastery Classification for the Reading Intuitive Q-matrix using Response Probabilities based on the 3PL Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 11, and All) for Condition 2 Shannon, Condition 2 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 14: Correct Attribute Mastery Classification for the Math Blueprint Q-matrix using Response Probabilities based on the Fusion Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 13, and All) for Condition 2 Shannon, Condition 2 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 15: Correct Attribute Mastery Classification for the Math Intuitive Q-matrix using Response Probabilities based on the Fusion Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 6, and All) for Condition 2 Shannon, Condition 2 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 16: Correct Attribute Mastery Classification for the Reading Blueprint Q-matrix using Response Probabilities based on the Fusion Model.

[Plot: correct classification rate (0 to 1) by attribute (1 to 7, and All) for Condition 2 Shannon, Condition 2 K-L, Condition 3 Shannon, and Condition 3 K-L.]
Figure 17: Correct Attribute Mastery Classification for the Reading Intuitive Q-matrix using Response Probabilities based on the Fusion Model.

With regard to the approaches based on the 3PL model for determining item response probabilities, conditions 1 and 3 perform similarly for both the math and reading portions of the test. This holds at the individual attribute level as well as for the entire attribute pattern.
An examination of the above figures indicates that, within the item selection condition based solely on θ, Fisher Information and K-L Information perform equally well when selecting items optimally for the current θ estimate.

Results are more irregular for the methods based on the fusion model for item response probabilities. For the math test, the item selection conditions based solely on α and based on both θ and α perform similarly. An overall examination, however, shows that the condition based on both θ and α more consistently classifies examinees correctly as masters or non-masters of the measured attributes, while the item selection methods based on α alone show more fluctuation. Within the condition based solely on α, Shannon Entropy seems to produce more accurate attribute mastery classifications than K-L Information. For the reading portion, the second condition yields slightly higher correct classification rates for many of the attributes and for the overall mastery patterns. Overall, though, the differences between the two conditions' correct classification rates are quite small.

It is surprising that the item selection method based solely on α did not perform much better than the item selection condition based on both θ and α. Condition 2 selects items based only on the current estimate of examinee j's attribute mastery pattern, whereas condition 3 takes both that estimate and the current θ estimate into account. One would therefore expect condition 2 to perform considerably better than condition 3 with regard to correct attribute mastery estimation, but the results were comparable. Thus, a test administrator would not have to sacrifice much attribute mastery classification precision in order to obtain higher precision in θ estimates. Within the third condition, where items are selected based on both θ and α, K-L Information and Shannon Entropy seem to perform equally well in correctly estimating attribute mastery.
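To make θ-based selection concrete, the following sketch picks the next item by maximum 3PL Fisher information combined with a 5-4-3-2-1 randomization step. It assumes the 5-4-3-2-1 method works as commonly described. The first item is drawn at random from the five most informative available items, the second from the four most informative, and so on, and the fifth and later items are simply the most informative. The scaling constant D = 1.702 and all function names are assumptions. The K-L Information and Shannon Entropy criteria are not shown.

```python
import math
import random
from typing import Sequence, Set, Tuple

D = 1.702  # logistic scaling constant (assumed; 1.7 is also common)

Item = Tuple[float, float, float]  # (a, b, c) 3PL parameters


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fisher_info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of a 3PL item at theta."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_next_item(theta_hat: float, items: Sequence[Item], used: Set[int],
                     n_given: int, rng: random.Random) -> int:
    """Maximum-information selection with 5-4-3-2-1 randomization.

    n_given is the number of items already administered: the 1st item is drawn
    from the top 5, the 2nd from the top 4, ..., the 5th and later from the top 1.
    """
    available = [i for i in range(len(items)) if i not in used]
    ranked = sorted(available,
                    key=lambda i: fisher_info_3pl(theta_hat, *items[i]),
                    reverse=True)
    group_size = max(5 - n_given, 1)
    return rng.choice(ranked[:group_size])
```

Randomizing only the first few selections spreads exposure across the top items early in the test, when the θ estimate is least reliable. Later selections still take the single most informative item.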
Item Exposure

In computerized adaptive testing, it is desirable to keep item exposure to a minimum to ensure test security. However, some items are better at measuring the underlying construct than others, so minimizing item exposure and maximizing measurement precision involve a trade-off. Because test security is important, the item exposure rates for the various methods implemented in this study are examined. The item exposure rates for the CAT simulation based on the 3PL model are presented in Table 21 for the math portion of the test and Table 22 for the reading portion. Likewise, the item exposure rates for the CAT simulation based on the fusion model are presented in Table 23 for the math portion and Table 24 for the reading portion.

Table 21: Frequencies (and proportions) of item exposure for the math test for response probabilities based on the 3PL model.

Blueprint Q-matrix:
                 θ-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Fisher        K-L             Shannon       K-L
Not Exposed      114 (0.29)    153 (0.39)      137 (0.35)    147 (0.37)
0.0 to 0.099     169 (0.43)    136 (0.34)      134 (0.34)    139 (0.35)
0.1 to 0.199     39 (0.10)     36 (0.09)       42 (0.11)     32 (0.08)
0.2 to 0.299     29 (0.07)     28 (0.07)       46 (0.12)     23 (0.06)
0.3 to 0.399     13 (0.03)     8 (0.02)        15 (0.04)     20 (0.05)
0.4 to 0.499     10 (0.03)     9 (0.02)        4 (0.01)      17 (0.04)
0.5 to 0.599     10 (0.03)     12 (0.03)       5 (0.01)      6 (0.02)
0.6 to 0.699     10 (0.03)     8 (0.02)        9 (0.02)      10 (0.03)
0.7 to 0.799     2 (0.01)      6 (0.02)        4 (0.01)      2 (0.01)
0.8 to 0.899     0 (0.00)      0 (0.00)        0 (0.00)      0 (0.00)
0.9 to 1.000     0 (0.00)      0 (0.00)        0 (0.00)      0 (0.00)

Intuitive Q-matrix:
                 θ-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Fisher        K-L             Shannon       K-L
Not Exposed      152 (0.38)    170 (0.43)      126 (0.32)    122 (0.31)
0.0 to 0.099     132 (0.33)    114 (0.29)      149 (0.38)    154 (0.39)
0.1 to 0.199     31 (0.08)     33 (0.08)       39 (0.10)     40 (0.10)
0.2 to 0.299     32 (0.08)     29 (0.07)       22 (0.06)     26 (0.07)
0.3 to 0.399     21 (0.05)     19 (0.05)       36 (0.09)     24 (0.06)
0.4 to 0.499     11 (0.03)     12 (0.03)       16 (0.04)     18 (0.05)
0.5 to 0.599     5 (0.01)      7 (0.02)        6 (0.02)      7 (0.02)
0.6 to 0.699     10 (0.03)     7 (0.02)        2 (0.01)      3 (0.01)
0.7 to 0.799     2 (0.01)      5 (0.01)        0 (0.00)      2 (0.01)
0.8 to 0.899     0 (0.00)      0 (0.00)        0 (0.00)      0 (0.00)
0.9 to 1.000     0 (0.00)      0 (0.00)        0 (0.00)      0 (0.00)

Table 22: Frequencies (and proportions) of item exposure for the reading test for response probabilities based on the 3PL model.

Blueprint Q-matrix:
                 θ-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Fisher        K-L             Shannon       K-L
Not Exposed      100 (0.25)    116 (0.29)      121 (0.31)    123 (0.31)
0.0 to 0.099     115 (0.29)    100 (0.25)      115 (0.29)    103 (0.26)
0.1 to 0.199     32 (0.08)     37 (0.09)       22 (0.06)     34 (0.09)
0.2 to 0.299     28 (0.07)     21 (0.05)       12 (0.03)     12 (0.03)
0.3 to 0.399     13 (0.03)     17 (0.04)       13 (0.03)     17 (0.04)
0.4 to 0.499     14 (0.04)     9 (0.02)        18 (0.05)     5 (0.01)
0.5 to 0.599     11 (0.03)     8 (0.02)        6 (0.02)      9 (0.02)
0.6 to 0.699     8 (0.02)      9 (0.02)        6 (0.02)      13 (0.03)
0.7 to 0.799     0 (0.00)      4 (0.01)        5 (0.01)      5 (0.01)
0.8 to 0.899     3 (0.01)      2 (0.01)        6 (0.02)      3 (0.01)
0.9 to 1.000     0 (0.00)      1 (0.00)        0 (0.00)      0 (0.00)

Intuitive Q-matrix:
                 θ-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Fisher        K-L             Shannon       K-L
Not Exposed      100 (0.25)    111 (0.28)      103 (0.26)    109 (0.28)
0.0 to 0.099     115 (0.29)    106 (0.27)      115 (0.29)    114 (0.29)
0.1 to 0.199     32 (0.08)     36 (0.09)       32 (0.08)     30 (0.08)
0.2 to 0.299     26 (0.07)     22 (0.06)       16 (0.04)     13 (0.03)
0.3 to 0.399     8 (0.02)      13 (0.03)       14 (0.04)     19 (0.05)
0.4 to 0.499     20 (0.05)     11 (0.03)       22 (0.06)     13 (0.03)
0.5 to 0.599     10 (0.03)     10 (0.03)       15 (0.04)     13 (0.03)
0.6 to 0.699     8 (0.02)      6 (0.02)        1 (0.00)      10 (0.03)
0.7 to 0.799     3 (0.01)      7 (0.02)        6 (0.02)      3 (0.01)
0.8 to 0.899     0 (0.00)      2 (0.01)        0 (0.00)      0 (0.00)
0.9 to 1.000     0 (0.00)      0 (0.00)        0 (0.00)      0 (0.00)

Table 23: Frequencies (and proportions) of item exposure for the math test for response probabilities based on the fusion model.

Blueprint Q-matrix:
                 α-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Shannon       K-L             Shannon       K-L
Not Exposed      156 (0.39)    311 (0.79)      160 (0.40)    181 (0.46)
0.0 to 0.099     134 (0.34)    22 (0.06)       116 (0.29)    108 (0.27)
0.1 to 0.199     42 (0.11)     15 (0.04)       51 (0.13)     34 (0.09)
0.2 to 0.299     11 (0.03)     8 (0.02)        29 (0.07)     19 (0.05)
0.3 to 0.399     13 (0.03)     0 (0.00)        8 (0.02)      17 (0.04)
0.4 to 0.499     11 (0.03)     0 (0.00)        12 (0.03)     7 (0.02)
0.5 to 0.599     13 (0.03)     2 (0.01)        3 (0.01)      17 (0.04)
0.6 to 0.699     9 (0.02)      3 (0.01)        9 (0.02)      3 (0.01)
0.7 to 0.799     5 (0.01)      8 (0.02)        7 (0.02)      8 (0.02)
0.8 to 0.899     1 (0.00)      6 (0.02)        1 (0.00)      2 (0.01)
0.9 to 1.000     1 (0.00)      21 (0.05)       0 (0.00)      0 (0.00)

Intuitive Q-matrix:
                 α-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Shannon       K-L             Shannon       K-L
Not Exposed      127 (0.32)    205 (0.52)      139 (0.35)    148 (0.37)
0.0 to 0.099     145 (0.37)    115 (0.29)      143 (0.36)    130 (0.33)
0.1 to 0.199     50 (0.13)     21 (0.05)       43 (0.11)     44 (0.11)
0.2 to 0.299     28 (0.07)     13 (0.03)       19 (0.05)     18 (0.05)
0.3 to 0.399     21 (0.05)     2 (0.01)        19 (0.05)     28 (0.07)
0.4 to 0.499     17 (0.04)     0 (0.00)        19 (0.05)     9 (0.02)
0.5 to 0.599     5 (0.01)      5 (0.01)        5 (0.01)      7 (0.02)
0.6 to 0.699     1 (0.00)      9 (0.02)        5 (0.01)      6 (0.02)
0.7 to 0.799     1 (0.00)      16 (0.04)       4 (0.01)      5 (0.01)
0.8 to 0.899     1 (0.00)      2 (0.01)        0 (0.00)      1 (0.00)
0.9 to 1.000     0 (0.00)      8 (0.02)        0 (0.00)      0 (0.00)

Table 24: Frequencies (and proportions) of item exposure for the reading test for response probabilities based on the fusion model.

Blueprint Q-matrix:
                 α-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Shannon       K-L             Shannon       K-L
Not Exposed      93 (0.23)     200 (0.51)      131 (0.33)    134 (0.34)
0.0 to 0.099     131 (0.33)    71 (0.18)       95 (0.24)     101 (0.26)
0.1 to 0.199     51 (0.13)     7 (0.02)        35 (0.09)     25 (0.06)
0.2 to 0.299     7 (0.02)      4 (0.01)        10 (0.03)     20 (0.05)
0.3 to 0.399     2 (0.01)      1 (0.00)        11 (0.03)     3 (0.01)
0.4 to 0.499     7 (0.02)      1 (0.00)        13 (0.03)     8 (0.02)
0.5 to 0.599     7 (0.02)      2 (0.01)        10 (0.03)     2 (0.01)
0.6 to 0.699     13 (0.03)     2 (0.01)        3 (0.01)      9 (0.02)
0.7 to 0.799     9 (0.02)      6 (0.02)        9 (0.02)      17 (0.04)
0.8 to 0.899     3 (0.01)      7 (0.02)        7 (0.02)      2 (0.01)
0.9 to 1.000     1 (0.00)      23 (0.06)       0 (0.00)      3 (0.01)

Intuitive Q-matrix:
                 α-Based Item Selection        θ- & α-Based Item Selection
Exposure Rate    Shannon       K-L             Shannon       K-L
Not Exposed      87 (0.22)     194 (0.49)      102 (0.26)    114 (0.29)
0.0 to 0.099     135 (0.34)    81 (0.20)       112 (0.28)    117 (0.30)
0.1 to 0.199     48 (0.12)     7 (0.02)        44 (0.11)     31 (0.08)
0.2 to 0.299     12 (0.03)     2 (0.01)        9 (0.02)      10 (0.03)
0.3 to 0.399     6 (0.02)      0 (0.00)        16 (0.04)     7 (0.02)
0.4 to 0.499     2 (0.01)      0 (0.00)        18 (0.05)     7 (0.02)
0.5 to 0.599     10 (0.03)     0 (0.00)        15 (0.04)     17 (0.04)
0.6 to 0.699     9 (0.02)      2 (0.01)        2 (0.01)      13 (0.03)
0.7 to 0.799     13 (0.03)     11 (0.03)       6 (0.02)      7 (0.02)
0.8 to 0.899     2 (0.01)      2 (0.01)        0 (0.00)      1 (0.00)
0.9 to 1.000     0 (0.00)      25 (0.06)       0 (0.00)      0 (0.00)

Plots of the item exposure frequencies for each test are presented in Appendix D for the approaches based on 3PL model probabilities and in Appendix E for the approaches based on fusion model probabilities.

Ideally, every item would be exposed to fewer than twenty percent of the examinees, and no item would go unadministered. The 5-4-3-2-1 exposure control method did not achieve this ideal, but the exposure tendencies can still be compared across the various methods. For instance, the method in condition 2 based on maximizing K-L Information has between two and six percent of the items exposed to at least ninety percent of the examinees, which is an unacceptably high exposure rate. The item selection condition based on both θ and α tends to have better exposure rates than the condition based solely on α. Its exposure rates are also comparable to those of the condition based solely on θ. The different methods within the θ-based condition and within the θ- and α-based condition also tend to be comparable with regard to exposure control.

Overall Performance

Evaluating the different techniques involves θ estimation accuracy, attribute mastery estimation accuracy, and item exposure control, so overall results should be considered across all of these areas. With regard to estimating θ, the item selection conditions based on both θ and α and on θ alone produce comparable results. The condition where item selection is based on both θ and α also performs better than the condition where item selection is based only on α.
Surprisingly, the item selection conditions based on both θ and α and on α only produce comparable results with regard to the attribute mastery estimates. The condition selecting items based on both θ and α outperforms the condition based solely on α with regard to θ estimation, attribute mastery pattern estimation, and item exposure control. Between the condition based solely on θ and the condition based on both θ and α, however, neither consistently outperforms the other. Both perform well, and similarly, with regard to θ estimation, attribute mastery estimation, and item exposure control.

The results for the intuitive Q-matrix for the reading test are quite poor with respect to all three of these criteria, and certainly poorer than the corresponding results based on the Blueprint Q-matrix. This emphasizes the importance of Q-matrix construction, whose significance should not be underestimated. Great care should therefore go into this important initial step in the cognitive diagnosis process. Additional precautions could be taken, such as asking a content expert to review a Q-matrix before the cognitively diagnostic analysis.

The item selection method based on both attribute mastery estimates and single score estimates, and the method based solely on single score estimates, both perform well and similarly with regard to θ estimation, attribute mastery estimation, and item exposure control. Furthermore, the choice of approach within these item selection techniques seems to make little difference in the results of this study. This applies to Fisher Information versus K-L Information in the condition where item selection is based solely on θ, and to Shannon Entropy versus K-L Information in the condition based on both θ and α.
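The exposure summaries in Tables 21 through 24 bin each item's exposure rate (the proportion of examinees who saw it) into the eleven categories used there. The following minimal sketch reproduces that binning from per-item administration counts. It assumes "Not Exposed" means an item was never administered and that a rate of exactly 1.0 falls in the top bin. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

BIN_LABELS = [
    "Not Exposed", "0.0 to 0.099", "0.1 to 0.199", "0.2 to 0.299",
    "0.3 to 0.399", "0.4 to 0.499", "0.5 to 0.599", "0.6 to 0.699",
    "0.7 to 0.799", "0.8 to 0.899", "0.9 to 1.000",
]


def exposure_table(admin_counts: Sequence[int],
                   n_examinees: int) -> List[Tuple[str, int, float]]:
    """Frequency and proportion of items falling in each exposure-rate bin.

    admin_counts[i] is the number of examinees who were administered item i.
    """
    freqs = [0] * len(BIN_LABELS)
    for count in admin_counts:
        if count == 0:
            freqs[0] += 1
        else:
            # Integer arithmetic avoids floating-point edge cases at bin
            # boundaries; a rate of exactly 1.0 is placed in the top bin.
            decile = min(count * 10 // n_examinees, 9)
            freqs[decile + 1] += 1
    n_items = len(admin_counts)
    return [(label, f, round(f / n_items, 2)) for label, f in zip(BIN_LABELS, freqs)]
```

For example, `exposure_table([0, 0, 1, 5, 10, 9, 3], 10)` places two items in "Not Exposed" and two in "0.9 to 1.000". The items seen by 9 and 10 of the 10 examinees are exactly the kind of overexposure flagged above for the K-L method in condition 2.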
CHAPTER FIVE: DISCUSSION

The final chapter of this dissertation discusses the results of each of the two major portions of the study, as well as the educational implications and future directions of this line of research.

PART ONE: Analyzing an Existing Test in a Cognitive Diagnosis Framework

The first portion of this study analyzed a large-scale test's results from a cognitively diagnostic standpoint and had two main objectives. First, it aimed to obtain specific diagnostic information about individual examinees. Second, it sought to obtain useful item parameters for evaluating each item's success in assessing the attributes. These are two of the major requirements of a successful cognitively diagnostic assessment outlined by Hartz et al. (2002) and discussed in Chapter Two in the section titled "The Fusion Model." This part of the study sought to accomplish these objectives with an existing exam, to demonstrate that cognitive diagnosis can be conducted without a new set of items written specifically for cognitive diagnosis. The goal was to confirm that these objectives could be met before extending this new model to a CAT framework, rather than simply assuming the fusion model would succeed with a real dataset.

As mentioned previously, a major advantage of using a cognitively diagnostic model is that the feedback from the test is individualized to each student's strengths and weaknesses. In other words, examinees who would receive the same score in a conventional single-score assessment may receive different patterns of attribute mastery status.
For example, two examinees from the third grade math test who received the same traditional IRT-based single score of 0.21 (on the z-scale) received the following different attribute mastery patterns: (1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1) and (1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1), where a one signifies mastery and a zero signifies non-mastery of each of the eleven attributes in the test blueprint. This is significant because the cognitive diagnosis-based analysis identifies the specific attributes on which each student needs improvement. More specifically, the first student needs to work on attributes 2, 3, 7, 9, and 10 in order to achieve mastery status on all of the measured attributes, whereas the second student needs to study attributes 2, 8, and 10. Overall, the first portion of this study demonstrates that the fusion model can be applied successfully to an existing assessment as a cognitively diagnostic approach, meeting the objectives of cognitive diagnosis without requiring that a new test be constructed from scratch.

PART TWO: Adaptively Enhancing the Assessment Process

The goal of the second major portion of this study was to incorporate computerized adaptive testing technology into a cognitively diagnostic testing situation. This study examined several ways of achieving this goal: an item selection approach based on θ estimates, an approach based on attribute mastery estimates, and an approach based on both simultaneously. In sum, selecting items based on current θ estimates alone, or based on both θ and attribute mastery estimates by means of the shadow test, are both good methods with regard to single score estimation, attribute mastery estimation, and item exposure control. Which of the two to implement then depends on other issues the test administrator may be facing.
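The shadow test logic (van der Linden and Reese, 1998; van der Linden, 2000) can be sketched in a few lines of Python: before each item is administered, a full-length test is assembled that contains every item already given, satisfies all constraints, and is optimal at the current θ̂; the next item is then drawn from the not-yet-administered part of that shadow test. This is a simplified sketch under stated assumptions. Exhaustive enumeration stands in for an integer-programming solver such as CPLEX, which is workable only for tiny pools; total Fisher information at θ̂ serves as both the objective and the final pick; and a minimum item count per content area is the only constraint. The attribute mastery criterion (K-L Information or Shannon Entropy) used in this study's combined condition is not modeled, and the pool layout and function names are hypothetical.

```python
import math
from itertools import combinations

D = 1.7  # logistic scaling constant

def fisher_info(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def assemble_shadow_test(pool, administered, theta_hat, test_length, min_per_area):
    """Full-length item set that contains every administered item, meets the minimum
    count per content area, and maximizes total Fisher information at theta_hat.
    Returns None if no feasible set exists."""
    free = [i for i in range(len(pool)) if i not in administered]
    need = test_length - len(administered)
    best, best_value = None, -math.inf
    for extra in combinations(free, need):
        items = list(administered) + list(extra)
        counts = {}
        for i in items:
            counts[pool[i]["area"]] = counts.get(pool[i]["area"], 0) + 1
        if any(counts.get(area, 0) < m for area, m in min_per_area.items()):
            continue  # violates a content-balancing constraint
        value = sum(fisher_info(theta_hat, *pool[i]["abc"]) for i in items)
        if value > best_value:
            best, best_value = items, value
    return best

def next_item(pool, administered, theta_hat, test_length, min_per_area):
    """Administer the most informative not-yet-administered item in the shadow test."""
    shadow = assemble_shadow_test(pool, administered, theta_hat, test_length, min_per_area)
    if shadow is None:
        raise ValueError("no feasible shadow test")
    remaining = [i for i in shadow if i not in administered]
    return max(remaining, key=lambda i: fisher_info(theta_hat, *pool[i]["abc"]))

# Hypothetical pool: two items per content area, 3PL parameters (a, b, c).
pool = [
    {"area": "A", "abc": (2.0, 0.0, 0.0)},
    {"area": "A", "abc": (1.8, 0.0, 0.0)},
    {"area": "B", "abc": (0.5, 0.0, 0.0)},
    {"area": "B", "abc": (1.0, 0.0, 0.0)},
]
constraints = {"A": 1, "B": 1}
print(assemble_shadow_test(pool, [], 0.0, 2, constraints))  # [0, 3]
print(next_item(pool, [], 0.0, 2, constraints))             # 0
```

Without the content constraint the two area-A items would form the most informative two-item test; the constraint forces one area-B item into the shadow test, which is how additional requirements enter every selection step.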
If other constraints are necessary in the testing process, then the shadow test approach, which selects items based on both single score and attribute mastery estimation, is preferable because it can easily and efficiently incorporate the additional requirements. Such constraints include content balancing, item format constraints, and testlet constraints, among others; van der Linden and Reese (1998) and van der Linden (2000) expound on incorporating them. On the other hand, if a test administrator prefers a simpler approach to item selection and has neither access to special software such as CPLEX nor a need for additional constraints, then an item selection method based on the current single score estimates would suffice. The differences between Fisher Information and K-L Information in the condition based solely on θ, and between Shannon Entropy and K-L Information in the condition based on both θ and α, were negligible, so the choice of selection technique within each condition is left to the discretion of the test administrator.

This study compares the accuracy of three possible item selection methods in a computerized adaptive testing situation focused on estimating diagnostic attribute information in addition to the conventional single score. This simulation study is important in an educational context because it explores the accuracy of these methods with regard to these two assessment approaches. The Single Score Estimation section of the previous chapter discusses how determining item response probabilities based on the fusion model, which does not incorporate the single score at all, results in poorer θ estimation than when the item response probabilities are based on the 3PL model. Interestingly, the reverse is not true: response probabilities based on the 3PL model and on the fusion model yield similar results with regard to attribute mastery estimation.
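The contrast drawn here can be seen directly in the two item response functions. The Python sketch below (function names hypothetical) computes a 3PL probability, which depends only on θ, and a reduced fusion-model (reduced RUM) probability, P = π_i* multiplied by r_ik* for every attribute k that item i requires but the examinee has not mastered, which depends only on the attribute pattern α. The full fusion model of Hartz (2002) additionally multiplies by a Rasch-type term in a residual ability, which this sketch omits, and the item parameters are invented for illustration. The last function flags Q-matrix entries whose r_ik* exceeds a hypothetical cut-off of 0.9, the kind of weakly discriminating attribute assignment that the discussion of r_ik* below suggests reevaluating.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response: a function of theta only."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_reduced_rum(alpha, q_row, pi_star, r_star):
    """Reduced fusion-model probability: pi_star times r_star[k] for every
    attribute k the item requires (q = 1) but the examinee has not mastered (alpha = 0)."""
    prob = pi_star
    for mastered, required, r_k in zip(alpha, q_row, r_star):
        if required and not mastered:
            prob *= r_k
    return prob

def weak_q_entries(q_row, r_star, cutoff=0.9):
    """1-based attributes assigned to the item whose r* is near 1 (little discrimination)."""
    return [k for k, (required, r_k) in enumerate(zip(q_row, r_star), start=1)
            if required and r_k > cutoff]

# Hypothetical item requiring attributes 1 and 2 of three.
q_row, pi_star, r_star = (1, 1, 0), 0.9, (0.3, 0.95, 1.0)

print(round(p_3pl(0.0, 1.0, 0.0, 0.2), 3))  # 0.6
for alpha in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]:
    print(alpha, round(p_reduced_rum(alpha, q_row, pi_star, r_star), 3))
# (1, 1, 0) 0.9
# (0, 1, 1) 0.27
# (1, 0, 1) 0.855
print(weak_q_entries(q_row, r_star))  # [2]
```

Non-mastery of attribute 1 cuts the success probability from 0.9 to 0.27, while non-mastery of attribute 2 barely changes it, so the Q-matrix entry for attribute 2 on this item would be a candidate for review.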
These findings may suggest that additional simulation research in this area should use a model like the 3PL, which incorporates θ values into the response probability calculation, because the results based on 3PL response probabilities were more consistent and clear. Simulation studies like this one can also be used by test administrators to determine how well the various attributes of interest are being measured. For instance, as Figures 10 and 11 show, the mastery status for attributes 9 and 10, respectively, is not estimated as accurately as that of the other attributes. Notice that both of these attributes deal with the students' ability to estimate a reasonable solution to the item. Test administrators could use this information to reevaluate poorly measured attributes, or to examine the items measuring such an attribute to better assess what they originally had in mind for it. In addition, the attribute-based item difficulty parameter, π_i*, and the attribute-based item discrimination parameters, r_ik*, can be examined for the poorly estimated attribute(s) in this evaluation process. For instance, if an item i has a high value of r_ik* for a given attribute k (that is, a value near 1, meaning that non-mastery of attribute k barely lowers the probability of a correct response), then the assignment of this attribute to this item can be reevaluated. This highlights the importance of the diagnostic model's ability to estimate parameters that can be used to assess how well the items measure the attributes involved, which is emphasized in Hartz, et al. (2002).

Educational Implications and Future Directions of Research

This area of research holds a great deal of educational importance because the wide-scale application of diagnostic techniques in educational testing can lead to more informative feedback that better helps both teachers and students throughout the educational process. Diagnostic assessment can provide specific information regarding the kinds of help an individual needs.
Furthermore, cognitive diagnosis can be used to gauge an individual's readiness to move on to higher levels of understanding and skill in the given content domain (Gott, 1990, p. 174). The test becomes an active tool in the educational process rather than a passive evaluator. Teachers can use the diagnostic feedback provided by such a test to home in on the specific needs of individual students. Thus, building a bridge between diagnostic assessment and mandatory standardized tests can help customize education to meet the needs of individual students, rather than relying on a one-size-fits-all approach to state-mandated testing.

Some possible future research directions would be to include the Rule Space Model in such an analysis, or to cross-validate the examinees' resulting attribute vectors, perhaps by interviewing a portion of the sampled students. It might also be interesting to study how to handle flagged examinees and attribute mastery estimates close to the cut-off value. What should be reported to these students? Perhaps alternatives to dichotomizing the attribute mastery estimates, such as arranging mastery levels along a continuum, would be more beneficial. Another area of future research is the relationship between the item difficulty parameter in the 3PL model and the attribute-based item difficulty in the fusion model. Alternative item exposure control methods may also be of interest in the context of computerized adaptive testing focused on both single score and attribute mastery estimation. Conditions based on the fusion model produce more erratic results than conditions based on the 3PL model.
This difference is especially noticeable in the item selection condition based on both θ and α, where confounding issues are held constant and the only difference between two corresponding cells in the research design is the model used to determine the probability of a correct or incorrect response to each selected item. Future research could explore why this difference occurs. Other future areas of research include the effect of the number of attributes measured, the number of items measuring each attribute, and the number of attributes measured by a single item. Alternative approaches to item selection might be another interesting area of future research. Also, there were quite a few nonconvergent cases with regard to θ estimation (which used the maximum likelihood estimation procedure) in the CAT administration, as illustrated in Tables 9 and 10. Perhaps alternative estimation procedures, such as expected a posteriori (EAP) estimation, would yield better results in this respect. Clearly, there is a myriad of future directions of research in this area. It is hoped that this study may serve as a starting point for some of these additional issues.

Cognitive diagnosis is important in educational assessment because it provides helpful feedback to students about specific elements of the measured content domain. It is rapidly becoming a requirement of effective, educationally beneficial test development (No Child Left Behind Act, 2001). The challenge then becomes how to adapt the methods developed within the CAT framework to enable this new approach. This study used the shadow test procedure to achieve the best of both worlds.
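The EAP alternative suggested above for the nonconvergent θ estimates can be sketched briefly. One common source of nonconvergence is that maximum likelihood has no finite solution when an examinee answers every administered item correctly (or, under the 3PL, every item incorrectly); EAP instead takes the mean of the posterior distribution of θ, which is always finite. The Python sketch below assumes 3PL items, a standard normal prior, and equally spaced quadrature points on [-4, 4]; these choices, the item parameters, and the function names are illustrative assumptions rather than settings from this study.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def eap_estimate(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """Posterior mean and SD of theta given 0/1 responses to 3PL items (a, b, c),
    with a standard normal prior evaluated on equally spaced quadrature points."""
    step = (hi - lo) / (n_points - 1)
    total = first = second = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior density
        for x, (a, b, c) in zip(responses, items):
            p = p3pl(theta, a, b, c)
            weight *= p if x == 1 else 1.0 - p  # multiply in the likelihood
        total += weight
        first += theta * weight
        second += theta * theta * weight
    mean = first / total
    return mean, math.sqrt(max(second / total - mean * mean, 0.0))

items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.9, 1.0, 0.2)]  # hypothetical items
print(eap_estimate([1, 1, 1], items))  # finite, although the ML estimate diverges
```

The prior pulls extreme response patterns toward the center of the ability scale, which trades a small inward bias for an estimate at every step of the adaptive test.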
While this study was conducted using the fusion model's framework for cognitive diagnosis, the procedure can be generalized in future studies to any diagnostic model that estimates the attribute states of the examinees, such as the Noisy Inputs Deterministic And gate (NIDA) model (see Maris, 1999), the General Latent Trait Model (GLTM) (Embretson, 1984), or the Rule Space method (Tatsuoka and Tatsuoka, 1982).

APPENDIX A
List of Attributes Measured by Each Test

Blueprint Q-matrix for the math test:
1. Demonstrate an understanding of number concepts.
2. Demonstrate an understanding of mathematical relations.
3. Demonstrate an understanding of geometric properties and relationships.
4. Demonstrate an understanding of measurement concepts using metric and customary units.
5. Demonstrate an understanding of probability and statistics.
6. Use the operation of addition to solve problems.
7. Use the operation of subtraction to solve problems.
8. Use the operation of multiplication and/or division to solve problems.
9. Estimate solutions to a problem situation and/or evaluate the reasonableness of a solution to a problem situation.
10. Determine solution strategies and analyze or solve problems.
11. Express or solve problems using mathematical representation.

Intuitive Q-matrix for the math test:
1. Understanding representation
2. Counting
3. Multiplication
4. Division
5. Addition
6. Subtraction
7. Understanding geometric shapes (turning, flipping, draw a line of symmetry, etc.)
8. Read a chart
9. Set up an arithmetic calculation from verbal information
10. Estimation
11. Read a table of numbers
12. Using standard units of measure
13. Understanding and forming order of magnitude

Blueprint Q-matrix for the reading test:
1. Determine the meaning of words in a variety of written texts.
2. Identify supporting ideas in a variety of written texts.
3. Summarize a variety of written texts.
4. Perceive relationships and recognize outcomes in a variety of written texts.
5. Analyze information in a variety of written texts in order to make inferences and generalizations.
6. Recognize points of view, propaganda, and/or statements of fact and opinion in a variety of written texts.

Intuitive Q-matrix for the reading test:
1. Chronology
2. Causality (determining why)
3. Word Meaning
4. General Summary
5. Observing/Remembering Details
6. Knowing Fact versus Opinion
7. Speculating from Contextual Clues

APPENDIX B
Plots of Estimated versus True Values of Theta for Probabilities Based on the 3PL Model
[Figure: scatter plots of Estimated Theta against True Theta for Condition 1 (Fisher Information; K-L Information) and Condition 3 (K-L Information; Shannon Entropy), each for the Math Blueprint, Math Intuitive, Reading Blueprint, and Reading Intuitive Q-matrices.]

APPENDIX C
Conditional Bias Plots of Theta Estimates for Probabilities Based on the 3PL Model
[Figure: plots of Bias against Theta for Condition 1 (Fisher Information; K-L Information) and Condition 3 (K-L Information; Shannon Entropy), each for the Math Blueprint, Math Intuitive, Reading Blueprint, and Reading Intuitive Q-matrices.]

APPENDIX D
Item Exposure Frequencies for Simulations Based on 3PL Model Probabilities
[Figure: plots of Exposure frequency against Item for Condition 1 (Fisher Information; K-L Information) and Condition 3 (K-L Information; Shannon Entropy), each for the Math Blueprint, Math Intuitive, Reading Blueprint, and Reading Intuitive Q-matrices.]

APPENDIX E
Item Exposure Frequencies for Simulations Based on Fusion Model Probabilities
[Figure: plots of Exposure frequency against Item for Condition 2 (K-L Information; Shannon Entropy) and Condition 3 (K-L Information; Shannon Entropy), each for the Math Blueprint, Math Intuitive, Reading Blueprint, and Reading Intuitive Q-matrices.]

REFERENCES

Bertsimas, D. and Tsitsiklis, J. N. (1997). Introduction to Linear Optimization. Belmont, Massachusetts: Athena Scientific.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord and M. R. Novick (Eds.), Statistical theories of mental test scores (Chapters 17-20). Reading, MA: Addison-Wesley.
Birenbaum, M. and Tatsuoka, K. K. (1993). Applying an IRT-based cognitive diagnostic model to diagnose students' knowledge states in multiplication and division with exponents. Applied Measurement in Education, 6(4), 225-268.
Campione and Brown (1990). Guided learning and transfer: Implications for approaches to assessment. In N. Frederiksen, R. L. Glaser, A. M. Lesgold, and M. G. Shafto (Eds.), Diagnostic monitoring of skills and knowledge acquisition (pp. 453-486). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chang, H. and Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chang, H. and Ying, Z. (1999). a-Stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211-222.
Chen, S., Ankenmann, R. D., and Chang, H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24, 241-255.
Cheng, P. E. and Liou, M. (2000). Estimation of trait level in computerized adaptive testing. Applied Psychological Measurement, 24, 257-265.
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
Chipman, S. F., Nichols, P. D., and Brennan, R. L. (1995). Introduction. In P. D. Nichols, S. F. Chipman, and R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 327-361). Hillsdale, NJ: Lawrence Erlbaum Associates.
Davey, T. and Parshall, C. G. (1995). New algorithms for item selection and exposure control with computerized adaptive testing. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
DiBello, L., Stout, W., and Roussos, L. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, and R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 327-361). Hillsdale, NJ: Lawrence Erlbaum Associates.
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.
Embretson, S. (1990). Diagnostic testing by measuring learning processes: Psychometric considerations for dynamic testing. In N. Frederiksen, R. L. Glaser, A. M. Lesgold, and M. G. Shafto (Eds.), Diagnostic monitoring of skills and knowledge acquisition (pp. 453-486). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.
Gelfand, A. E. (1997). Gibbs sampling. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences, Update Vol. 1 (pp. 283-291). New York, NY: John Wiley and Sons.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis. London, England: Chapman & Hall.
Gott, S. (1990). Assisted learning of strategic skills. In N. Frederiksen, R. L. Glaser, A. M. Lesgold, and M. G. Shafto (Eds.), Diagnostic monitoring of skills and knowledge acquisition (pp. 453-486). Hillsdale, NJ: Lawrence Erlbaum Associates.
Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., and Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive testing. Journal of Educational Measurement, 21, 347-360.
Harris, B. (1988). Entropy. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences, Vol. 2 (pp. 512-516). New York, NY: John Wiley and Sons.
Hartz, S. (2002). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practice. Doctoral thesis, The University of Illinois at Urbana-Champaign.
Hartz, S., Roussos, L., and Stout, W. (2002). Skills Diagnosis: Theory and Practice. User Manual for Arpeggio software. ETS.
Hawkins, D. M. (1988). Branch-and-bound method. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences, Vol. 1 (pp. 314-316). New York, NY: John Wiley and Sons.
ILOG, Inc. (2003). CPLEX Software Program, version 8.1. Incline Village, NV: CPLEX Division.
Jiang, H. (1996). Applications of Computational Statistics in Cognitive Diagnosis and IRT Modeling. Doctoral thesis, The University of Illinois at Urbana-Champaign.
Kullback, S. (1988). Kullback information. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences, Vol. 4 (pp. 421-425). New York, NY: John Wiley and Sons.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.
Linn, R. (1990). Diagnostic testing. In N. Frederiksen, R. L. Glaser, A. M. Lesgold, and M. G. Shafto (Eds.), Diagnostic monitoring of skills and knowledge acquisition (pp. 453-486). Hillsdale, NJ: Lawrence Erlbaum Associates.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61, No. 285.
Lord, F. M. (1971a). A theoretical study of the measurement effectiveness of flexilevel tests. Educational and Psychological Measurement, 31, 805-813.
Lord, F. M. (1971b). A theoretical study of two-stage testing. Psychometrika, 36, 227-242.
Meijer, R. R. and Nering, M. L. (1999). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23, 187-194.
McBride, J. R. and Martin, J. T. (1983). Reliability and validity of adaptive ability tests in a military setting. In D. J. Weiss (Ed.), New Horizons in Testing (pp. 223-236). New York: Academic Press.
Parshall, Harmes, and Kromrey (2000). Item exposure control in computerized adaptive testing: The use of freezing to augment stratification. Florida Journal of Educational Research, 40, 28-52.
Patz, R. J. and Junker, B. W. (1997, June). A straightforward approach to Markov Chain Monte Carlo methods for item response models. Technical Report 658. Retrieved August 27, 2003, from http://www.stat.cmu.edu/cmu-stats/tr/tr658/tr658.html
Reckase, M. D. (1989). Adaptive testing: The evolution of a good idea. Educational Measurement: Issues and Practice, 8, 11-15.
Rogers, H. J., Swaminathan, H., and Hambleton, R. K. (1991). Fundamentals of item response theory: Measurement methods for the social sciences, volume 2. Chapter two: Concepts, models and features. Thousand Oaks, CA: Sage Publications.
Samejima, F. (1995). A cognitive diagnosis model using latent trait models: Competency space approach and its relationship with DiBello and Stout's unified cognitive psychometric diagnosis model. In P. D. Nichols, S. F. Chipman, and R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 391-410). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.
Smith, A. (2003, April). Markov Chain Monte Carlo simulation made simple. Retrieved August 27, 2003, from http://www.nyu.edu/gsas/dept/politics/grad/classes/quant2/mcmc_note.pdf
Stevens, J. (1996). Applied multivariate statistics for the social sciences (p. 111). Mahwah, NJ: Lawrence Erlbaum Associates.
Stocking, M. L. and Lewis, C. (1998). Controlling item exposure conditional on ability on computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57-75.
Stocking, M. L. and Lewis, C. (2000). Methods for controlling the exposure of items in CAT. In W. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 163-182). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Stocking, M. L. and Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17, 277-292.
Stout, W., et al. (2002). Arpeggio Software Program, version 1.1. Princeton, NJ: Educational Testing Service.
Sympson, J. B. and Hetter, R. D. (1985). Controlling item exposure rates in computerized adaptive testing. Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego, CA: Navy Personnel Research and Development Center.
Tatsuoka, K. K. (1995). Architecture of knowledge structure and cognitive diagnosis: A statistical pattern recognition and classification approach. In P. D. Nichols, S. F. Chipman, and R. L. Brennan (Eds.), Cognitively Diagnostic Assessment (pp. 327-361). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tatsuoka, K. K. (1990). Toward integration of item response theory and cognitive error diagnoses. In N. Frederiksen, R. L. Glaser, A. M. Lesgold, and M. G. Shafto (Eds.), Diagnostic monitoring of skills and knowledge acquisition (pp. 453-486). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20(4).
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49(1), 95-110.
Tatsuoka, K. K. and Tatsuoka, M. M. (1982). Detection of aberrant response patterns and their effect on dimensionality. Journal of Educational Statistics, 7(3), 215-231.
Tatsuoka, K. K. and Tatsuoka, M. M. (1984). Bug distribution and pattern classification. Psychometrika, 52(2), 193-206.
Tatsuoka, K. K. and Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effects on remedial instruction as empirical validation. Journal of Educational Measurement, 34(1), 3-20.
Tatsuoka, M. M. and Tatsuoka, K. K. (1989). Rule space. In S. Kotz and N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences, Vol. 8 (pp. 217-220). New York: Wiley.
Thissen, D. and Mislevy, R. J. (2000). Testing algorithms. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 101-134). Mahwah, NJ: Lawrence Erlbaum Associates.
Texas Education Agency (2002). Texas Student Assessment Program Technical Digest for the Academic Year 2001-2002. Austin, TX. Retrieved February 20, 2004, from http://www.tea.state.tx.us/student.assessment/resources/techdig/index.html
Tierney, L. (1997). Markov Chain Monte Carlo algorithms. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences, Update Vol. 1 (pp. 392-399). New York, NY: John Wiley and Sons.
U.S. House of Representatives (2001). Text of the No Child Left Behind Act. Public Law No. 107-110, 115 Stat. 1425.
van der Linden, W. and Chang, H. (2003). Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach. Applied Psychological Measurement, 27, 107-120.
van der Linden, W. (2000a). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24, 225-240.
van der Linden, W. (2000b). Constrained adaptive testing with shadow tests. In W. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 27-52). Dordrecht, The Netherlands: Kluwer Academic Publishers.
van der Linden, W. and Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 1-25). Dordrecht, The Netherlands: Kluwer Academic Publishers.
van der Linden, W. and Reese, L. (1998, September). A model for optimal constrained adaptive testing. Applied Psychological Measurement, 22, 259-270.
Veldkamp, B. P. and van der Linden, W. (2000). Designing item pools for computerized adaptive testing. In W. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 149-162). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Wainer, H. (2000). Introduction and history. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 1-22). Mahwah, NJ: Lawrence Erlbaum Associates.
Wainer, H. and Mislevy, R. J. (2000). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 61-100). Mahwah, NJ: Lawrence Erlbaum Associates.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17-27.
Whitesitt, J. E. (1995). Boolean algebra and its applications. Mineola, NY: Dover Publications.
Whittaker, T. A., Fitzpatrick, S. J., William, N. J., and Dodd, B. G. (2003). IRTGEN: A SAS macro program to generate known trait scores and item responses for commonly used item response theory models. Applied Psychological Measurement, 27, 299-300.
Xu, X., Chang, H., and Douglas, J. (2003). A simulation study to compare CAT strategies for cognitive diagnosis. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

VITA

Meghan Kathleen McGlohen was born in Grapevine, Texas, on August 1, 1979.
She grew up in the suburbs of the Dallas-Fort Worth metropolitan area. In 1997, she graduated from Lamar High School in Arlington, Texas, and began college at Texas A&M University in College Station. She graduated from Texas A&M University in 2001 with a Bachelor of Science in Psychology. In the fall of 2001, she began graduate work in the Ph.D. program in Quantitative Methods in the Department of Educational Psychology at the University of Texas at Austin, where she received a Master of Arts degree in May 2003. This area of study emphasizes psychometrics and statistical techniques. Upon completing her Doctor of Philosophy in May 2004, she plans to move to the Research Triangle area of North Carolina immediately following her wedding to Joshua Thomas Wills on June 25, 2004.

Permanent Address: 12440 Alameda Trace Circle #2031, Austin, Texas 78727

This dissertation was typed by the author.
