chapter8

Course: STAT 309, Fall 2009
School: Carnegie Mellon
Chapter 8
Threats to Your Experiment

Planning to avoid criticism.

One of the main goals of this book is to encourage you to think from the point of view of an experimenter, because other points of view, such as that of a reader of scientific articles or a consumer of scientific ideas, are easy to switch to after the experimenter's point of view is understood, but the reverse is often not true. In other words, to enhance the usability of what you learn, you should pretend that you are a researcher, even if that is not your ultimate goal.

As a researcher, one of the key skills you should be developing is to try, in advance, to think of all of the possible criticisms of your experiment that may arise from the reviewer of an article you write or the reader of an article you publish. This chapter discusses possible complaints about internal validity, external validity, construct validity, type-1 error, and power.

We are using "threats" to mean things that will reduce the impact of your study results on science, particularly those things that we have some control over.

8.1 Internal validity

In a well-constructed experiment in its simplest form we manipulate variable X and observe the effects on variable Y. For example, outcome Y could be the number of people who purchase a particular item in a store over a certain week, and X could be some characteristic of the display for that item, such as the use of pictures of people of different status for an in-store advertisement (e.g., a celebrity vs. an unknown model). Internal validity is the degree to which we can appropriately conclude that the changes in X caused the changes in Y.

The study of causality goes back thousands of years, but there has been a resurgence of interest recently. For our purposes we can define causality as the state of nature in which an active change in one variable directly changes the probability distribution of another variable.
It does not mean that a particular treatment is always followed by a particular outcome, but rather that some probability is changed, e.g., a higher outcome is more likely with a particular treatment than without it.

A few ideas about causality are worth thinking about now. First, association, which is equivalent to non-zero correlation (see section 3.6.1) in statistical terms, means that we observe that when one variable changes, another one tends to change. We cannot have causation without association, but just finding an association is not enough to justify a claim of causation.

Association does not necessarily imply causation.

If variables X and Y (e.g., the number of televisions (X) in various countries and the infant mortality rate (Y) of those countries) are found to be associated, then there are three basic possibilities. First, X could be causing Y (televisions lead to more health awareness, which leads to better prenatal care), or Y could be causing X (high infant mortality leads to attraction of funds from richer countries, which leads to more televisions), or an unknown factor Z could be causing both X and Y (higher wealth in a country leads to more televisions and more prenatal care clinics). It is worth memorizing these three cases, because they should always be considered when association is found in an observational study as opposed to a randomized experiment. (It is also possible that X and Y are related in more complicated ways, including in large networks of variables with feedback loops.)

Causation ("X causes Y") can be logically claimed if X and Y are associated, and X precedes Y, and no plausible alternative explanations can be found, particularly those of the form "X just happens to vary along with some real cause of changes in Y" (called confounding).

Returning to the advertisement example, one stupid thing to do is to place all of the high status pictures in only the wealthiest neighborhoods or the largest stores, while the low status pictures are only shown in impoverished neighborhoods or those with smaller stores. In that case a higher average number of items purchased for the stores with high status ads may be due to the effect of socio-economic status, or store size, or perceived status of the ad. When more than one thing is different on average between the groups to be compared, the problem is called confounding, and confounding is a fatal threat to internal validity.

Notice that the definition of confounding mentions "different on average". This is because it is practically impossible to have no differences between the subjects in different groups (beyond the differences in treatment). So our realistic goal is to have no difference on average. For example, if we are studying both males and females, we would like the gender ratio to be the same in each treatment group. For the store example, we want the average pre-treatment total sales to be the same in each treatment group. And we want the distance from competitors to be the same, and the socio-economic status (SES) of the neighborhood, and the racial makeup, and the age distribution of the neighborhood, etc. Even worse, we want all of the unmeasured variables, both those that we thought of and those we didn't think of, to be similar in each treatment group.

The sine qua non of internal validity is random assignment of treatment to experimental units (different stores in our ad example). Random treatment assignment (also called randomization) is usually the best way to assure that all of the potential confounding variables are equal on average (also called balanced) among the treatment groups. Non-random assignment will usually lead to either consciously or unconsciously unbalanced groups.
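To make the store example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the sales model, the numbers, and the function names (`weekly_sales`, `estimated_effect`) are hypothetical, and the data are synthetic. It compares the apparent ad effect when celebrity ads go only to the larger stores with the apparent effect when the ads are assigned at random.

```python
import random
import statistics

def weekly_sales(show_celebrity_ad, store_size, ad_effect, rng):
    """Hypothetical sales model: store size drives sales, the ad adds a fixed effect."""
    baseline = 50 + 10 * store_size        # store size is the potential confounder
    noise = rng.gauss(0, 5)
    return baseline + (ad_effect if show_celebrity_ad else 0.0) + noise

def estimated_effect(assignment, store_sizes, ad_effect, rng):
    """Difference in mean sales: celebrity-ad stores minus unknown-model stores."""
    sales = [weekly_sales(a, s, ad_effect, rng)
             for a, s in zip(assignment, store_sizes)]
    treated = [y for y, a in zip(sales, assignment) if a]
    control = [y for y, a in zip(sales, assignment) if not a]
    return statistics.mean(treated) - statistics.mean(control)

rng = random.Random(8)                     # fixed seed so the sketch is repeatable
n_stores = 2000
true_effect = 3.0                          # assumed true effect of the celebrity ad
store_sizes = [rng.uniform(0, 10) for _ in range(n_stores)]

# Confounded design: the celebrity ad goes only to the larger half of the stores.
median_size = statistics.median(store_sizes)
confounded = [size > median_size for size in store_sizes]

# Randomized design: equal numbers of each ad, shuffled across stores.
randomized = [True] * (n_stores // 2) + [False] * (n_stores // 2)
rng.shuffle(randomized)

confounded_estimate = estimated_effect(confounded, store_sizes, true_effect, rng)
randomized_estimate = estimated_effect(randomized, store_sizes, true_effect, rng)

print("true effect:", true_effect)
print("estimate with confounded assignment:", round(confounded_estimate, 2))
print("estimate with randomized assignment:", round(randomized_estimate, 2))
```

Under these assumptions the confounded design mixes the store-size difference into the ad effect, so its estimate is far from the true value, while the randomized estimate differs from the true value only by chance variation.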
If one or a few variables, such as gender or SES, are known to be critical factors affecting outcome, a good alternative is block randomization, in which randomization among treatments is performed separately for each level of the critical (non-manipulated) explanatory factor. This helps to assure that the level of this explanatory factor is balanced (not confounded) across the levels of the treatment variable.

In current practice randomization is normally done using computerized random number generators. Ideally all subjects are identified before the experiment begins and assigned numbers from 1 to N (the total number of subjects), and then a computer's random number generator is used to assign treatments to the subjects via these numbers. For block randomization this can be done separately for each block. If all subjects cannot be identified before the experiment begins, some way must be devised to assure that each subject has an equal chance of getting each treatment (if equal assignment is desired). One way to do this is as follows. If there are k levels of treatment, then collect the subjects until k (or 2k or 3k, etc.) are available, then use the computer to randomly assign treatments among the available subjects. It is also acceptable to have the computer individually generate a random number from 1 to k for each subject, but it must be assured that the subject and/or researcher cannot re-run the process if they don't like the assignment.

Confounding can occur because we purposefully, but stupidly, design our experiment such that two or more things differ at once, or because we assign treatments non-randomly, or because the randomization "failed".

As an example of designed confounding, consider the treatments "drug plus psychotherapy" vs. "placebo" for treating depression. If a difference is found, then we will not know whether the success of the treatment is due to the drug, the psychotherapy, or the combination.
If no difference is found, then that may be due to the effect of the drug canceling out the effect of the psychotherapy. If the drug and the psychotherapy are known to individually help patients with depression and we really do want to study the combination, it would probably be better to have a study with the three treatment arms of drug, psychotherapy, and combination (with or without the placebo), so that we could assess the specific important questions of whether drug adds a benefit to psychotherapy and vice versa.

As another example, consider a test of the effects of a mixed herbal supplement on memory. Again, a success tells us that something in the mix helps memory, but a follow-up trial is needed to see if all of the components are necessary. And again we have the possibility that one component would cancel another out, causing a "no effect" outcome when one component really is helpful. But we must also consider that the mix itself may be effective while the individual components are not, so this might be a good experiment.

In terms of non-random assignment of treatment, this should only be done when necessary, and it should be recognized that it strongly, often fatally, harms the internal validity of the experiment. If you assign treatment in some pseudo-random way, e.g., alternating treatment levels, you or the subjects may purposely or inadvertently introduce confounding factors into your experiment.

Finally, it must be stated that although randomization cannot perfectly balance all possible explanatory factors, it is the best way to attempt this, particularly for unmeasured or unimagined factors that might affect the outcome. Although there is always a small chance that important factors are out of balance after random treatment assignment (i.e., failed randomization), the degree of imbalance is generally small, and gets smaller as the sample size gets larger.
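The claim that imbalance shrinks with sample size can be checked with a short simulation. This is only a sketch under stated assumptions: two treatments labeled "A" and "B", a single binary factor (gender) split evenly among subjects, and hypothetical helper names. It also includes the block randomization described earlier in this section, which balances the blocking factor by construction.

```python
import random

def simple_randomization(n_subjects, rng):
    """Shuffle an (almost) equal number of 'A' and 'B' labels over the subjects."""
    labels = ["A"] * (n_subjects // 2) + ["B"] * (n_subjects - n_subjects // 2)
    rng.shuffle(labels)
    return labels

def block_randomization(block_levels, rng):
    """Randomize separately within each level of the blocking factor."""
    labels = [None] * len(block_levels)
    for level in sorted(set(block_levels)):
        members = [i for i, b in enumerate(block_levels) if b == level]
        for i, label in zip(members, simple_randomization(len(members), rng)):
            labels[i] = label
    return labels

def imbalance(block_levels, labels, level="F"):
    """Absolute difference between groups in the proportion of subjects at `level`."""
    def share(group):
        in_group = [b for b, lab in zip(block_levels, labels) if lab == group]
        return sum(b == level for b in in_group) / len(in_group)
    return abs(share("A") - share("B"))

rng = random.Random(81)                    # fixed seed so the sketch is repeatable
n_repeats = 300
mean_imbalance = {}
for n in (20, 200, 2000):
    genders = ["F", "M"] * (n // 2)        # assumed 50/50 gender split
    total = sum(imbalance(genders, simple_randomization(n, rng))
                for _ in range(n_repeats))
    mean_imbalance[n] = total / n_repeats

# Block randomization on gender with even block sizes.
genders = ["F", "M"] * 10
blocked_imbalance = imbalance(genders, block_randomization(genders, rng))

for n, value in mean_imbalance.items():
    print(f"N = {n:4d}: average gender imbalance under simple randomization = {value:.3f}")
print("gender imbalance under block randomization:", blocked_imbalance)
```

With even block sizes, each treatment receives exactly half of each gender, so the blocked imbalance is zero; for unblocked factors the simulation shows only the typical size of the chance imbalance, not a guarantee for any single experiment.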
In experiments, as opposed to observational studies, the assignment of levels of the explanatory variable to study units is under the control of the experimenter.

Experiments differ from observational studies in that in an experiment at least the main explanatory variables of interest are applied to the units of observation (most commonly subjects) under the control of the experimenter. Do not be fooled into thinking that just because a lot of careful work has gone into a study, it must therefore be an experiment. In contrast to experiments, in observational studies the subjects choose which treatment they receive. For example, if we perform magnetic resonance imaging (MRI) to study the effects of string instrument playing on the size of Broca's area of the brain, this is an observational study because the natural proclivities of the subjects determine which treatment level (control or string player) each subject has. The experimenter did not control this variable.

The main advantage of an experiment is that the experimenter can randomly assign treatment, thus removing nearly all of the confounding. In the absence of confounding, a statistically significant change in the outcome provides good evidence for a causal effect of the explanatory variable(s) on the outcome. Many people consider internal validity to be not applicable to observational studies, but I think that in light of the availability of techniques to adjust for some confounding factors in observational studies, it is reasonable to discuss the internal validity of observational studies.

Internal validity is the ability to make causal conclusions. The huge advantage of randomized experiments over observational studies is that causal conclusions are a natural outcome of the former, but difficult or impossible to justify in the latter.

Observational studies are always open to the possibility that the effects seen are due to confounding factors, and therefore have low internal validity.
(As mentioned above, there are a variety of statistical techniques, beyond the scope of this book, which provide methods that attempt to correct for some of the confounding in observational studies.) As another example, consider the effects of vitamin C on the common cold. A study that compares people who choose to take vitamin C versus those who choose not to will have many confounders and low internal validity. A study that randomly assigns vitamin C versus a placebo will have good internal validity, and in the presence of a statistically significant difference in the frequency of colds, a causal effect can be claimed.

Note that confounding is a very specific term relating to the presence of a difference in the average level of any explanatory variable across the treatment groups. It should not be used according to its general English meaning of "something confusing".

Blinding (also called masking) is another key factor in internal validity. Blinding indicates that the subjects are prevented from knowing which (level of) treatment they have received. If subjects know which treatment they are receiving and believe that it will affect the outcome, then we may be measuring the effect of the belief rather than the effect of the treatment. In psychology this is called the Hawthorne effect. In medicine it is called the placebo effect. As an example, in a test of the causal effects of acupuncture on pain relief, subjects may report reduced pain because they believe the acupuncture should be effective. Some researchers have made comparisons between acupuncture with needles placed in the "correct" locations versus similar but "incorrect" locations. When using subjects who are not experienced in acupuncture, this type of experiment has much better internal validity because patient belief is not confounding the effects of the acupuncture treatment.
In general, you should attempt to prevent subjects from knowing which treatment they are receiving, if that is possible and ethical, so that you can avoid the placebo effect (prevent confounding of belief in effectiveness of treatment with the treatment itself), and ultimately prevent valid criticisms about the internal validity of your experiment. On the other hand, when blinding is not possible, you must always be open to the possibility that any effects you see are due to the subjects' beliefs about the treatments.

Double blinding refers to blinding the subjects and also assuring that the experimenter does not know which treatment the subject is receiving. For example, if the treatment is a pill, a placebo pill can be designed such that neither the subject nor the experimenter knows what treatment has been randomly assigned to each subject. This prevents confounding in the form of differences in treatment application (e.g., the experimenter could subconsciously be more encouraging to subjects in one of the treatment groups) or in assessment (e.g., if there is some subjectivity in assessment, the experimenter might subconsciously give better assessment scores to subjects in one of the treatment groups). Of course, double blinding is not always possible, and when it is not used you should be open to the possibility that any effects you see are due to differences in treatment application or assessment by the experimenter.

Triple blinding refers to not letting the person doing the statistical analysis know which treatment labels correspond to which actual treatments. Although rarely used, it is actually a good idea because there are several places in most analyses where there is subjective judgment involved, and a biased analyst may subconsciously make decisions that push the results toward a desired conclusion.
The label "triple blinding" is also applied to blinding of the rater of the outcome in addition to the subjects and the experimenters (when the rater is a separate person).

Besides lack of randomization and lack of blinding, omission of a control group is a cause of poor internal validity. A control group is a treatment group that represents some appropriate baseline treatment. It is hard to describe exactly what "appropriate baseline treatment" means, and this often requires knowledge of the subject area and good judgment. As an example, consider an experiment designed to test the effects of "memory classes" on short-term memory performance. If we have two treatment groups and are comparing subjects receiving two vs. five classes, and we find a statistically significant difference, then we only know that adding three classes causes a memory improvement, but not if two is better than none. In some contexts this might not be important, but in others our critics will claim that there are important unanswered causal questions that we foolishly did not attempt to answer. You should always think about using a good control group, although it is not strictly necessary to always use one.

In a nutshell: It is only in blinded, randomized experiments that we can assure that the treatment precedes the outcome, and that there is little chance of confounding which would allow alternative explanations. It is these two conditions, along with statistically significant association, which allow a claim of causality.

8.2 Construct validity

Once we have made careful operational definitions of our variables and classified their types, we still need to think about how useful they will be for testing our hypotheses. Construct validity is a characteristic of devised measurements that describes how well the measurement can stand in for the scientific concepts or "constructs" that are the real targets of scientific learning and inference.
Construct validity addresses criticisms like "you have shown that changing X causes a change in measurement Y, but I don't think you can justify the claims you make about the causal relationship between concept W and concept Z", or "Y is a biased and/or unreliable measure of concept Z".

The classic paper on construct validity is "Construct Validity in Psychological Tests" by Lee J. Cronbach and Paul E. Meehl, first published in Psychological Bulletin, 52, 281-302 (1955). Construct validity in that article is discussed in the context of four types of validity. For the first two, it is assumed that there is a "gold standard" against which we can compare the measure of interest. The simple correlation (see section 3.6.1) of a measure with the gold standard for a construct is called either concurrent validity, if the gold standard is measured at the same time as the new measure to be tested, or predictive validity, if the gold standard is measured at some future time. Content validity is a bit ambiguous but basically refers to picking a representative sample of items on a multi-item test. Here we are mainly concerned with construct validity, and Cronbach and Meehl state that it is pertinent whenever the attribute or quality of interest is not "operationally defined". That is, if we define happiness to be the score on our happiness test, then the test is a valid measure of happiness by definition. But if we are referring to a concept without a direct operational definition, we need to consider how well our test stands in for the concept of interest. This is the construct validity. Cronbach and Meehl discuss the theoretical basis of construct validity for psychology, and this should be applicable to other social sciences. They also emphasize that there is no single measure of construct validity, because it is a complex, often judgment-laden set of criteria.
Among other things, to assess construct validity you should be sure that your measure correlates with other measures with which it should correlate if it is a good measure of the concept of interest. If there is a "gold standard", then your measure should have a high correlation with that test, at least in the kinds of situations where you will be using it. And it should not be correlated with measures of other unrelated concepts.

It is worth noting that good construct validity doesn't mean much if your measure is not also reliable. A good measure should not depend strongly on who is administering the test (called high inter-rater reliability), and repeat measurements should have a small statistical "variance" (called test-retest reliability).

Most of what you will be learning about construct validity must be left to reading and learning in your specific field, but a few examples are given here. In public health studies, a measure of obesity is often desired. What is needed for a valid definition? First it should be recognized that circular logic applies here: as long as a measure is in some form that we would recognize as relating to obesity (as opposed to, say, smoking), then if it is a good predictor of health outcomes we can conclude that it is a good measure of obesity by definition. The United States Centers for Disease Control and Prevention (CDC) has classifications for obesity based on the Body Mass Index (BMI), which is a formula involving only height and weight. The BMI is a simple substitute that has reasonably good concurrent validity for more technical definitions of body fat, such as percent total body fat, which can be better estimated by more expensive and time-consuming methods such as a buoyancy method. But even total body fat percent may be insufficient because some health outcomes may be better predicted by information about the amount of fat at specific locations.
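As a small worked illustration of these ideas, the sketch below computes BMI (weight in kilograms divided by the square of height in meters) and then checks two correlations of the kind described above: one with a stand-in "gold standard" (concurrent validity) and one with an unrelated measure. The data are entirely synthetic and the relationship between body fat and weight is an assumption chosen only to make the example run; none of the numbers describe real people.

```python
import random

def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def pearson_r(xs, ys):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1955)                  # fixed seed so the sketch is repeatable
n_people = 1000

# Synthetic subjects (assumed distributions, for illustration only).
heights = [rng.gauss(1.70, 0.10) for _ in range(n_people)]
body_fat_pct = [rng.gauss(28, 8) for _ in range(n_people)]   # stand-in gold standard
# Assumption: weight is built so that BMI tracks body fat, but imperfectly.
weights = [(18 + 0.3 * fat + rng.gauss(0, 2)) * h ** 2
           for fat, h in zip(body_fat_pct, heights)]
unrelated_score = [rng.gauss(50, 10) for _ in range(n_people)]  # unrelated concept

bmis = [bmi(w, h) for w, h in zip(weights, heights)]
concurrent_validity = pearson_r(bmis, body_fat_pct)
unrelated_correlation = pearson_r(bmis, unrelated_score)

print("example BMI for 70 kg, 1.75 m:", round(bmi(70, 1.75), 1))
print("correlation of BMI with gold standard:", round(concurrent_validity, 2))
print("correlation of BMI with unrelated score:", round(unrelated_correlation, 2))
```

A high first correlation together with a near-zero second one is the pattern a measure with good construct validity should show; as noted above, no single number settles the question.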
Beyond these problems, the CDC assigns labels (underweight, healthy weight, at risk of overweight, and overweight) to specific ranges of BMI values. But the cutoff values, while partially based on scientific methods, are also partly arbitrary. Also, these cutoff values and the names and number of categories have changed with time. And surely the best cutoff for predicting outcomes will vary depending on the outcome, e.g., heart attack, stroke, teasing at school, or poor self-esteem. So although there is some degree of validity to these categories (e.g., as shown by different levels of disease for people in different categories and correlation with buoyancy tests) there is also some controversy about the construct validity.

Is the Stanford-Binet "IQ" test a good measure of "intelligence"? Many gallons of ink have gone into discussion of this topic. Low variance for individuals tested multiple times shows that the test has high test-retest reliability, and as the test is self-administered and objectively scored there is no issue with inter-rater reliability. There have been numerous studies showing good correlation of IQ with various outcomes that "should" be correlated with intelligence, such as future performance on various tests. In addition, factor analysis suggests a single underlying factor (called "G" for general intelligence). On the other hand, the test has been severely criticized for cultural and racial bias. And other critics claim there are multiple dimensions to intelligence, not just a single "intelligence" factor. In summation, the IQ test as a measure of the construct "intelligence" is considered by many researchers to have low construct validity.

Construct validity is important because it makes us think carefully about whether the measures we use really stand in well for the concepts that label them.

8.3 External validity

External validity is synonymous with generalizability.
When we perform an ideal experiment, we randomly choose subjects (in addition to randomly assigning treatment) from a population of interest. Examples of populations of interest are all college students, all reproductive-aged women, all teenagers with type I diabetes, all 6-month-old healthy Sprague-Dawley rats, all workplaces that use Microsoft Word, or all cities in the Northeast with populations over 50,000. If we randomly select our experimental units from the population such that each unit has the same chance (or, with special statistical techniques, a fixed but unequal chance) of ending up in our experiment, then we may appropriately claim that our results apply to that population. In many experiments, we do not truly have a random sample of the population of interest. In so-called "convenience samples", e.g., "as many of my classmates as I could attract with an offer of a free slice of pizza", the population these subjects represent may be quite limited.

After you complete your experiment, you will need to write a discussion of your conclusions, and one of the key features of that discussion is your set of claims about external validity. First, you need to consider what population your experimental units truly represent. In the pizza example, your subjects may represent Humanities upperclassmen at top northeastern universities who like free food and don't mind participating in experiments. Next you will want to use your judgment (and powers of persuasion) to consider ever-expanding "spheres" of subjects who might be similar to your subjects. For example, you could widen the population to all northeastern students, then to all US students, then to all US young adults, etc. Finally, you need to use your background knowledge and judgment to make your best arguments about whether or not (or to what degree) you expect your findings to apply to these larger populations.
If you cannot justify enlarging your population, then your study is likely to have little impact on scientific knowledge. If you enlarge too much, you may be severely criticized for over-generalization.

Three special forms of non-generalizability (poor external validity) are worth more discussion. First is non-participation. If you randomly select subjects, e.g., through phone records or college e-mail, then some subjects may decline to participate. You should always consider the very real possibility that the decliners are different in one or more ways from the participators, and thus your results do not really apply to the population of interest.

A second problem is dropout, which is when subjects who start a study do not complete it. Dropout can affect both internal and external validity, but the simplest form affecting external validity is when subjects who are too busy or less committed drop out only because of the length or burden of the experiment rather than in some way related to response to treatment. This type of dropout reduces the population to which generalization can be made, and in experiments such as those studying the effects of ongoing behavioral therapy on adjustment to a chronic disease, this can be a critical blow to external validity.

The third special form of non-generalizability relates to the terms efficacy and effectiveness in the medical literature. Here the generalizability refers to the environment and the details of treatment application rather than the subjects. If a well-designed clinical trial is carried out under highly controlled conditions in a tertiary medical center, and finds that drug X cures disease Y with 80% success (i.e., it has high efficacy), then we are still unsure whether we can generalize this to real clinical practice in a doctor's office (i.e., whether the treatment has high effectiveness).
Even outside the medical setting, it is important to consider expanding spheres of environmental and treatment application variability.

External validity (generalizability) relates to the breadth of the population we have sampled and how well we can justify extending our results to an even broader population.

8.4 Maintaining type-1 error

Type-1 error is related to the statistical concept that in the real world of natural variability we cannot be certain about our conclusions from an experiment. A type-1 error is a claim that a treatment is effective, i.e., we decide to reject the null hypothesis, when that claim is actually false, i.e., the null hypothesis really is true. Obviously in any single real situation, we cannot know whether or not we have made a type-1 error: if we knew the absolute truth, we would not make the error. Equally obvious after a little thought is the idea that we cannot be making a type-1 error when we decide to retain the null hypothesis.

As explained in more detail in several other chapters, statistical inference is the process of making appropriately qualified claims in the face of uncertainty. Type-1 error deals with the probabilistic validity of those claims. When we make a statement such as "we reject the hypothesis that the mean outcome is the same for both the placebo and the active treatments with alpha equal to 0.05" we are claiming that the procedure we used to arrive at our conclusion only leads to false positive conclusions 5% of the time when the truth happens to be that there is no difference in the effect of treatment on outcome. This is not at all the same as the claim that there is only a 5% chance that any "reject the null hypothesis" decision will be the wrong decision! Another example of a statistical statement is "we are 95% confident that the true difference in mean outcome between the placebo and active treatments is between 6.5 and 8.7 seconds."
Again, the exact meaning of this statement is a bit tricky, but understanding that is not critical for the current discussion (but see section 6.2.7 for more details).

Due to the inherent uncertainties of nature we can never make definite, unqualified claims from our experiments. The best we can do is set certain limits on how often we will make certain false claims (but see the next section, on power, too). The conventional (but not logically necessary) limit on the rate of false positive results out of all experiments in which the null hypothesis really is true is 5%. The terms type-1 error, false positive rate, and "alpha" (α) are basically synonyms for this limit.

Maintaining type-1 error means doing all we can to assure that the false positive rate really is set to whatever nominal level (usually 5%) we have chosen. This will be discussed much more fully in future chapters, but it basically involves choosing an appropriate statistical procedure and assuring that the assumptions of our chosen procedure are reasonably met. Part of the latter is verifying that we have chosen an appropriate model for our data (see section 6.2.2).

A special case of not maintaining type-1 error is "data snooping". E.g., if you perform many different analyses of your data, each with a nominal type-1 error rate of 5%, and then report just the one(s) with p-values less than 0.05, you are only fooling yourself and others if you think you have appropriately analyzed your experiment. As seen in Section 13.3, this approach to data analysis results in a much larger chance of making false conclusions.

Using models with broken assumptions and/or data snooping tends to result in an increased chance of making false claims in the presence of ineffective treatments.

8.5 Power

The power of an experiment refers to the probability that we will correctly conclude that the treatment caused a change in the outcome.
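Both the false positive rate of section 8.4 and the power just defined are long-run frequencies over repeated experiments, so they can be illustrated by simulation. The sketch below is a simplified illustration, not the procedure used elsewhere in this book: it assumes normally distributed outcomes with a known standard deviation, uses a two-sided z-test, and treats the multiple analyses in the data snooping case as independent. All names and numbers are hypothetical. Under the independence assumption, the chance that at least one of 10 analyses of a truly ineffective treatment gives p < 0.05 is 1 − 0.95^10, about 40%.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
SIGMA = 1.0            # assumed known standard deviation of the outcome
N_PER_GROUP = 50       # assumed subjects per treatment group

def p_value(active, placebo):
    """Two-sided z-test p-value for a difference in means with known SIGMA."""
    standard_error = SIGMA * (1 / len(active) + 1 / len(placebo)) ** 0.5
    diff = sum(active) / len(active) - sum(placebo) / len(placebo)
    return 2 * (1 - NormalDist().cdf(abs(diff / standard_error)))

def run_experiment(true_diff, rng):
    """Simulate one two-group experiment and return its p-value."""
    placebo = [rng.gauss(0.0, SIGMA) for _ in range(N_PER_GROUP)]
    active = [rng.gauss(true_diff, SIGMA) for _ in range(N_PER_GROUP)]
    return p_value(active, placebo)

def rejection_rate(true_diff, n_experiments, rng):
    """Fraction of simulated experiments that reject the null hypothesis."""
    rejections = sum(run_experiment(true_diff, rng) < ALPHA
                     for _ in range(n_experiments))
    return rejections / n_experiments

def snooping_rate(n_analyses, n_experiments, rng):
    """Fraction of null experiments with at least one 'significant' analysis."""
    hits = 0
    for _ in range(n_experiments):
        if any(run_experiment(0.0, rng) < ALPHA for _ in range(n_analyses)):
            hits += 1
    return hits / n_experiments

rng = random.Random(2009)                  # fixed seed so the sketch is repeatable
false_positive_rate = rejection_rate(0.0, 2000, rng)   # null hypothesis true
power = rejection_rate(0.5, 2000, rng)                 # true difference of 0.5
snooped_rate = snooping_rate(10, 1000, rng)            # report the best of 10
expected_snooped = 1 - (1 - ALPHA) ** 10

print("false positive rate (nominal 0.05):", false_positive_rate)
print("power for a true difference of 0.5:", power)
print("false positive rate with snooping:", snooped_rate)
print("expected with 10 independent analyses:", round(expected_snooped, 3))
```

The first rate should land near the nominal 5%, the second estimates the power for the assumed effect size and sample size, and the third shows how far reporting only the best of several analyses moves the false positive rate above its nominal level.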
If some particular true non-zero difference in outcomes is caused by the active treatment, and you have low power to detect that difference, you will probably make a type-2 error (have a "false negative" result) in which you conclude that the treatment was ineffective, when it really was effective. The type-2 error rate, often called "beta" (β), is the fraction of the time that a conclusion of "no effect" will be made (over repeated similar experiments) when some true non-zero effect is really present. The power is equal to 1 − β.

Before the experiment is performed, you have some control over the power of your experiment, so you should estimate the power for various reasonable effect sizes and, whenever possible, adjust your experiment to achieve reasonable power (e.g., at least 80%). If you perform an experiment with low power, you are just wasting time and money! See Chapter 12 for details on how to calculate and increase the power of an experiment.

The power of a planned experiment is the chance of getting a statistically significant result when a particular real treatment effect exists. Studying sufficient numbers of subjects is the most well-known way to assure sufficient power.

In addition to sample size, the main (partially) controllable experimental characteristic that affects power is variability. If you can reduce variability, you can increase power. Therefore it is worthwhile to have a mnemonic device for helping you categorize and think about the sources of variation. One reasonable categorization is this:

Measurement
Environmental
Treatment application
Subject-to-subject

(If you are a New York baseball fan, you can remember the acronym METS.) It is not at all important to "correctly" categorize a particular source of variation.
What is important is to be able to generate a list of the sources of variation in your (or someone else's) experiment so that you can think about whether you are able (and willing) to reduce each source of variation in order to improve the power of your experiment.

Measurement variation refers to differences in repeat measurement values when they should be the same. (Sometimes repeat measurements should change, for example the diameter of a balloon with a small hole in it in an experiment on air leakage.) Measurement variability is usually quantified as the standard deviation of many measurements of the same thing. The term precision applies here, though technically precision is 1/variance. So a high precision implies a low variance (and thus a low standard deviation). It is worth knowing that a simple and usually cheap way to improve measurement precision is to make repeated measurements and take the mean; this mean is less variable than an individual measurement. Another inexpensive way to improve precision, which should almost always be used, is to have good explicit procedures for making the measurement and good training and practice for whoever is making the measurements. Other than possibly increased cost and/or experimenter time, there is no down-side to improving measurement precision, so it is an excellent way to improve power.

Controlling environmental variation is another way to reduce the variability of measurements, and thus increase power. For each experiment you should consider what aspects of the environment (broadly defined) can and should be controlled (fixed or reduced in variation) to reduce variation in the outcome measurement. For example, if we want to look at the effects of a hormone treatment on rat weight gain, controlling the diet, the amount of exercise, and the amount of social interaction (such as fighting) will reduce the variation of the final weight measurements, making any differences in weight gain due to the hormone easier to see.
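The effect of reducing measurement and environmental variation on power can be sketched numerically. The example below is only an illustration under stated assumptions: it treats the sources of variation as independent so that their variances add, it uses a normal-approximation power formula for a two-group comparison, and every number and name (`outcome_sd`, `approx_power`, the scenario values) is hypothetical rather than taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def outcome_sd(subject_sd, measurement_sd, repeats=1, environment_sd=0.0):
    """Combine independent sources of variation into one outcome s.d.

    Averaging `repeats` measurements of the same thing divides the
    measurement variance by `repeats` (assumes independent repeats).
    """
    variance = subject_sd**2 + environment_sd**2 + measurement_sd**2 / repeats
    return sqrt(variance)


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sd * sqrt(2 / n_per_group))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenarios: same subjects and true effect, less variation each time.
scenarios = {
    "single measurement": outcome_sd(6.0, 6.0, repeats=1, environment_sd=5.0),
    "mean of 4 measurements": outcome_sd(6.0, 6.0, repeats=4, environment_sd=5.0),
    "4 measurements, controlled environment": outcome_sd(6.0, 6.0, repeats=4, environment_sd=1.0),
}
for label, sd in scenarios.items():
    power = approx_power(effect=5.0, sd=sd, n_per_group=20)
    print(f"{label}: sd = {sd:.2f}, power = {power:.2f}")
```

With the sample size and true effect held fixed, each reduction in a variance component lowers the overall standard deviation and so raises the approximate power, which is the pattern the discussion above describes.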
Other examples of environmental sources of variation include temperature, humidity, background noise, lighting conditions, etc. As opposed to reducing measurement variation, there is often a down-side to reducing environmental variation: doing so increases power but may reduce external validity (see above), so there is usually a trade-off between the two.

The trade-off between power and external validity also applies to treatment application variation. While some people include this in environmental variation, I think it is worth separating out because otherwise many people forget that it is something that can be controlled in their experiment. Treatment application variability is differences in the quality or quantity of treatment among subjects assigned to the same (nominal) treatment. A simple example is when one treatment group gets, say, 100 mg of a drug. If two drug manufacturers have different production quality such that all of the pills from the first manufacturer have a mean of 100 mg and s.d. of 5 mg, while the second has a mean of 100 mg and s.d. of 20 mg, the increased variability of the second manufacturer will result in decreased power to detect any true differenc...
