Course: EPI 521, Fall 2009
School: UPenn
EP 521, Spring 2007, Vol I, Part 1

Statistical Methods in Epidemiologic Research
(FOR CLASS USE ONLY. DO NOT CITE OR REPRODUCE)

EP 521 Spring 2007 Course Notes, Vol I (Part 1 of 5)

A. Russell Localio* and Jesse A. Berlin (The Great Master)

*Department of Biostatistics and Epidemiology
Center for Clinical Epidemiology and Biostatistics
School of Medicine, University of Pennsylvania
Philadelphia PA 19104-6021

Statistical Science, Biometrics and Clinical Informatics (BCI)
J&J Pharmaceutical Research and Development, LLC
1125 Trenton-Harbourton Road, PO Box 200
Titusville, NJ 08560

Copyright 2006 Trustees of the University of Pennsylvania

Overview: Some basic principles

Principle #1 -- Perspective

This course is about analyzing data: the issues and methods. It is not about learning rules and applying them to data.

Changing Perspectives: From Religion to Science

"After 17 years of interacting with physicians, I have come to realize that many of them are adherents of a religion they call Statistics. ... To the physician who practices this religion, Statistics refers to the seeking out and interpreting of p-values. Like any good religion, it involves vague mysteries capable of contradictory and irrational interpretation. It has a priesthood and a class of mendicant friars. And it provides Salvation: Proper invocation of the religious dogmas of Statistics will result in publication in prestigious journals."

Salsburg DS. The religion of statistics as practiced in medical journals. Am Statistician 1985;39:220-223.
But this course is also about learning where to find answers rather than just learning what the teacher thinks is best:

"We should not elevate our autonomy as individual faculty members above every other value. [Professors have a responsibility] to resist the allure of certitude, the temptation to use the podium as an ideological platform, to indoctrinate a captive audience, to play favorites with the likeminded and silence the others."

Arenson KW. Columbia chief takes disputes over professors. The New York Times (March 24, 2005) (quoting Columbia President Lee C. Bollinger).

Principle #2 -- Common themes

We relate concepts of epidemiology (and we must be careful about taxonomy) to elements of probability to see some simple methods of analyzing data.

We distinguish terms carefully: odds ratio, relative risk, rate ratio, hazard ratio. Each has a mathematical interpretation. They are not generally interchangeable.

We try to look at data first and only then implement the statistical tools.

Interpretation of computer output is not only important, it is essential.

Principle #3. Applying principles of statistics (from EP 520) to epidemiologic research

Principle #4. Common statistical methods include:

a) Descriptive data (means, medians, plots, tables of counts and percentages): looking at the data
b) Statistical tests, e.g., a t-test comparing cholesterol levels in CHD patients and controls (de-emphasized)
c) Comparisons of groups, in general [treated vs. control, exposed vs. unexposed]
d) Estimating effect size: a measure of the effect of treatment or exposure on outcome, expressed in any number of different metrics (odds ratios, relative risks, risk differences, differences in means, area under the ROC curve), all of which are related.

Principle #5. Critically important issues

- Sampling variability and power

[Figure: Sampling Variability and Power. Two pairs of density curves; x-axis from 0 to 60, y-axis density. The difference between the means within each pair = 12; SD = 4 in the left pair, SD = 2 in the right pair.]

What does the figure suggest? The differences within the two sets of samples are about the same (the distance between the peaks in each set). But the peaks in the right set overlap less because they are higher and steeper. Why? Because the variance is smaller.

- Probability distributions (~N(mean, var), or ~B(n,p))
- Null hypothesis (hypothesis testing)
- Confidence interval (meaning and construction)

Definition of a 95% CI (from the frequentist perspective)? (See Woodward, Second Edition, p 34.) In 1000 repeated samples taken from a population that has a true but unobserved parameter (such as an odds ratio), the estimated 95% confidence interval from each sample should cover the true parameter in about 950 of the samples.
- Regression (and the idea of modeling)
- All these share underlying goals: distinguish
    signal from noise
    systematic differences from random variation

Principle #6. Biases in generating data

Observed sources of bias we can handle by:
    Exclusion criteria
    Separate analyses for each group
    Stratified analyses (Mantel-Haenszel methods represent an example)
    Regression (a statistical model)

Unobserved sources: must undertake sensitivity analysis.
    Ask what would be the effect of an omitted confounder that has a stipulated
        association with the outcome
        association with the exposure of interest
    How strong must those associations be to make the observed association go away?

Principle #7. The central limit theorem (CLT) is central to our course material for EP 520 and 521

It says that, for large numbers of subjects, we can use the normal distribution to compare proportions, conduct tests of epidemiologic parameters, and create 95% confidence intervals. The CLT lets us use χ² and Z tests (t tests). (Ref: Woodward p. 72)

How large is "large" depends on:
    outcome (continuous, binary, counts, modes)
    number of covariates (exposure + other potential confounders)
    form of the model (what is treated as fixed)
    assumptions (more assumptions make a model easier to fit but less generalizable)

When we speak (or write) about "large sample methods" or "asymptotic methods," we are referring in general to this concept related to the central limit theorem.
As the sample gets large, the statistics we estimate conform to known distributions, and we can use the form of those known distributions (normal, chi-square) to make inferences about variance and to estimate confidence intervals. If we cannot invoke large sample theory, then we must use exact methods.

Log transformations of odds ratios (OR) and relative risk (RR)

We transform some measures to enable us to use the central limit theorem to greater effect. Here is an example of why we use log transformations (and why Stata does the same) in estimating confidence intervals.

Assume a population of 10,000 patients with outcome y and exposure x.

             Y = 0     Y = 1     Total
    X = 0    2,500     2,500     5,000
    X = 1    2,500     2,500     5,000
    Total    5,000     5,000    10,000

What is the true odds ratio? The resulting estimate = 1.0.

Now, what happens if we draw samples of size 100 from this population and estimate the ORs?

    . cs y x, or

First random sample: OR = 0.871. For the second: OR = 0.714.

Each sample will have a different OR. What will be the distribution of these ORs? Will we find a symmetric distribution around the true value of 1.0?

The distribution is not symmetric about 1.0. We would expect this distribution because an OR = 3 when inverted becomes an OR = 0.33. Each is about equally unlikely to occur upon repeated sampling. To achieve a symmetric distribution, we estimate ln(OR), which should be centered at 0.0 (ln(1.0) = 0.0).
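The repeated-sampling argument above can be sketched in a few lines of code. This is an illustration in Python rather than the Stata used in these notes, and it makes simplifying assumptions that are not in the original: each subject falls into one of the four equally likely cells (sampling with replacement instead of drawing from the fixed population of 10,000), the seed is arbitrary, and the function names are hypothetical.

```python
import math
import random
import statistics

def sample_odds_ratios(n_samples=999, sample_size=100, seed=521):
    """Simulate ORs from samples of a population whose true OR is 1.0.

    Assumption: the four (x, y) cells are equally likely, which
    approximates drawing from the 2,500/2,500/2,500/2,500 population.
    """
    rng = random.Random(seed)
    ors = []
    while len(ors) < n_samples:
        counts = [0, 0, 0, 0]  # a, b, c, d cells of the 2x2 table
        for _ in range(sample_size):
            counts[rng.randrange(4)] += 1
        a, b, c, d = counts
        if min(counts) == 0:
            continue  # OR undefined with an empty cell; redraw
        ors.append((a * d) / (b * c))  # cross-product ratio
    return ors

def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

ors = sample_odds_ratios()
log_ors = [math.log(v) for v in ors]

print("mean OR      :", round(statistics.fmean(ors), 3))
print("mean ln(OR)  :", round(statistics.fmean(log_ors), 3))
print("skew OR      :", round(skewness(ors), 3))
print("skew ln(OR)  :", round(skewness(log_ors), 3))
```

Under these assumptions the ORs should show a long right tail, while the ln(OR) values should sit roughly symmetrically around 0, which is the motivation for building confidence intervals on the log scale.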
[Figure: histogram of the odds ratios from 999 samples of size 100; x-axis: odds ratio (0 to 4), y-axis: density (0 to 1.5).]

Principle #8. Exact methods vs asymptotic (large sample) methods

To avoid having to make assumptions about normal distributions and large samples, there is a set of methods, called "exact" methods, "permutation test based" methods, or "assumption-free" methods, that do not rely on these assumptions.

They work by taking the observed data and calculating a test statistic, such as a chi-square value. Instead of comparing that test statistic to a chi-square distribution (or other relevant distribution), they take another course. These methods permute (rearrange) the data in all possible ways. With each permutation, the same test statistic is calculated. This exercise results in an entire distribution of test statistics that would occur under all possible rearrangements of the data.

One can obtain a p-value by examining where the test statistic from the observed data appears along this distribution. If it appears at one end or the other of the distribution, then few values of the test statistic are the same as or more extreme than the value for the observed data. This situation equates to a small p-value.

Confidence intervals can also be obtained, in a very computer-intensive manner.

But exact methods might not always be appropriate or give the best estimates.

Reference: Good P. Permutation tests. Second Edition. New York: Springer-Verlag; 2001.

Principle #9. Frequentist (vs Bayesian) methods

These are the subject of this course.
They rely on the principle that we could in theory conduct an experiment many times or, in an observational setting, draw many samples. Each of these samples would differ from the underlying population, so each estimate differs from the true value. The goal is to estimate the population parameter, whether it be an odds ratio (OR), relative risk (RR), hazard ratio (HR), or mean, and that true population parameter is assumed to be fixed. The accuracy of an analysis is always evaluated in terms of its performance with repetition.

We do not engage in Bayesian inference, although much of our thinking about results takes on Bayesian principles. Bayesians assume that the true population parameter has a distribution. Before the data are collected and analyzed, this distribution is called the prior. After collecting the data, this prior is adjusted to estimate the posterior distribution of the parameter. So, for example, one enters the analysis with a prior estimate of OR = 1.0, and then after the data, one adjusts this estimate and its distribution into a posterior based on what the data show.

Reference: Moye LA. Statistical Reasoning in Medicine. The Intuitive p-value primer. New York: Springer-Verlag; 2000 (chapter 10).

1. Basic Concepts of epidemiology and biostatistics (mostly review -- Woodward Chap 1)

1.1 Measures of outcome: Know the taxonomy

You must say what you mean and mean what you say. Failing to do so is a fault of many manuscripts.

Risk and rate are distinguished by their denominators.

Risk = Pr[develop disease] for a subject (cumulative incidence)
    Bounded: 0 to 1
    Assumes a closed population

Rate = Number developing disease per unit time -- incidence rate (incidence proportion - Rothman, p. 37)
    Units = person-time
    Unbounded: 0 to ∞ (Woodward pp 117, 152)

Example of risk (cumulative incidence) and incidence rate: Incidence of CHD in 12 yrs (men), e.g., from Framingham (Schless. p. 29)

    Age        No. at Risk   CHD Events   Cum. Incidence (Risk)   Person-Yrs Observation   Inc. Rate / 1000 PY
    30 - 39        789            40             .051                    9228                     4.3
    40 - 49        742            88             .119                    8376                    10.5
    50 - 59        656           130             .198                    7092                    18.3

NOTE: Risk = (e.g.) 40/789 = .051
      Rate = [40/9228] x 1000 = 4.3/1000 PY

Person-years means person-years at risk. So, we must subtract the dropouts from the person-years. (Person-time at risk is also called the exposure.)

9228 ≠ 789 x 12 = 9468. Each subject who developed CHD was not at risk for the entire 12 years. For simplicity, the example assumes that those who developed CHD were at risk for 6 years. [In general, we would calculate follow-up time using actual individual contributions.]

9228 = (40 x 6) + (789 - 40) x 12 = 240 + 8988

Notes:
1. Assumes constant risk over the interval 30 - 39, although risk increases with age.
2. Generally, a 36-year-old man followed for 12 yrs would (should) contribute 4 years to the 30 - 39 category, and the next 8 to 40 - 49.

So, we must distinguish clearly between risk and rate in (1) definition, (2) estimation, and (3) reporting to the reader.

Analysis using the "ci" and "cii" commands in Stata for confidence intervals:

First, incidence rate (per person-time).
To get estimates and confidence intervals of rates from the Framingham data, we could use Stata as follows, where expos is a variable that contains the person-time:

    . ci events, exposure(expos) by(agecat)

    -> agecat = 30-39
                                                      -- Poisson Exact --
        Variable |  Exposure        Mean    Std. Err.   [95% Conf. Interval]
          events |      9228    .0043346    .0006854    .0030976    .0059025

    -> agecat = 40-49
                                                      -- Poisson Exact --
        Variable |  Exposure        Mean    Std. Err.   [95% Conf. Interval]
          events |      8376    .0105062      .00112    .0084263    .0129439

    -> agecat = 50-59
                                                      -- Poisson Exact --
        Variable |  Exposure        Mean    Std. Err.   [95% Conf. Interval]
          events |      7092    .0183305    .0016077    .0153151    .0217659

Now cumulative incidence (risk): persons with events, per person (using the Stata "immediate" command cii).

    . cii 789 40
                                                      -- Binomial Exact --
        Variable |       Obs        Mean    Std. Err.   [95% Conf. Interval]
                 |       789    .0506971    .0078101    .0364633     .068398

    . cii 742 88
                                                      -- Binomial Exact --
        Variable |       Obs        Mean    Std. Err.   [95% Conf. Interval]
                 |       742    .1185984    .0118693    .0962128    .1440643

    . cii 656 130
                                                      -- Binomial Exact --
        Variable |       Obs        Mean    Std. Err.   [95% Conf. Interval]
                 |       656    .1981707    .0155636    .1683193    .2307646

These are basic tools for computing the simple results for the start of the Results section of your manuscript.

[Aside: Simplifying use of Stata (version 9)

Many of these commands, such as ci or cii, are automated in menus. To get to them, look at the top toolbar, find "Statistics," and then from the column of items that come up under Statistics look for:
    observational/epi analysis
    tables for epidemiologists
    cohort-study risk ratio, etc., calculator (or case-control)
Then fill in the desired table.

Note: Students want to know how to obtain p-values easily once they have a test statistic. A useful add-on is "quest." It is easily obtained from www.stata.com/quest and is easy to install within Stata. It has statistical tables for computing p-values, for example, for the Normal and Chisq distributions. Once installed in Version 9, you invoke it by "quest on". Then you can use it by going to the toolbar: user/stataquest/calculator/statistical tables.

End Aside]

Risk and Odds: Comparing groups with different exposures (treatments)

True population:

              E+          E-
    D+        A           B           M1 = A + B
    D-        C           D           M0 = C + D
              N1 = A + C  N0 = B + D

Risk = cumulative incidence: R1 = A/N1, R0 = B/N0
Subscript 1 means exposed or diseased; 0 means not exposed or not diseased.

Relative Risk:

$$RR = \frac{R_1}{R_0} = \frac{A/N_1}{B/N_0}$$

RR > 1 means the exposed are at greater risk.
RR < 1 means the exposed are at lower risk (the treated are protected).

Odds of disease among the exposed:

$$\frac{R_1}{1-R_1} = \frac{A/(A+C)}{1 - A/(A+C)} = \frac{A/(A+C)}{(A+C-A)/(A+C)} = \frac{A}{C}$$

Disease odds ratio (OR), also called the cross product ratio:

$$OR = \frac{R_1/(1-R_1)}{R_0/(1-R_0)} = \frac{A/C}{B/D} = \frac{AD}{BC}$$

OR is important because:
1. For rare diseases, OR and relative risk (RR) are very similar.
2. Exposure OR = Disease OR (via Bayes Rule) in case control studies. (We will return to both of these concepts later.)
3. OR is obtained directly from multivariable logistic regression.
4. It may be estimated from a follow-up or case control study (or even a cross sectional study).
5. IRR (incidence rate ratio). OR is a ratio of incidence rates in a density sampling case control design (Rothman, pp. 94-95). The exposure distribution among controls is the same as it is among the persons in the same source population. So, OR = ratio of incidence rates, or an IRR. (Recall from EP 510.)

We have discussed some simple definitions for event rates:
    Risk, rate, odds, probability

Then we touched briefly on some definitions of effect size:
    Absolute change in risk: risk difference (difference in probabilities), differences in means
    Relative change in risk: relative risk (risk ratio), odds ratio, incidence rate ratio

But how large is large when we consider the effect of treatment/exposure?

    . cs D E [fw=count], or

                     | E                      |
                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+-----------
               Cases |        80          40  |        120
            Noncases |        40          80  |        120
    -----------------+------------------------+-----------
               Total |       120         120  |        240
                     |                        |
                Risk |  .6666667    .3333333  |         .5
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |         .3333333       |    .2140537    .4526129
          Risk ratio |                2       |    1.507196    2.653935
     Attr. frac. ex. |               .5       |    .3365161    .6232011
     Attr. frac. pop |         .3333333       |
          Odds ratio |                4       |    2.342303    6.830891 (Cornfield)
                     +-------------------------------------------------
                                   chi2(1) =    26.67  Pr>chi2 = 0.0000

Now we relate some of these epidemiologic concepts (and definitions) to probability models.

But suppose that we are interested in arriving at a measure of effect size that is similar to a nonparametric test of the difference in exposure (E) between two groups characterized by outcome (D)? If the exposures were continuous (e.g., dose of a drug in mg) or at least ordered, we might want to do a nonparametric test.

Recall (Epi 520) the two-sample Mann-Whitney U statistic (equivalent to the Wilcoxon test). For all possible mn pairs $(X_{1i}, X_{0j})$, where $i = 1, \ldots, n$ and $j = 1, \ldots, m$, let

$$d_{ij} = \begin{cases} 1 & \text{if } X_{1i} > X_{0j} \\ 0.5 & \text{if } X_{1i} = X_{0j} \\ 0 & \text{if } X_{1i} < X_{0j} \end{cases}$$

Then

$$U = \frac{1}{mn} \sum_{j=1}^{m} \sum_{i=1}^{n} d_{ij}$$

is the fraction of all pairs in which $X_1 > X_0$, plus half the fraction of tied pairs.

This measure of effect is related to others we have seen (or are now seeing). In this case U = 0.667.

$$U = \frac{W - 0.5\,n(n+1)}{nm} \qquad (1)$$

where W = the rank sum test statistic for the group-1 sample (the n observations $X_{1i}$).

Is the effect size large in this case? RD = 0.33, RR = 2.0, OR = 4.0, p < 0.001, all of which might reflect "large." But U = 0.667, and 0.5 would reflect no difference.
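The value U = 0.667 can be checked directly from the 2 x 2 table (80 and 40 exposed and unexposed cases, 40 and 80 exposed and unexposed noncases). The following is a sketch in Python, not part of the course's Stata workflow. Exposure is coded 1/0 within each outcome group, the function names are hypothetical, and the rank-sum relation uses the standard form U = (W - n(n+1)/2)/(nm), with W taken as the rank sum of the case group (an assumption about the notation).

```python
from fractions import Fraction

def mann_whitney_u(x1, x0):
    """Fraction of (x1, x0) pairs with x1 > x0, counting ties as 1/2."""
    total = Fraction(0)
    for a in x1:
        for b in x0:
            if a > b:
                total += 1
            elif a == b:
                total += Fraction(1, 2)
    return total / (len(x1) * len(x0))

def rank_sum(x1, x0):
    """Wilcoxon rank sum W for the x1 sample, using midranks for ties."""
    combined = sorted(x1 + x0)
    first, last = {}, {}
    for position, value in enumerate(combined, start=1):
        first.setdefault(value, position)  # first rank held by this value
        last[value] = position             # last rank held by this value
    return sum(Fraction(first[v] + last[v], 2) for v in x1)

# Exposure (1 = exposed, 0 = unexposed) among cases and noncases
cases = [1] * 80 + [0] * 40
noncases = [1] * 40 + [0] * 80

n, m = len(cases), len(noncases)
u = mann_whitney_u(cases, noncases)
w = rank_sum(cases, noncases)
u_from_w = (w - Fraction(n * (n + 1), 2)) / (n * m)

print("U from pairs   :", float(u))
print("U from rank sum:", float(u_from_w))
```

Exact fractions are used so that the pairwise definition and the rank-sum form can be compared without rounding error.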
Another common measure of difference (used heavily in social sciences and psychology) is Cohen's d:

$$d = \frac{\bar{x}_1 - \bar{x}_0}{sd}$$

where sd is the pooled standard deviation.

Cohen (and many others) have used the rules of thumb:

    d       size
    0.2     small
    0.5     medium
    0.8     large

But we can find examples in which this rule makes little sense. We will return to these issues.

1.2 Probability Models: how they relate to study design

What population parameters can we estimate? The population has a parameter, such as a mean, risk, rate, RD, RR, or OR, which we estimate with a sample statistic.

Efficient estimates for those parameters from samples of data. By efficient we mean that the variances (confidence intervals) are small.

What test statistics to use to assess statistical significance (less emphasis in this course).

[Aside -- Notation: a persistent problem throughout the course. Notation is a BIG problem with all statistical methods and writings.

For your reference, here are common notations for a 2 by 2 table of cross-classified data. E.g., x1. and x1+ both mean "summed over the second dimension." In this case, it means the sum over the columns to arrive at a row total.
    x11    x12    x1+
    x21    x22    x2+
    x+1    x+2    x++

So, you might see books with this type of notation and you should not be alarmed.]

1. Prospective studies (one margin fixed)

Example: Treatment (transfusion) is fixed by design; patients are followed until they develop disease or not. (See also Woodward example 1.2.)

                             Transfusion (exposure)
    Disease                  Tx+            Tx-
    D+                       5  (x11)       13 (x12)
    D-                       49             35
    Column sum (fixed)       54 (x+1)       48 (x+2)

In this example the + denotes summing over the rows, i.e., a column sum.

Let PT = Pr[patient in treated group gets infected within 6 months], and PC = Pr[control infected].

The purpose of the trial is to test equality of PT and PC (H0: PT = PC, or PT - PC = 0).

PT and PC are conditional probabilities:
    PT = Pr[infection | treatment], the probability of infection given that the patient was treated.
    PC = Pr[infection | control]

Usually, we assume X11 and X12 are independent binomial B(n,p) random variables with parameters (X+1, PT) and (X+2, PC). Note: X+1 = X11 + X21.

Here: $\hat{P}_T = X_{11}/X_{+1}$ and $\hat{P}_C = X_{12}/X_{+2}$.

RD = Risk Difference = $\hat{P}_T - \hat{P}_C$. This is the estimate for the parameter of interest.

$$\mathrm{Var}(\hat{P}_C) = \hat{P}_C(1-\hat{P}_C)/X_{+2}$$

$$\mathrm{Var}(\hat{P}_T) = \hat{P}_T(1-\hat{P}_T)/X_{+1}$$

We can estimate a confidence interval for RD, assuming that as N gets large, RD ~ N(RD, Var(RD)).

Since we no longer assume H0 is true, we use $\mathrm{Var}(RD) = \mathrm{Var}(\hat{P}_C) + \mathrm{Var}(\hat{P}_T)$, assuming independence.
$$\mathrm{Var}(RD) = \frac{\hat{p}_C(1-\hat{p}_C)}{X_{+2}} + \frac{\hat{p}_T(1-\hat{p}_T)}{X_{+1}}$$

(Because Var(A - B) = Var(A) + Var(B) under independence.)

$$SE(RD) = \sqrt{\mathrm{Var}(RD)}$$

$$95\%\ CI = RD \pm 1.96 \, SE(RD)$$

In the example, pT = .093 and pC = .271, so the two risks differ by 0.18 (pC - pT), Z = 2.36 (p = 0.02), and the 95% CI for the difference = (0.03 to 0.33). (Consistent with rejecting H0.)

We can also arrive at a test statistic (to test H0: RD = 0).

Under H0: pT = pC = p (common), i.e., RD = 0, where p can be estimated by

$$\hat{p} = \frac{X_{11} + X_{12}}{X_{+1} + X_{+2}}$$

and we test H0 with:

$$Z = \frac{\hat{p}_C - \hat{p}_T}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{X_{+1}} + \frac{1}{X_{+2}}\right)}}$$

This is the pooled variance because we are assuming H0 is true. Z ~ N(0,1) is compared to the usual standard normal table or to a computer function or calculator. This is the test statistic.

Computing the 95% CI of the risk difference (no longer assuming H0 is true): distinguish the case of variance under HA.

[Aside: In Stata 9, one can fill in the blanks using the Statistics option on the toolbar and finding "cohort studies." Or one can write out the commands as follows. We recommend using the commands: fewer errors, faster to use.]

    . csi 5 13 49 45, exact or

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+-----------
               Cases |         5          13  |         18
            Noncases |        49          45  |         94
    -----------------+------------------------+-----------
               Total |        54          58  |        112
                     |                        |
                Risk |  .0925926    .2241379  |   .1607143
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |        -.1315453       |    -.263813    .0007223
          Risk ratio |         .4131054       |    .1577792    1.081613
     Prev. frac. ex. |         .5868946       |   -.0816134    .8422208
     Prev. frac. pop |          .282967       |
          Odds ratio |         .3532182       |    .1214716    1.034842 (Cornfield)
                     +-------------------------------------------------
                      1-sided Fisher's exact P = 0.0495
                      2-sided Fisher's exact P = 0.0734

(Note that this command enters 45 unexposed noncases, whereas the transfusion table above shows 35. The Stata results, and the odds and risk examples that follow, are therefore based on 58 unexposed subjects rather than 48.)

IMPORTANT (THIS CAUSES PROBLEMS FOR MANY STUDENTS)

Examples: Odds and risk

Recall: Let risk = p (a probability). Then Odds = p/(1-p), and p = odds/(1+odds). It is essential to know these relationships.

Risk of disease | exposed = 5/54. From Stata = 0.092. This is easy and does not confuse.

What are the odds of disease | exposure? This causes confusion except at Santa Anita, Gulfstream, Aqueduct, Saratoga and the like. The odds = [(5/54) / (49/54)] = 5/49 = 0.102. This is greater than the risk.

Risk of disease | unexposed = 13/58 = 0.224 from Stata.

What are the odds of disease | no exposure? 13/45 = 0.29. Much greater than the risk.

We have the risk ratio from Stata = 0.41. That = 0.092/0.224, a ratio of risks (or probabilities).

Odds ratio = ratio of odds = 0.102/0.29 = 0.35 (also in the Stata output).

Odds ratio = ad/bc = (5 x 45)/(13 x 49) = 0.35 = cross product ratio.

So, ad/bc is a common way to get the OR quickly, but be able to compute odds and OR the slow way. Beware of this question on exams = HINT.

Rare Disease: How Rare is Rare? When is OR a good approximation to RR?
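One way to explore that question numerically is to fix the RR, vary the baseline risk, and compute the resulting OR "the slow way," through the odds. The sketch below is in Python rather than Stata. The helper names (odds, risk_from_odds, odds_ratio) and the grid of baseline risks are illustrative choices, not part of the course materials.

```python
def odds(p):
    """Convert a risk (probability) to odds: p / (1 - p)."""
    return p / (1 - p)

def risk_from_odds(o):
    """Convert odds back to a risk: odds / (1 + odds)."""
    return o / (1 + o)

def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the odds in the exposed to the odds in the unexposed."""
    return odds(p_exposed) / odds(p_unexposed)

# Transfusion example as entered in the Stata command: 5/54 vs 13/58
p_exp, p_unexp = 5 / 54, 13 / 58
print("odds | exposed  :", round(odds(p_exp), 3))
print("odds | unexposed:", round(odds(p_unexp), 3))
print("risk ratio      :", round(p_exp / p_unexp, 2))
print("odds ratio      :", round(odds_ratio(p_exp, p_unexp), 2))

# How far does the OR drift from a fixed RR as baseline risk rises?
print("baseline   RR   OR")
for rr in (2, 4):
    for baseline in (0.001, 0.005, 0.01, 0.05, 0.10, 0.20):
        or_value = odds_ratio(rr * baseline, baseline)
        print(f"{baseline:8.3f} {rr:4d} {or_value:6.2f}")
```

For a given RR, the OR stays close to the RR only while the baseline risk is small, and the gap opens faster when the RR itself is larger.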
    Pr(D|E+)    Pr(D|E-)    RR    OR
      .002        .001       2    2.002
      .01         .005       2    2.01
      .02         .01        2    2.02
      .06         .03        2    2.06
      .08         .04        2    2.09
      .10         .05        2    2.11
      .12         .06        2    2.14
      .16         .08        2    2.19
      .18         .09        2    2.22
      .20         .10        2    2.25

RR and OR: they become different in typical cases.

When the baseline or reference group risk is low and the RR is modest, RR ≈ OR. But when the baseline risk increases and the RR remains constant, the OR tends to depart from the RR.

So, the apparent size of the effect depends on the scale you are using. This simple concept confuses many investigators.

What happens when we increase the RR from 2 to 4?

    Pr(D|E+)    Pr(D|E-)    RR    OR
      .004        .001       4    4.01
      .02         .005       4    4.06
      .04         .01        4    4.13
      .08         .02        4    4.26
      .12         .03        4    4.41

Whether the OR is a good estimate of the RR under the rare disease assumption depends on:
(1) risk of (D|E-) = underlying disease risk (baseline risk), and
(2) the RR

We see that OR and RR are not always the same. A common misrepresentation of results occurs when authors report the OR in the situation of
(a) high baseline risk (or risk in the reference group)
(b) large relative risk

Example (one of too many): Bolton P et al. Comparison of changes in rates of diagnosable major depression before and after intervention. JAMA 2003 June 18;289(23):3117-3124.
(Prospective study ) Cntl D+ 113 D- 65 178 Pr(D)=159/341 =0.47 Tx 46 159 Pr(D|control) = 113/178 = 0.63 Pr(D|tx) = 46/163 = 0.28 The authors reported OR = 4.4 (controls 117 But the RR = 2.25 So, the OR overstates the 163 341 compared to treated. relative risk two-fold. Center for Clinical Epidemiology and Biostatistics School of Medicine, University of Pennsylvania Copyright 2006 Trustees of the University of Pennsylvania C EP 521, Spring 2007, Vol I, Part 1 40 In summary: RR and OR (again) We can compute risk and relative risk and odds and relative odds easily Relative risk is simply the ratio of risks (diseased vs non diseased) Odds ratios is simply the ratio of odds (odds in exposed vs odds in unexposed) Odds = risk only when the risk is very low (close to 0) Odds ratio = risk ratio only when the risk ratio is low AND the risk in the non diseased is low (more about RR and OR later) Please avoid this pitfall. Remember this difference and the causes for the difference when you are trying to perform power and sample size calculations. The program gives you OR s and you are trying to detect a RR. Center for Clinical Epidemiology and Biostatistics School of Medicine, University of Pennsylvania Copyright 2006 Trustees of the University of Pennsylvania C EP 521, Spring 2007, Vol I, Part 1 41 2. Retrospective Studies (such as a case control study) Another margin is fixed: Start with disease status, look backward for exposure For example in a family study: We see the cancer and look back to find exposure. Daughter has Cancer E+ D+ D- Mother had cancer (EXPOSURE) E7 193 200(X1+) Fixed row 3 197 200(X2+) sums The probability model here is the same as for the prospective study, i.e., X11 and X21 are independent binomials with parameters (X1+, pCA) and (X2+, pCO). 
Here:
pCA = Pr[exposed|case]
pCO = Pr[exposed|control]
This is what we in fact observe.

What we would like to get (but can't) would be
pE = Pr(case|exposed) = disease risk
pU = Pr(case|unexposed)
But we cannot estimate disease risk directly. Use the following reasoning:

Define: RR = pE / pU
Assume rare disease (very important assumption). Then pE and pU are small, and therefore
1 - pE ≈ 1 and 1 - pU ≈ 1.
This assumption means that the odds approximates the risk.

Then we use the disease OR to approximate the disease RR, we use the fact that the exposure odds ratio = disease odds ratio, and we obtain the exposure odds ratio from the data:

(1) Obtain approx RR from disease OR (Woodward pp 122-126)

\[ \text{Disease odds ratio} = \frac{p_E/(1-p_E)}{p_U/(1-p_U)} \approx \frac{p_E}{p_U} = \frac{\Pr(D \mid E+)}{\Pr(D \mid E-)} \]

(2) Obtain Disease OR (DOR) from Exposure OR (EOR)

           Mother has cancer
           E+      E-
D+          7     193
D-          3     197

This can be viewed as a table with the typical entries a, b, c and d.

Then, compute the EOR as:
(odds of exposure|disease) / (odds of exposure|no disease) = (7/193) / (3/197) = 0.036/0.015 = 2.38

DOR = (odds of disease|exposure) / (odds of disease|no exposure) = (7/3) / (193/197) = 2.33/0.98 = 2.38

The disease OR can be re-expressed as:

\[ \text{Disease OR} = \frac{\Pr(D \mid E+)/[1-\Pr(D \mid E+)]}{\Pr(D \mid E-)/[1-\Pr(D \mid E-)]} = \frac{\Pr(E+ \mid D+)/[1-\Pr(E+ \mid D+)]}{\Pr(E+ \mid D-)/[1-\Pr(E+ \mid D-)]} = \text{exposure odds ratio}, \]

which we can estimate from retrospective data.
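As a rough numerical check of the two ideas above, the following Python sketch computes the exposure and disease odds ratios from the family-study table and shows how the OR implied by a fixed RR drifts as the baseline risk grows. The function names (`odds`, `odds_ratio`, `or_from_rr`) are illustrative choices, not part of the course notes.

```python
def odds(p):
    """Convert a probability (risk) into odds."""
    return p / (1.0 - p)


def odds_ratio(p1, p0):
    """Odds ratio comparing risk p1 with reference risk p0."""
    return odds(p1) / odds(p0)


def or_from_rr(baseline_risk, rr):
    """OR implied by a baseline risk Pr(D|E-) and a relative risk."""
    return odds_ratio(rr * baseline_risk, baseline_risk)


# Family-study table from the notes: rows D+/D-, columns E+/E-.
a, b = 7, 193
c, d = 3, 197

eor = (a / b) / (c / d)   # exposure odds ratio
dor = (a / c) / (b / d)   # disease odds ratio
print("EOR =", round(eor, 2), "DOR =", round(dor, 2))

# With RR held at 2, the OR moves away from 2 as baseline risk rises.
for p0 in (0.001, 0.01, 0.05, 0.10):
    print("baseline", p0, "OR", round(or_from_rr(p0, 2), 3))
```

Both odds ratios reduce algebraically to ad/(bc), which is why they agree; the loop illustrates the rare disease approximation weakening as baseline risk rises.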
So, DOR = EOR, odds approximates risk if the disease is rare, and therefore EOR approximates the relative risk.

3. Cross-sectional study: table total fixed

Can estimate all probabilities.

Cross-sectional study. Interest is the relationship between infection and tonsils. N = 1398 British school children had throats swabbed (and cultured) and tonsils examined. Neither row nor column sums are fixed by design. Only the total number of subjects is fixed.

             Tonsils
Strep        BIG    SMALL    Total
YES           53      19       72
NO           829     497     1326
Total        882     516     1398

Obtain marginal rates:
Pr(Strep +) = 72/1398 = Pr(S+)
Pr(Tonsils big) = 882/1398 = Pr(T+)
Pr[S+ and T+] = 53/1398

H0: Pr(S+ and T+) = Pr(S+) * Pr(T+) (if independent)
Note: P(AB) = P(A) * P(B) for independent events A and B.

Obtain cell probabilities. Under H0, the expected count under independence for the a cell is:

\[ \frac{72}{1398} \cdot \frac{882}{1398} \cdot 1398 = \Pr(S+)\,\Pr(T+)\,N = 45.42 \]

In general:

\[ E_{ij} = \frac{X_{i+}}{X_{++}} \cdot \frac{X_{+j}}{X_{++}} \cdot X_{++} = \frac{X_{i+} X_{+j}}{X_{++}} \]

Compute the usual \(\chi^2_1\), which is a test for independence of rows and columns (Woodward, pp 47-49). This equation involves the sum of the Eij calculations for the four cells in the table, where i = rows (1 and 2) and j = cols (1 and 2).
\[ \chi^2_1 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

In this sampling framework (and only in this framework)
It can be shown that:

\[ \chi^2_1 = Z^2_{\text{columns}} = Z^2_{\text{rows}} \]

4. Both margins fixed (table total as well)

Seldom true, but may be logical to assume, e.g., a small clinical trial with N = 20 randomized to treatment or control.

Cured      Trt     Ctrl    Total
Y            7       2       9 (X1+)
N            3       8      11 (X2+)
Total       10      10      20
         (X+1)   (X+2)

H0: Assignment to treatment has no effect on outcome.
Thus we can regard X1+, the number cured, as fixed (under H0), since the experiment will not affect it; i.e., the marginal risk of cure is fixed as we know the overall cure rate.

We know we fixed X+1 and X+2 (treatment assignments) and X++. So what is random?
Ans: The number of patients cured who are randomized to the treated group, i.e., X11. One cell dictates all the rest. We can compute the remaining cells from the margins.

X11 has a hypergeometric distribution:

\[ P(X_{11}) = \frac{\binom{X_{+1}}{X_{11}} \binom{X_{+2}}{X_{12}}}{\binom{X_{++}}{X_{1+}}} \]

X11 ranges from 0 to min(X1+, X+1). Why? Because the margins are fixed. (In general the lower limit is max(0, X1+ - X+2), which is 0 here.)

This distribution forms the basis for Fisher's exact test (LATER). When X+1 and X+2 become large, p-values for the binomial test and Fisher's test are close.

We will skip the case when nothing is fixed (Poisson distribution), except for an example:

                      Tick type
Location of trap      Deer    Wood
marsh                  a       b
woods                  c       d

Here the length of observation determined total counts and marginal counts.
Point: \(\chi^2\) valid regardless of model.
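The two calculations in this part of the notes can be sketched in a few lines of Python using only the standard library. The helper names (`expected_counts`, `pearson_chi2`, `hypergeom_pmf`) are hypothetical; the data are the tonsils table and the small trial table given in the notes. This is a minimal illustration, not the course's own code.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence: E_ij = X_i+ * X_+j / X_++."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def pearson_chi2(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def hypergeom_pmf(x11, row1, col1, col2):
    """P(X11 = x11) with all margins fixed (row1 = X1+, col1 = X+1, col2 = X+2)."""
    return comb(col1, x11) * comb(col2, row1 - x11) / comb(col1 + col2, row1)


# Tonsils table: rows strep yes/no, columns tonsils big/small.
tonsils = [[53, 19], [829, 497]]
e_a = expected_counts(tonsils)[0][0]
chi2 = pearson_chi2(tonsils)
print("expected a cell:", round(e_a, 2), "chi-square:", round(chi2, 2))

# Small trial: 9 cured overall, 10 per arm, 7 cured in the treated arm.
p_obs = hypergeom_pmf(7, 9, 10, 10)
print("P(X11 = 7):", round(p_obs, 4))
```

Summing `hypergeom_pmf` over the tables at least as extreme as the observed one is the idea behind Fisher's exact test, which the notes defer to a later section.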
We can use the chi-square test for H0: no effect, no association, regardless of the study design.

1.3 Cohort vs. Case-Control Sampling

We were previously looking at prospective, retrospective, and cross-sectional designs.
Another view (flip side of same coin):

Cohort (RCT, randomized controlled trial): follow exposed (treated) and non-exposed (control). Pr[D|E]

Sampling fractions:
f1 = fraction of exposed, e.g., 1/10 = 0.1 ("1 in 10")
f0 = fraction of unexposed

Observed table of cases sampled. Lower case is the number of sampled observations.

        E+    E-
D+      a     b     m1
D-      c     d     m0

Sampled from population (upper case letters represent population counts):

        E+            E-
D+      f1 A = a      f0 B = b      m1
D-      f1 C = c      f0 D = d      m0
        f1 N1 = n1    f0 N0 = n0

Here, we are sampling exposed and unexposed:
f1 = fraction of exposed who are sampled, and
f0 = fraction of unexposed who are sampled

Pr(D|E+) in sample:

\[ \frac{a}{n_1} = \frac{f_1 A}{f_1 (A + C)} = \frac{A}{N_1} \]

= risk among exposed, subject only to the random variability in the sample.

\[ \Pr(D \mid E-) = \frac{b}{n_0} = \frac{f_0 B}{f_0 (B + D)} = \frac{B}{N_0} \]

= risk among unexposed.

RR unbiased (clearly):

\[ RR = \frac{A/N_1}{B/N_0}; \qquad OR = \frac{ad}{bc} = \frac{(f_1 A)(f_0 D)}{(f_0 B)(f_1 C)} = \frac{AD}{BC} \]

So, OR is unbiased also.

So in cohort sampling, we can estimate risk, RR, and OR.

Recall: whether OR ≈ RR depends on baseline risk.
If \(C \approx N_1\) and \(D \approx N_0\), then

\[ \frac{A/N_1}{B/N_0} \approx \frac{AD}{CB} \]

So, if disease is rare, OR ≈ RR.

Case-Control Sampling:

Recall that the easiest and clearest way to think about a CC study is to regard it as a sample from the entire cohort.

Sample cases and controls with sampling fractions g1 and g0. (Before, we used f1 for the cohort example.)

        E+    E-
D+      a     b     m1
D-      c     d     m0
        n1    n0

This is the observed sample. These observations come from the population (upper case). Now we are sampling cases and controls, not exposed and unexposed.

        E+            E-
D+      g1 A = a      g1 B = b      m1 = g1 M1
D-      g0 C = c      g0 D = d      m0 = g0 M0
        n1            n0

So cases and controls are sampled at different frequencies (and we assume sampling does not depend on exposure).

Naive risk estimate (E+ group):

\[ \frac{a}{n_1} = \frac{g_1 A}{g_1 A + g_0 C} \neq \frac{A}{N_1} \]

This estimate is biased (in the sense that the true value differs from the mean of the estimate over repeated sampling: a systematic rather than random departure of the estimated value from the true value).

If g1 = g0, then

\[ \frac{a}{n_1} = \frac{A}{A + C} = \frac{A}{N_1} \]

In practice g1 ≠ g0, and both are unknown.

What about estimating ORs from a case-control study?

        E+    E-
D+      a     b     m1
D-      c     d     m0

We can compute an unbiased estimate of the probability of exposure among cases and among controls:

\[ P(E \mid D) = \frac{a}{m_1} = \frac{g_1 A}{g_1 A + g_1 B} = \frac{A}{M_1} \]

and the exposure odds \( P(E \mid D) / [1 - P(E \mid D)] \), and ...

\[ OR = \frac{ad}{bc} = \frac{(g_1 A)(g_0 D)}{(g_1 B)(g_0 C)} = \frac{AD}{BC} \]

So, OR is an unbiased estimate of the true OR.
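The algebra above can be checked with a small Python sketch. The population counts and sampling fractions below are hypothetical values chosen for illustration (they do not come from the notes), and the cells are expected sampled counts rather than a random draw.

```python
def sample_counts(A, B, C, D, g1, g0):
    """Expected sampled cells when cases are sampled with fraction g1
    and controls with fraction g0, independent of exposure."""
    return g1 * A, g1 * B, g0 * C, g0 * D


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical population (upper case): exposed and unexposed cases,
# then exposed and unexposed non-cases.
A, B = 400, 100
C, D = 9600, 9900
g1, g0 = 0.5, 0.01  # hypothetical sampling fractions for cases and controls

a, b, c, d = sample_counts(A, B, C, D, g1, g0)

true_risk_exposed = A / (A + C)    # A / N1
naive_risk_exposed = a / (a + c)   # a / n1, distorted because g1 != g0

pop_or = odds_ratio(A, B, C, D)
samp_or = odds_ratio(a, b, c, d)

print("true risk in exposed:", true_risk_exposed)
print("naive risk in exposed:", round(naive_risk_exposed, 3))
print("population OR:", pop_or, "sample OR:", round(samp_or, 3))
```

The sampling fractions cancel in ad/bc but not in a/n1, so the sample OR matches the population OR while the naive risk is far from A/N1.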
Again, for rare disease OR ≈ RR.

Example: Hill (Proc. R.S. Med, 1965). Death risk in smoking and non-smoking doctors. Risks were:
2270/million among smokers
70/million among non-smokers

RR = 2270/70 = 32.4. True OR = 32.5. Close, because of low event rates.

Hypothetical full sample, with sampling fractions for a case-control study:

        E+           E-           Sampling fraction
D+      2270         70           1/2 (= g1)
D-      997,730      999,930      1/1000 (= g0)
        1 million    1 million

Death risk is low, so 3% (3000/100000) = risk ≈ odds.

The sample then is:

        E+      E-
D+      1135    35
D-      998     1000

OR = (1135 x 1000) / (35 x 998) = 32.5

(Also, see Rothman pp. 95-96, for the special case in which EOR = incidence rate ratio, without regard to rare disease in case-control studies.)

Example: Using 2 by 2 tables for a clinical decision-making problem

Assume you have a diagnostic test for which you want to estimate the likelihood ratio. You obtain a group of diseased patients and a group of non-diseased patients. Then you apply the test and obtain T+ or T-. (What does this resemble?) Which margin is fixed by design? (The disease status.) But we are going to depart from the case-control, cohort, or cross-sectional design concepts.
Here are the data (they are binary 0/1, with one row of data per subject). (Always remember that the noun "data" takes a plural verb: "data are".)

. tab T D, col

           |           D
         T |         0          1 |     Total
-----------+----------------------+----------
         0 |        45          8 |        53
           |     81.82      18.18 |     53.54
-----------+----------------------+----------
         1 |        10         36 |        46
           |     18.18      81.82 |     46.46
-----------+----------------------+----------
     Total |        55         44 |        99
           |    100.00     100.00 |    100.00

You can easily compute the sensitivity (Sn = Pr(T+|D+) = 0.82) and the specificity (Sp = Pr(T-|D-) = 0.82). And you could easily obtain a confidence interval about those estimates.

. cii 44 36
                                                   -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.    [95% Conf. Interval]
-------------+----------------------------------------------------------
             |      44    .8181818    .0581456     .6728627    .9180797

We can do this simple calculation of the binomial confidence intervals because the diseased people are different from the non-diseased, and the two samples are independent.

The likelihood ratio positive is LR+ = Sn/(1-Sp). LR+ is important because it allows us to compute posterior odds from prior odds. We know from Bayes' Theorem (as applied in clinical decision making):

Posterior odds = LR * prior odds

Here LR+ = 0.82/(1-0.82) = 4.5
Prior probability of disease = 5% (which is a prior odds of 0.05/0.95 = 0.053).
Then, after a positive test, the posterior odds = 0.053*4.5 = 0.24.
The posterior probability is then 0.24/(1+0.24) = 0.19.

What is the confidence interval for 4.5?

Set this up as a problem in which we want to compute the relative risk: RR = Sn/(1-Sp) = Pr(T+|D+)/Pr(T+|D-). But this is an RR of obtaining a positive test among diseased versus non-diseased. (It is backwards from how we might usually think of RR, but the concept is the same.)

Using Stata: we just have to reverse the order of the variables in the command to tell Stata to treat this problem as a 2 by 2 table for computing an RR from a study in which the test is the outcome and disease is the exposure.

. cs T D

                 | D                      |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+----------
           Cases |        36          10  |         46
        Noncases |         8          45  |         53
-----------------+--------------...
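As a hedged sketch of one way to attach an interval to LR+, the Python below treats LR+ as a ratio of two independent proportions and uses the common large-sample interval on the log scale. The choice of that interval method is an assumption made here, not something confirmed by the notes, and the result is not claimed to reproduce the Stata output above. The function names are illustrative.

```python
from math import exp, log, sqrt


def lr_positive_ci(tp, n_diseased, fp, n_nondiseased, z=1.959964):
    """LR+ = Sn / (1 - Sp) with a log-scale large-sample interval.

    tp: test-positives among the diseased; fp: test-positives among the
    non-diseased. The two groups are assumed to be independent samples.
    """
    sn = tp / n_diseased
    false_pos_rate = fp / n_nondiseased
    lr = sn / false_pos_rate
    se_log = sqrt(1 / tp - 1 / n_diseased + 1 / fp - 1 / n_nondiseased)
    return lr, exp(log(lr) - z * se_log), exp(log(lr) + z * se_log)


def posterior_probability(prior_prob, lr):
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = lr * prior_odds
    return post_odds / (1 + post_odds)


# Counts from the tab T D table: 36 of 44 diseased and 10 of 55
# non-diseased tested positive.
lr, lo, hi = lr_positive_ci(36, 44, 10, 55)
post = posterior_probability(0.05, lr)

print("LR+ =", round(lr, 2), "approx 95% CI:", round(lo, 2), "to", round(hi, 2))
print("posterior probability at 5% prior:", round(post, 2))
```

With these counts the interval runs from roughly 2.5 to 8.0, and the posterior probability agrees with the hand calculation of 0.19.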
