EP 521, Spring 2007. Vol II, Part 1
Statistical Methods in Epidemiologic Research
EP 521 Spring 2007 Course Notes Vol II (Part 1 of 9)
Multivariable Regression

A. Russell Localio* and Jesse A. Berlin (The Great Master)
*Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6021
Statistical Science, Biometrics and Clinical Informatics (BCI), J&J Pharmaceutical Research and Development, LLC, 1125 Trenton-Harbourton Road, PO Box 200, Titusville, NJ 08560

FOR CLASS USE ONLY. DO NOT CITE OR REPRODUCE.
Copyright 2006, Trustees of the University of Pennsylvania

4 Multivariable regression

4.0.0 Overview of regression
The mean value of a dependent variable (y) is expressed as a linear combination of a set of explanatory variables (xs).

A) What is a regression model?
Briefly, we express the outcome (y) as a function (a linear combination) of factors (xs) to form a model. Each factor contributes information about the variation in outcomes across individuals. We use a variety of statistical methods (ordinary least squares, iteratively reweighted least squares, maximum likelihood) to estimate the effect of x on y, choosing the estimates so that the statistical model fits the data as closely as possible. We can then summarize the data by a model that comes close to explaining the relationship between outcome and exposure. Depending on the form of the statistical model, the estimated effect can be:
- odds ratio (Vol II, Parts 2, 3)
- relative risk (Vol II, Part 5)
- incidence rate ratio (Vol II, Part 5)
- hazard ratio (Vol II, Part 8)
- change in y per unit change in x (Vol II, Part 1)

B) Why use a regression model?
1) Control of confounding. We could use a stratified analysis, but with many potential confounders, each with several levels, the result is many strata, and very sparse ones.
2) Interaction. We can estimate interactions (effect modification, i.e., the association of exposure and outcome at each level of a covariate) while controlling for confounding by the other covariates (other than the effect modifier); e.g., estimate the association of statins and AMI by gender, controlling for age.
3) Arrive at smoothed estimates. The model can borrow data from one category in estimating another; e.g., if we model age as a linear factor, the estimates for each age category are smoothed to fit the trend. When only main effects are in the model (no interactions), we get similar smoothing. This can result in more robust, reproducible estimates.
4) Arrive at a model of the data that is (a) consistent with theory [biology, medicine, behavior, health systems] and (b) interpretable to the intended audience (the journal readers).
5) We anticipate that the residuals (the differences between the expected values from the model and the actual observed values) will show less variability than the raw observations. Why? Because the model explains the bulk of the variability in the data.

C) Costs of regression
1) Every statistical model has its assumptions: about the ys (binary, continuous), the model specification (the xs), and the form of the xs (linear, categorical).
2) A model might be less robust if it makes more assumptions (for example, parametric analyses assume more than nonparametric ones).
3) Skill in model fitting is needed to avoid overfitting (the chosen regression model fits the data in the sample but will not fit other samples drawn from the same population).
4) Zero cells: no events in one level of a categorical x can lead to an undefined estimate (observations are dropped).
5) The choice of xs can be an art (it requires extensive clinical experience as well as a feel for the data).
6) It is tempting to let the computer do the analysis (epidemiologists are not computer programmers, and they should avoid the temptation to change occupations).

D) Limitations of regression
1) Regression does not avoid sampling bias or selection bias. Recall that we use randomization largely to guard against selection bias in both observed (e.g., age) and unobserved (e.g., a physician selects patients for treatment based on unstated preferences) factors.
2) Measurement error in the xs is still an issue. When covariates are measured with error, the direction of bias is unpredictable, regardless of whether the exposure is measured with or without error. The common understanding that nondifferential measurement error biases results toward the null does not apply to multivariable regression. This is especially problematic for nutrition exposures, where food frequency questionnaires do not measure energy intake precisely.
3) Regression does not adjust for unobserved covariates.

E) Uses of regression models
1) Show the difference between two groups (e.g., Tx vs Control in an RCT), adjusted for observed differences in baseline covariates (other than Tx). Randomization does not always balance observed factors. Simple regressions (no covariates) reduce to ANOVA for continuous outcomes and a binary exposure (the equivalent of a t test), or to 2 by 2 tables for binary outcomes and binary exposures.
2) Explanatory (association) model: estimate the association between exposure and disease (e.g., smoking and lung cancer), adjusting for potential confounders.
3) Prognostic model: prediction of outcome when certain patient characteristics are present.
Example: APACHE. A prediction model poses different issues but similar problems in the selection of covariates. There are somewhat different criteria for assessing the incremental value of new markers, tests, or other factors for prediction or classification.

F) Common regression models

    Outcome (Y)           Example                              Regression                                 Estimate
    Continuous            HbA1c (lab values)                   linear (OLS)                               $\Delta y/\Delta x$
    Interval              1, 2, 3, 4 where 4-3 = 2-1           linear (OLS)                               $\Delta y/\Delta x$
    Ordered categorical   E5 scales (poor, ..., excellent)     ordinal logit; continuation ratio logit    OR
    # events              # PCP visits/year                    Poisson                                    IRR
    # events              # MIs/1000 persons                   Poisson                                    RR
    Binary (0/1)          death                                logistic                                   OR
    Time to event         graft rejection                      Cox regression; parametric survival        HR

    OR = odds ratio; RR = risk ratio; IRR = incidence rate ratio; HR = hazard ratio

G) Fitting regression models
The algorithms that fit these models are based on maximizing the likelihood of the data: we ask the computer which estimates are most consistent with the data. For linear regression, the algorithm is called ordinary least squares (OLS): which estimates minimize the squared distance between the observed data and the fitted values from the model? (With normally distributed errors, the OLS estimates are also the maximum likelihood estimates.)

H) Uncertainty
(1) The regression will report the sampling uncertainty (standard error).
(2) But there is another component of uncertainty: modeling uncertainty, the variability in the estimates that arises from uncertainty in the choice of the regression model.

4.0.1 Simple linear regression: comparison of two groups
A special case of generalized linear models (more later).
y = continuous outcome (e.g., mm Hg, FEV1, fibrinogen (g/l))
x = single predictor (treatment or exposure, binary (0/1))
We have an equation of the form $Y = X\beta + \varepsilon$, where $X\beta$ represents a linear combination (sum) of two (or more) terms. For one term (x), it is $\beta_0 + \beta_1 x$. This should not be new.
If we have only a single group, this model reduces to the mean, or intercept-only, model: $E(y) = \beta_0$, where the estimate of $\beta_0$ is simply the sample mean. We can fit this very simple model in Stata using the command reg yvar, where reg is short for regress and yvar is the variable name you assign to the outcome. So you can obtain the mean height (and its confidence interval) of a group of patients by regression:

    reg height

If we have two groups, we can then ask whether the means of the two groups differ.
Example: the goal is to estimate the difference in fibrinogen between two groups of patients (x = 1 means positive for H. pylori; x = 0 means negative for H. pylori). Then
For H. pylori present: $E[y_1] = \beta_0 + \beta_1$
For H. pylori absent: $E[y_0] = \beta_0$
where E denotes expected value. The difference in mean levels of fibrinogen is then
$E[y_1] - E[y_0] = \beta_0 + \beta_1 - \beta_0 = \beta_1$

For the fibrinogen data (Stata command: reg fibrin Hpylori, where reg is short for regress):

                 beta    se(b)       t       p       LCL     UCL
    Hpylori |   0.168    .070     2.406   0.016    0.031   0.306    (this is $\beta_1$)
    _cons   |   2.76     .059    46.869   0.000    2.64    2.88     (this is $\beta_0$)

We interpret (and can report) these results as follows:
- 2.76 is the mean (expected value) of fibrinogen in the H. pylori = 0 group.
- 2.76 + 0.168 = 2.93 is the mean fibrinogen in the H. pylori = 1 group.
- 0.168 is the difference in the means between the two groups, $\Delta y/\Delta x$ where $\Delta x = 1 - 0 = 1$.
- 0.031 to 0.306 is the confidence interval of this difference.
- The result is statistically significant at conventional levels (p = 0.016).
We refer to $\beta_0$ as the constant term or the intercept, and to $\beta_1$ as the slope or the effect size. (All of this should be review.)

In this simple case, the regression can be viewed in a graph as two side-by-side data plots, one for each level of Hpylori.
[Figure: fibrinogen (0 to 7) plotted for Hpylori = 0 and Hpylori = 1]
We can view the intercept ($\beta_0$) as the mean value of the distribution on the left and the slope ($\beta_1$) as the difference between the mean values of the two distributions.

We could also display these data as side-by-side box plots (a better way to look at the difference between the two distributions).
[Figure: box plots of fibrinogen (0 to 7) for Hpylori = 0 and Hpylori = 1]

So we fit a statistical model to the data and solve for
(a) the expected value of the outcome for any given patient defined by the factors, and
(b) the effect size, in this case the difference in means: $\Delta y = \beta_1 \Delta x$.

Link with EP 520: simple linear regression (one x in the model) gives the same results as the t test when x is binary. So the Stata command ttest fibrin, by(Hpylori) should achieve the same result (ttest assumes equal variances by default, as the regression does).

4.0.2 Simple linear regression: association of a continuous predictor and outcome
Let x be a continuous predictor, such as patient age. Then the equation $y = \beta_0 + \beta_1 x$ has a simple graphical interpretation. (Recall algebra 1: this is $y = mx + b$, the equation of a line, with $b = \beta_0$ = intercept and $m = \beta_1$ = slope.) Plot y vs x and the values of E[y], called the fitted values (a line).
[Figure: fibrinogen (g/l) vs age (about 30 to 70 years), with the fitted values shown as a line]
Fitting by OLS minimizes the squared distance from the points to the line.

Solution to the regression equation (using ordinary least squares):

                  b      se(b)       t        p        LCL     UCL
    age     |   0.017    .002     7.921   <0.001    0.013   0.022
    _cons   |   1.98     .118    16.840   <0.001    1.75    2.21

The values of beta (b) have a graphical interpretation:
- 1.98 is the intercept: the fibrinogen value where age = 0.
- 0.017 is the slope: the increase in fibrinogen per 1-year increase in age.

Two interpretation issues:
1. Is a slope coefficient for age of 0.017 meaningful? A question of possible clinical importance is the change in fibrinogen for a 10-year change in age. In $y = \beta_0 + \beta_1 x$ we pick x = 50 and x = 60 to show this change. (We could pick a one-year difference, e.g., x = 51 vs x = 50, but a 1-year difference is likely not of clinical importance.)
$E[y_1] - E[y_0] = \beta_0 + 60\beta_1 - (\beta_0 + 50\beta_1) = 10\beta_1$
We estimate $b_1$ (for age) and multiply by 10: for every 10-year increase in age, fibrinogen increases by 0.17 g/l (10 x 0.017).
2. Is the intercept of 1.98 at age = 0 meaningful? We want the intercept at a clinically meaningful age, which we accomplish by centering (rescaling). Let c_age = age - 25, so that c_age = 0 when age = 25. (In Stata: gen c_age = age - 25)

                  b      se(b)       t        p        LCL     UCL
    c_age   |   .017     .002     7.921    0.000     .013    .022
    _cons   |   2.42     .066    36.685    0.000     2.29    2.55

Note that the slope remains unchanged; the intercept, 2.42, is now the fibrinogen at age 25.
We could have obtained the same result from the initial equation by plugging age = 25 into it: y = 1.98 + 0.017 x 25 = 2.405, or about 2.41 (the difference from 2.42 is rounding error). But there are sometimes advantages to centering beyond interpretation, e.g., helping a complex model converge.
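The OLS arithmetic behind 4.0.1 and 4.0.2 can be sketched in a few lines of Python. This is a minimal illustration on small hypothetical toy data (not the course fibrinogen data), and the function names ols_simple and pooled_t are our own. It checks three claims from the text: with a binary x the intercept is the mean of the x = 0 group and the slope is the difference in means; the regression t statistic for a binary x equals the equal-variance two-sample t statistic; and centering age at 25 leaves the slope unchanged while moving the intercept to the fitted value at age 25.

```python
from math import sqrt
from statistics import mean, variance


def ols_simple(x, y):
    """Fit y = b0 + b1*x by ordinary least squares.

    Returns (b0, b1, t1), where t1 is the t statistic for the slope.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept: line passes through (x_bar, y_bar)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)   # residual df = n - 2
    return b0, b1, b1 / se_b1


def pooled_t(y0, y1):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n0, n1 = len(y0), len(y1)
    sp2 = ((n0 - 1) * variance(y0) + (n1 - 1) * variance(y1)) / (n0 + n1 - 2)
    return (mean(y1) - mean(y0)) / sqrt(sp2 * (1 / n0 + 1 / n1))


# Hypothetical toy outcomes for the x = 0 and x = 1 groups.
y_neg = [2.0, 3.0, 4.0]
y_pos = [3.5, 4.0, 4.5]
x_bin = [0] * len(y_neg) + [1] * len(y_pos)
b0, b1, t1 = ols_simple(x_bin, y_neg + y_pos)
print("binary x: intercept", b0, "slope", b1)
print("regression t vs pooled t:", t1, pooled_t(y_neg, y_pos))

# Hypothetical toy data for a continuous predictor, then center age at 25.
age = [30, 40, 50, 60, 70]
fib = [2.5, 2.6, 2.9, 3.0, 3.2]
b0_raw, b1_raw, _ = ols_simple(age, fib)
c_age = [a - 25 for a in age]           # like: gen c_age = age - 25
b0_c, b1_c, _ = ols_simple(c_age, fib)
print("slope, raw vs centered:", b1_raw, b1_c)
print("centered intercept vs b0 + 25*b1:", b0_c, b0_raw + 25 * b1_raw)
```

With the course estimates the same identity gives 1.98 + 0.017 x 25 = 2.405 as the fitted value at age 25, which is why centering changes only the intercept.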
Think about centering data routinely, especially in survival analysis. The practice helps interpretation of the data.

4.0.3 Multivariable linear regression: 2+ predictors
(Note: do not use the term multivariate to refer to 2+ predictors. Save that term for models with more than one y.)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$,
where the xs might be binary, interval, ordinal, categorical, or continuous, and y is normally distributed, or at least approximately so, conditional on the xs.
Interpretation of $\beta_1$: the change in y per 1-unit change in $x_1$, holding constant the values of $x_2, \ldots, x_p$, the other covariates. This is the partial regression coefficient.
Interpretation of $\beta_j$: the change in y per 1-unit change in $x_j$, holding constant the values of $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p$, the other covariates.
This is a method of adjusting for several potential confounders at once. We can estimate $E[y] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$, where we now have a fitted p-dimensional response plane in (p+1)-dimensional space.

4.0.3.2 Assumptions underlying linear regression
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $i = 1, \ldots, n$ indexes the n observations.
- The observations are independent (crucial assumption). The error terms $\varepsilon_i$ are therefore uncorrelated (as are the responses).
- The error terms have constant variance as E(y) changes, as do the responses conditional on x.
- Normality applies to y conditional on the xs. So y itself might not be normally distributed; it might be bimodal (two humps). For example, the weights of men and women might be two distributions mixed together, but when we control for gender, the distribution of weights is normal within each group.
- The variance does not vary with x: the variance of weights among men and among women is the same (homoscedasticity).
- $E[\varepsilon_i] = 0$, and so $E[y \mid x] = \beta_0 + \beta_1 x$, as we have seen.
We should test these assumptions.

Additional key assumptions:
(1) The relationship between y and x is linear (linearity; seen in EP 520). There is a separate set of tests for this.
(2) We have included in the regression model all potential confounders.
(3) We have considered the possibility of effect modification.

Example in Stata: reg fibrin Hpylori c_age
Is age a confounder? The equation we are writing in Stata is $E(\text{fibrin}) = \beta_0 + \beta_1 \text{Hpylori} + \beta_2 \text{c\_age}$.

                  B      se(b)       t       p          95% CI
    Hpylori |   .060     .068     0.876   0.381    -.074    .193
    c_age   |   .017     .002     7.553   0.000     .013    .021
    _cons   |  2.385     .075    31.889   0.000    2.238   2.532

So H. pylori status is no longer statistically significant when we control for age. What happened to the beta for Hpylori? It fell from 0.168 to 0.060.

This regression equation (model) gives the association of Hpylori with the outcome (fibrinogen), controlling for age as a confounder. Thus it functions the way a stratified analysis did. But unlike a stratified analysis, we can still assess the association of the confounder, age (centered in this example as c_age), with the outcome. Likewise, the estimate for age (here 0.017) is adjusted for any confounding by Hpylori. Is Hpylori a confounder? Does the beta for age change when we control for Hpylori?

4.0.3.3 Checking model fit and assumptions
Plot residuals vs fitted values.
[Figure: residuals (about -2 to 3) vs fitted values (about 2.5 to 3.5)]
The dispersion of the residuals appears to increase with the fitted values. We are checking the assumption of constant variance of the residuals across levels of the expected value of y.
Plot residuals against age (x).
[Figure: residuals (-1.84 to 3.24) vs age (25 to 74)]
This also shows increasing dispersion with age.

4.0.3.4 Transformations of y: variance-stabilizing transformations
Why do we want to stabilize the variance? What are the assumptions of the OLS regression model?
A common transformation is the natural log. In Stata, gen lny = ln(y) creates lny, the log of the y values. But linear regression using lny causes special problems of interpretation. The command reg lnfibrin Hpylori c_age leads to this output:

                  b      se(b)       t       p          95% CI
    Hpylori |   .021     .023      0.910   0.363    -.024    .065
    c_age   |   .006     .0008     7.797   0.000     .004    .007
    _cons   |   .858     .025     34.408   0.000     .809    .907

Look especially at the estimate of the coefficient for age (0.006). (Before transforming y, we had 0.017.) What is the interpretation of 0.006?

A few principles of natural logs (ln):
- A difference on the ln(y) scale corresponds to a ratio on the y scale: $\ln(A) - \ln(B) = \ln(A/B)$, so $\exp[\ln(A) - \ln(B)] = A/B$.
- On the ln scale, a 10-year change in age gives a change in ln(y) of 0.06 (0.006 x 10 = 0.06). exp(0.06) = 1.06, and the CI of this ratio is 1.04 to 1.07.
- Note: when A is small, $\ln(1 + A) \approx A$ and $\exp(A) \approx 1 + A$.
Interpret 0.06 as a 6% increase in fibrinogen per 10-year increase in age.
But this might not be the best way to handle nonconstant variance. This transformation produces the log-normal model: "log" because of the transformation, "normal" because the regression model assumes normally distributed (Gaussian) errors. This common method of handling linear regression can lead to problems, so we shall revisit transformations of y several sections later in the course.
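The back-transformation in the bullets above can be checked numerically. This is a minimal Python sketch using the rounded coefficient for c_age (0.006) and its confidence limits (0.004 and 0.007) from the reg lnfibrin output; the variable names are our own. Because exp() is monotone, exponentiating the confidence limits gives the confidence interval for the ratio. (Strictly, in the log-normal model the exponentiated coefficient compares geometric means of fibrinogen.)

```python
from math import exp

# Rounded estimates for c_age from "reg lnfibrin Hpylori c_age" (ln scale).
b_age, lcl_age, ucl_age = 0.006, 0.004, 0.007
years = 10  # contrast of interest: a 10-year increase in age

log_diff = b_age * years                      # difference on the ln(y) scale
ratio = exp(log_diff)                         # corresponding ratio on the y scale
ci_ratio = (exp(lcl_age * years), exp(ucl_age * years))
pct_increase = 100 * (ratio - 1)

print(f"ln-scale change: {log_diff:.2f}")
print(f"ratio: {ratio:.2f} (95% CI {ci_ratio[0]:.2f} to {ci_ratio[1]:.2f})")
print(f"percent increase: {pct_increase:.1f}% "
      f"(the approximation exp(A) ~ 1 + A gives {100 * log_diff:.1f}%)")
```

This prints a ratio of 1.06 with a CI of 1.04 to 1.07, matching the notes; the exact increase is about 6.2%, close to the 6% given by the small-A approximation.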
4.0.3.5 Checking model assumptions
Plots: residuals vs E[y], residuals vs xs (Stata commands rvfplot, rvpplot).
Search the distribution of studentized residuals for outliers (< -1.96 or > +1.96). These residuals are adjusted so that they have constant variance. Which observations have large residuals? Why?
Look at the distribution of leverage values (how extreme are the xs?). Think of leverage as how much pressure an observation can exert on the slope of a line, just as a child can change the tilt of a seesaw by moving away from the fulcrum (center).
Plot leverage vs squared residuals: lvr2plot.

[Figure: residuals on the ln scale (-0.90 to 0.73) vs fitted values (0.86 to 1.17)]
On the ln(y) scale, the problem of nonconstant var(y) is fixed.

[Figure (lvr2plot): leverage (0.0028 to 0.0141) vs normalized residual squared (0 to 0.031)]
The plot of leverage vs normalized squared residuals shows which observations both have high leverage (extreme values of x) and are outliers (large residuals).

4.0.3.6 Summary of linear regression diagnostics
Stata tools. These are invoked as post-estimation commands after reg (regress) and before any other command. In Stata, type help regdiag for all of the details.

Plots
    Plot name                      Purpose                                                          Stata command
    Residuals vs fitted values     Assumption of constant variance (homoscedasticity)              rvfplot
    Residuals vs predictor plot    Violation of the constant-variance assumption with xj           rvpplot xj
    Added variable plot            Check the relationship between y (outcome) and xj (factor),     avplot xj
                                   controlling for all other factors; check the form of xj in      avplots
                                   the model (do we need an $x^2$ term?). Look at the plot for
                                   each factor in the model.
    Leverage vs squared residuals  Leverage (how far the patient's xs are from the average) vs     lvr2plot
                                   squared residuals (how far the patient's observed value is
                                   from the fitted value)

Diagnostics
    Name                      Purpose                                                              Stata command
    dfbeta                    Impact of each observation on each estimated regression             dfbeta
                              coefficient. Look for observations with |dfbeta| > $2/\sqrt{n}$.
    dfits                     Product of a leverage term and the studentized residual. With p     predict varname, dfits
                              covariates, observations with |dfits| > $2\sqrt{(p+1)/n}$ need
                              investigation.
    Studentized residuals     Identify outliers in terms of large residuals. Look for those       predict newvar, rstudent
                              outside the range -1.96 to +1.96.
    Variance decompositions   Search for collinearity among factors. Look for covariates with     colldiag (following the fit
                              decomposition proportions > 0.3 when the condition number > 30.     command; not documented)

Tests (also run as post-estimation commands directly after the reg command)
    Name                                       Purpose                                                  Stata command
    Cook-Weisberg test for heteroscedasticity  Test the constant-variance assumption (as in the plot   hettest
                                               of residuals vs fitted values) (chi-square)
    Same                                       As above, against xj (as in the plot of residuals vs    hettest xj
                                               xj) (chi-square)
    Omitted variable test                      Omitted higher-order terms (F test)                      ovtest

There are many tests, plots, and diagnostics for linear regression, and they are well documented in the literature. But there is no substitute for (a) collecting clean data, (b) understanding the biological relationships, (c) proposing a statistical model that describes the biological relationships, and (d) avoiding the temptation to dredge the data.

References:
Cook RD, Weisberg S. Residuals and Influence in Regression. New York: Chapman & Hall; 1982.
Belsley DA, Kuh E, Welsch RE. Regression Diagnostics. New York: Wiley; 1980.
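The cutoffs in the diagnostics table above, and leverage itself, are simple to compute. This is a minimal Python sketch assuming a model with an intercept; leverage is shown only for the one-predictor case, $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, and the ages are hypothetical values, not the course data. The function names are our own, not Stata commands.

```python
from math import sqrt
from statistics import mean


def leverage_simple(x):
    """Hat values h_i for y = b0 + b1*x (intercept plus one predictor)."""
    n, x_bar = len(x), mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def dfbeta_cutoff(n):
    """Flag observations with |dfbeta| above 2/sqrt(n)."""
    return 2 / sqrt(n)


def dfits_cutoff(n, p):
    """Flag |dfits| above 2*sqrt((p + 1)/n); p = covariates, excluding the constant."""
    return 2 * sqrt((p + 1) / n)


ages = [25, 30, 35, 40, 74]            # hypothetical ages; one extreme x value
h = leverage_simple(ages)
print([round(hi, 3) for hi in h])      # the age-74 patient sits far from the "fulcrum"
print(round(sum(h), 6))                # hat values sum to the number of parameters (2)

# Illustrative sample size and covariate count (e.g., Hpylori and c_age):
n, p = 510, 2
print(round(dfbeta_cutoff(n), 4), round(dfits_cutoff(n, p), 4))
```

The extreme age dominates the leverage values, which is the seesaw intuition in numbers; with n = 510 and p = 2 the cutoffs are about 0.089 for |dfbeta| and 0.153 for |dfits|.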
4.0.4 Representing factors (classes or categorical variables) in regression
Binary indicators (0/1): we need only a single indicator to represent two groups.
Nominal variables (no inherent ordering): a "class" factor in programming jargon.

    Class (primary insurance)     D1   D2   D3   D4
    Medicaid (reference group)     0    0    0    0
    Medicare                       1    0    0    0
    Fee for service                0    1    0    0
    HMO/PPO                        0    0    1    0
    Champus/VA                     0    0    0    1

Then a model for annual cost of care will be
$y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_3 + \beta_4 D_4 + \varepsilon$
with these interpretations:
$\beta_0$ = average cost for Medicaid recipients
$\beta_0 + \beta_1$ = average cost for Medicare patients
$\beta_0 + \beta_2$ = average cost for FFS patients
(and likewise $\beta_0 + \beta_3$ for HMO/PPO patients and $\beta_0 + \beta_4$ for Champus/VA patients).

Interval variables: visual analog scales; E5 scales (poor, fair, good, very good, excellent). Assume the difference 3 - 2 equals the difference 4 - 3. Let x = 1, 2, 3, 4, 5, and fit it as a linear term in the model (as with age).
Continuous variables: age, FEV1, and mmHg take on any real value. Are there floor or ceiling effects (bunching at the top or bottom)? Is the distribution bimodal (two humps)? There is no substitute for looking at the data, both the ys and the xs.
Stata allows easy creation of indicator (dummy) variables during modeling: use the command xi before the regression command. We will see examples of this throughout. But in some cases you might want to do this by hand, to be sure you understand the data.

4.0.5 Interactions (effect modification) in regression (Woodward ex 9.14)
Consider the fibrin example. Suppose interest is in the effect of age by Hpylori status. We can depict the main-effects model simply as two parallel lines.
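The main-effects model from 4.0.3 (reg fibrin Hpylori c_age) can be evaluated directly to see why its two fitted lines are parallel. This is a minimal Python sketch using the printed, rounded coefficients (2.385, 0.060, 0.017); the function name fitted_main is our own. The gap between the Hpylori = 1 and Hpylori = 0 lines is the same at every age.

```python
# Rounded coefficients from "reg fibrin Hpylori c_age" (section 4.0.3).
B0, B_HPYLORI, B_AGE = 2.385, 0.060, 0.017


def fitted_main(hpylori, age):
    """E(fibrin) under the main-effects (no interaction) model."""
    c_age = age - 25                   # centered age, as in: gen c_age = age - 25
    return B0 + B_HPYLORI * hpylori + B_AGE * c_age


for age in (25, 50, 74):
    y0 = fitted_main(0, age)           # H. pylori absent
    y1 = fitted_main(1, age)           # H. pylori present
    print(f"age {age}: {y0:.3f} vs {y1:.3f}, difference {y1 - y0:.3f}")
```

At age 50 this gives 2.810 and 2.870, and the difference is 0.060 at every age, because the Hpylori term enters the model without any dependence on age.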
[Figure: fitted fibrin vs age - 25 (0 to 49), two parallel lines for Hpylori = 1 and Hpylori = 0]
This plots the situation of no interaction between age and Hpylori status. Repeating the equation:
$E(\text{fibrin}) = \beta_0 + \beta_1 \text{Hpylori} + \beta_2 \text{c\_age}$

The modeling results can be interpreted as follows. At age = 50, what is the expected fibrin level with Hpylori present or absent? If age = 50, c_age = 25. Then from the Stata results:
If Hpylori = 0: $E(y_0) = 2.385 + 0.017 \times 25 = 2.81$
If Hpylori = 1: $E(y_1) = 2.385 + 0.017 \times 25 + 0.060 = 2.87$
As age increases, these fibrin values increase in parallel by Hpylori status. We can see this better by solving for the effect of Hpylori:
$E(y_1) = \beta_0 + \beta_1 + \beta_2 \text{c\_age}$ if Hpylori is present, and
$E(y_0) = \beta_0 + \beta_2 \text{c\_age}$ if Hpylori is absent, so
$E(y_1) - E(y_0) = \beta_0 + \beta_1 + \beta_2 \text{c\_age} - (\beta_0 + \beta_2 \text{c\_age}) = \beta_1$
The effect of Hpylori does not depend on age.

This model means that the fitted regression line of expected values, E(y), shifts up by $\beta_1$ across all values of age. Stated alternatively, the effect of Hpylori on fibrinogen is adjusted for age, or controlled for age. Likewise, the effect of age is to increase fibrinogen, controlling for Hpylori:
$E(y \mid \text{age} = \text{age}_1) = \beta_0 + \beta_1 \text{Hp} + \beta_2 \text{age}_1$
$E(y \mid \text{age} = \text{age}_2) = \beta_0 + \beta_1 \text{Hp} + \beta_2 \text{age}_2$
So the effect of a change in age from $\text{age}_1$ to $\text{age}_2$ is the difference of these two equations. The Hp term cancels, leaving $\beta_2(\text{age}_2 - \text{age}_1)$, which does not depend on Hpylori status.

We can fit a model with an interaction term and look at the same plots.

    . xi: reg fibrin i.Hpylori*c_age
    i.Hpylori         IHpylo_0-1   (naturally coded; IHpylo_0 omitted)
    i.Hpylori*c_age   IHXc_a_#     (coded as above)

      Source |       SS       df       MS            Number of obs =     510
    ---------+------------------------------         F(  3,   506) =   21.22
       Model |  29.6727371     3  9.89091236         Prob > F      =  0.0000
    Residual |  235.878601   506  .466163244         R-squared     =  0.1117
    ---------+------------------------------         Adj R-squared =  0.1065
       Total |  265.551338   509  .521711863         Root MSE      =  .68276

    ------------------------------------------------------------------------------
      fibrin |      Coef.   Std. Err.       t     P>|t|     [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
    IHpylo_1 |  -.0016908   .1366471   -0.012   0.990    -.2701563    .2667748
       c_age |   .0152258   .0041659    3.655   0.000     .0070412    .0234104
    IHXc_a_1 |   .0025614   .0049564    0.517   0.606    -.0071762    .0122991
       _cons |   2.424924   .1074893   22.560   0.000     2.213744    2.636104
    ------------------------------------------------------------------------------

[Aside on how to use Stata: making categorical and interaction terms easily. Note the use of the xi command and of * to form interaction terms. Type help xi to get the details:
    i.varname
    i.varname1*i.varname2
    i.varname1*varname3
varname, varname1, and varname2 denote categorical variables (numeric or string); varname3 denotes a continuous, numeric variable.]

What items from the output should we consider?
- F test (and its associated p-value)
- R-squared
- Adjusted R-squared
- Coefficients (betas)
- Standard errors (especially of the interaction term)
- t (and its associated p-value)
- 95% confidence interval
We need to understand each of these terms and be able to interpret them; more about this later. We need to be able to write the equation for the interaction model and solve for the contrast of interest.
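Several of these items can be recomputed from the ANOVA table, and the contrast of interest can be solved from the coefficients. This is a minimal Python sketch using the printed values from xi: reg fibrin i.Hpylori*c_age; the function names line and hpylori_effect are our own. It uses R-squared = SS_model/SS_total, adjusted R-squared = 1 - MS_resid/MS_total, Root MSE = sqrt(MS_resid), and F = MS_model/MS_resid, and then writes out the group-specific intercepts and slopes implied by the interaction model.

```python
from math import sqrt

# ANOVA table from "xi: reg fibrin i.Hpylori*c_age".
ss_model, df_model = 29.6727371, 3
ss_resid, df_resid = 235.878601, 506
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid                       # F(3, 506)
r2 = ss_model / ss_total                           # R-squared
adj_r2 = 1 - ms_resid / (ss_total / df_total)      # adjusted R-squared
root_mse = sqrt(ms_resid)                          # Root MSE
print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, Root MSE = {root_mse:.5f}")

# Coefficients: _cons, IHpylo_1, c_age, IHXc_a_1.
b0, b1, b2, b3 = 2.424924, -0.0016908, 0.0152258, 0.0025614


def line(hp):
    """(intercept, slope) of fitted fibrin vs c_age when Hpylori = hp (0 or 1)."""
    return b0 + b1 * hp, b2 + b3 * hp


def hpylori_effect(c_age):
    """E(y | Hp = 1) - E(y | Hp = 0) at a given centered age: b1 + b3 * c_age."""
    return b1 + b3 * c_age


print("Hp = 0 line (intercept, slope):", line(0))
print("Hp = 1 line (intercept, slope):", line(1))
for age in (25, 50, 74):
    print(f"effect of Hpylori at age {age}: {hpylori_effect(age - 25):.4f}")
```

The first line reproduces F = 21.22, R-squared = 0.1117, adjusted R-squared = 0.1065, and Root MSE = 0.68276 from the output. The effect of Hpylori grows with age only through the small, nonsignificant interaction coefficient.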
$$E(y) = \beta_0 + \beta_1\,Hp + \beta_2\,c\_age + \beta_3\,Hp \times c\_age$$
$$E(y \mid Hp=0) = \beta_0 + \beta_2\,c\_age$$
$$E(y \mid Hp=1) = (\beta_0 + \beta_1) + \beta_2\,c\_age + \beta_3\,c\_age = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,c\_age$$

So, with an interaction term, each level of Hp has:
- a different intercept
- a different slope

[Figure: fitted values of fibrinogen vs centered age (age-25, from 0 to 49) at each level of Hpylori, from the model with the Hpylori*c_age interaction term.]

The positive interaction coefficient (0.0025, the coefficient for IHXc_a_1) means that the slope of the line, $\partial E(y)/\partial\,age$, is no longer the same at both levels of Hpylori: it is $\beta_2$ when Hp=0 and $\beta_2+\beta_3$ when Hp=1. The vertical distance between the two lines is the effect size for Hpylori, and this effect size changes with age. (In the present example, the interaction term is not statistically significant at the conventional 0.05 level, p = 0.606, and we can test that formally.)

Another example

Formal tests for interaction using regression models (Woodward, ex. 9.15): consider the association of gender and smoking status with BMI.

[Gender: 1=male, 2=female]   [Smoking: 1=current, 2=ex, 3=never]

. xi:reg BMI i.gender*i.smoking
i.gender            Igende_1-2
i.smoking           Ismoki_1-3
i.gender*i.smoking  IgXs_#-#

      Source |       SS       df       MS          Number of obs =    150
-------------+------------------------------       F(  5,   144) =   4.46
       Model |  243.823351     5  48.7646702       Prob > F      = 0.0008
    Residual |  1574.81017   144  10.9361817       R-squared     = 0.1341
-------------+------------------------------       Adj R-squared = 0.1040
       Total |  1818.63352   149  12.2055941       Root MSE      =  3.307

         BMI |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Igende_2 |  -2.295376   .8991409   -2.553   0.012    -4.072596   -.5181568
    Ismoki_2 |   1.300906    .879433    1.479   0.141    -.4373597    3.039171
    Ismoki_3 |   1.372243   .9346393    1.468   0.144    -.4751421    3.219627
    IgXs_2_2 |   .8007606   1.355228    0.591   0.556    -1.877949     3.47947
    IgXs_2_3 |   2.134757   1.301447    1.640   0.103    -.4376503    4.707165
       _cons |   25.52871   .5939529   42.981   0.000     24.35472     26.7027

Now we have multiple interaction terms, and interpretation becomes more complex. We must keep track of the reference category: the group for which every indicator equals 0 (here, men who are current smokers). Its expected value is the _cons (intercept) term of the model. Stata uses the lowest level of each categorical covariate as the reference or baseline (unless you tell it otherwise).

4.0.6 Testing for interaction: testing nested regression models with the F test

We can test whether several terms in a model are all equal to 0 with an F test that compares two nested models: (1) the full model with the interaction terms, vs (2) the reduced model without the interaction terms. The F statistic has numerator and denominator degrees of freedom. The numerator df is the number of terms being tested, i.e., when comparing two models, the difference in the number of terms between them; here there are 2 interaction terms, so the numerator df = 2. The denominator df is the residual df of the full model (144, as the output shows).

. test _IgXs_2_2 _IgXs_2_3        [Asking whether both interaction terms = 0]
 ( 1)  IgXs_2_2 = 0.0
 ( 2)  IgXs_2_3 = 0.0
       F(  2,   144) =    1.36
            Prob > F =    0.2595

The test provides no evidence of an interaction between smoking and gender in their effect on BMI (p = 0.26).

We can then ask, in the model without interaction, whether smoking status makes a difference:

. xi:reg BMI i.gender i.smoking
i.gender    Igende_1-2    (naturally coded; Igende_1 omitted)
i.smoking   Ismoki_1-3    (naturally coded; Ismoki_1 omitted)

      Source |       SS       df       MS          Number of obs =    150
-------------+------------------------------       F(  3,   146) =   6.49
       Model |  214.039948     3  71.3466495       Prob > F      = 0.0004
    Residual |  1604.59357   146  10.9903669       R-squared     = 0.1177
-------------+------------------------------       Adj R-squared = 0.0996
       Total |  1818.63352   149  12.2055941       Root MSE      = 3.3152

         BMI |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Igende_2 |  -1.339982   .5486066   -2.443   0.016    -2.424219    -.255746
    Ismoki_2 |   1.654546   .6706935    2.467   0.015     .3290236    2.980068
    Ismoki_3 |   2.482885   .6498185    3.821   0.000     1.198619    3.767151
       _cons |   25.11181   .5070832   49.522   0.000     24.10964    26.11398

. test Ismoki_2 Ismoki_3          [Asking whether both smoking indicators = 0]
 ( 1)  Ismoki_2 = 0.0
 ( 2)  Ismoki_3 = 0.0
       F(  2,   146) =    7.63
            Prob > F =    0.0007

The F test makes it easy to test whether a categorical factor with more than 2 levels is statistically significant.

Notes:
(1) For logistic regression and Poisson regression, the comparable test is the likelihood ratio test.
(2) You can use the F test for a factor with a single degree of freedom, but you need not: it gives the same p-value as the t test reported in the output (F = t²).
(3) The alternative command testparm is easier when testing several variables at once: testparm Ismok* tests all smoking levels.
(4) Stata 9.0 prefixes all variables created by xi with _, e.g. _Ismoki_2.
(5) For linear regression you can use a likelihood ratio test instead of the F test, but this is not commonly done.

4.0.7 Effect sizes and interaction

Categorical factors: smoking and gender effects on BMI

We should be able to write out the equation and then solve for the effect size of interest. Some notes:
- Gender: 1=M, 2=F; Smoking: 1=current, 2=ex, 3=never.
- Level 1 is the baseline or reference for both (unless you tell Stata otherwise).
- We used the xi syntax, so Stata automatically creates dummy (indicator) variables coded 0/1, with the lowest value as the reference group. This automatic coding saves much time, BUT it can also create confusion. In this example, xi recodes gender as 0=M and 1=F.
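To make the automatic coding concrete, here is a minimal Python sketch of the 0/1 indicator coding described above: one indicator per level except the lowest, which becomes the reference, plus product terms for the gender-by-smoking interaction. The helper `indicator_columns` and the five toy observations are hypothetical; the column names only imitate Stata's labels, and this is not how xi is implemented internally.

```python
def indicator_columns(values, prefix):
    """Hypothetical helper: 0/1 indicators for every level except the lowest.

    Returns the reference (lowest) level and a dict of indicator columns,
    mimicking the coding the notes describe for xi.
    """
    levels = sorted(set(values))
    reference = levels[0]
    columns = {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }
    return reference, columns


# Toy data (hypothetical), coded as in the notes
gender = [1, 2, 2, 1, 2]    # 1 = male, 2 = female
smoking = [1, 3, 2, 3, 1]   # 1 = current, 2 = ex, 3 = never

ref_g, g_cols = indicator_columns(gender, "Igende")
ref_s, s_cols = indicator_columns(smoking, "Ismoki")

# Interaction terms are products of the gender and smoking indicators
interactions = {
    f"IgXs_2_{level}": [g * s for g, s in zip(g_cols["Igende_2"], s_cols[f"Ismoki_{level}"])]
    for level in (2, 3)
}

print("reference levels:", ref_g, ref_s)
print(g_cols)
print(s_cols)
print(interactions)
```

A man who currently smokes has every indicator equal to 0, so his expected BMI is the intercept alone; every other cell adds one or more coefficients to it.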
Assuming a model with no interaction:
$$E(BMI) = \beta_0 + \beta_1\,gender + \beta_2\,smoking_2 + \beta_3\,smoking_3$$
In women, what is the effect on BMI of being a never smoker vs a current smoker?
$$E(BMI_{WN}) = \beta_0 + \beta_1 + \beta_3; \qquad E(BMI_{WC}) = \beta_0 + \beta_1$$
$$E(BMI_{WN}) - E(BMI_{WC}) = \beta_0 + \beta_1 + \beta_3 - (\beta_0 + \beta_1) = \beta_3$$
In men, the effect on BMI is:
$$E(BMI_{MN}) = \beta_0 + \beta_3; \qquad E(BMI_{MC}) = \beta_0$$
$$E(BMI_{MN}) - E(BMI_{MC}) = \beta_0 + \beta_3 - \beta_0 = \beta_3$$
So the effect of smoking (never vs current) is the same for men and women. In other words, the effects are parallel: whether the subject is a man or a woman, the impact of smoking on BMI is the same. But is this situation biologically plausible? Might not the effect of smoking differ by gender?

For the model with interaction (effect modification) of gender and smoking:
$$E(BMI) = \beta_0 + \beta_1\,gender + \beta_2\,smoking_2 + \beta_3\,smoking_3 + \gamma_2\,gender \times smoking_2 + \gamma_3\,gender \times smoking_3$$
[Remember that Stata has recoded gender as 0=M and 1=F.]

The effect of smoking, never vs current, in women is:
$$E(BMI_{WN}) = \beta_0 + \beta_1 + \beta_3 + \gamma_3; \qquad E(BMI_{WC}) = \beta_0 + \beta_1$$
$$E(BMI_{WN}) - E(BMI_{WC}) = \beta_0 + \beta_1 + \beta_3 + \gamma_3 - (\beta_0 + \beta_1) = \beta_3 + \gamma_3$$
The effect of smoking, never vs current, in men is:
$$E(BMI_{MN}) = \beta_0 + \beta_3; \qquad E(BMI_{MC}) = \beta_0$$
$$E(BMI_{MN}) - E(BMI_{MC}) = \beta_0 + \beta_3 - \beta_0 = \beta_3$$
The effects differ between men and women (unless $\gamma_3 = 0$).

How can we show this with Stata? After fitting the interaction model, we can use display as a calculator to estimate E(BMI) among women who never smoked:

. display _b[_cons]+_b[Igende_2]+_b[Ismoki_3]+_b[IgXs_2_3]
26.740333

This is $\beta_0 + \beta_1 + \beta_3 + \gamma_3$. And the same for women who are current smokers:

. display _b[_cons]+_b[Igende_2]
23.233333

This is $\beta_0 + \beta_1$. The difference is 3.507. But we do not have a confidence interval for these estimates or for the differences.
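The same calculator step can be sketched in Python, with the coefficients copied from the interaction-model output above. The helper `expected_bmi` is hypothetical; like display, it returns point estimates only, with no confidence interval.

```python
# Coefficients copied from the interaction model: xi:reg BMI i.gender*i.smoking
COEF = {
    "_cons": 25.52871,
    "Igende_2": -2.295376,
    "Ismoki_2": 1.300906,
    "Ismoki_3": 1.372243,
    "IgXs_2_2": 0.8007606,
    "IgXs_2_3": 2.134757,
}


def expected_bmi(female, smoking):
    """E(BMI) for one cell: female is 0/1 (xi coding), smoking is 1, 2 or 3."""
    value = COEF["_cons"] + female * COEF["Igende_2"]
    if smoking != 1:  # level 1 (current) is the reference: no extra terms
        value += COEF[f"Ismoki_{smoking}"] + female * COEF[f"IgXs_2_{smoking}"]
    return value


women_never = expected_bmi(1, 3)     # beta0 + beta1 + beta3 + gamma3
women_current = expected_bmi(1, 1)   # beta0 + beta1
men_diff = expected_bmi(0, 3) - expected_bmi(0, 1)   # beta3

print(f"women, never smokers:    {women_never:.6f}")
print(f"women, current smokers:  {women_current:.6f}")
print(f"never vs current, women: {women_never - women_current:.3f}")
print(f"never vs current, men:   {men_diff:.6f}")
```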
[di is an acceptable abbreviation in Stata for display.]

We can use the lincom command to compute the contrast of interest directly and obtain confidence intervals. Here is the simpler syntax, with a confidence interval:

. lincom _cons+Igende_2+Ismoki_3+IgXs_2_3
 ( 1)  Igende_2 + Ismoki_3 + IgXs_2_3 + _cons = 0.0
------------------------------------------------------------------------------
         BMI |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   26.74033    .603771   44.289   0.000     25.54693    27.93373
------------------------------------------------------------------------------

We can also obtain the contrast itself, in this case the difference, but to do so we must first write out the contrast in terms of the coefficients:

. lincom Ismoki_3+IgXs_2_3
 ( 1)  Ismoki_3 + IgXs_2_3 = 0.0
------------------------------------------------------------------------------
         BMI |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |      3.507   .9056565    3.872   0.000     1.716902    5.297098
------------------------------------------------------------------------------

This is $\beta_3 + \gamma_3$.

The same applies to men. Use lincom directly, or just read the coefficient for Ismoki_3 from the output:

. lincom _b[Ismoki_3]
 ( 1)  Ismoki_3 = 0.0
------------------------------------------------------------------------------
         BMI |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.372243   .9346393    1.468   0.144    -.4751421    3.219627
------------------------------------------------------------------------------

This is $\beta_3$.

The estimated difference is larger in women (3.51) than in men (1.37), although the F test in 4.0.6 found no statistically significant interaction (p = 0.26).

Therefore:
(1) We should be able to write out the equation, the effect of interest (e.g. smoking among women), and the difference, and express everything in terms of the coefficients.
(2) We should then be able to solve for each using the estimated coefficients from Stata.
(3) Finally, the lincom command makes the contrast (the difference) easy to compute and gives its confidence interval.

Effect size and interactions: continuous factor
Age and Hpylori: effect on fibrinogen

Going back to our example:

. xi:reg fibrin i.Hpylori*c_age

      fibrin |      Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    IHpylo_1 |  -.0016908   .1366471   -0.012   0.990    -.2701563    .2667748
       c_age |   .0152258   .0041659    3.655   0.000     .0070412    .0234104
    IHXc_a_1 |   .0025614   .0049564    0.517   0.606    -.0071762    .0122991
       _cons |   2.424924   .1074893   22.560   0.000     2.213744    2.636104

The overall model is:
$$E(y) = \beta_0 + \beta_1\,Hp + \beta_2\,c\_age + \gamma\,Hp \times c\_age$$
For Hpylori=0 and age=40 (c_age=15):
$$E(y_0) = \beta_0 + \beta_1 \cdot 0 + 15\beta_2 + \gamma \cdot 0 \cdot 15 = \beta_0 + 15\beta_2$$
For Hpylori=1 and age=40 (c_age=15):
$$E(y_1) = \beta_0 + \beta_1 + 15\beta_2 + 15\gamma$$
$$E(y_1) - E(y_0) = \beta_1 + 15\gamma$$
This is the effect of Hpylori, controlling for age, but at age 40. The same equation gives the effect of Hpylori at age 50 (c_age=25):
$$E(y_1) - E(y_0) = \beta_1 + 25\gamma$$
So the contrast for Hpylori depends on age. We can compute the difference of differences (which is how an interaction can be interpreted), that is, the difference between the effects of Hpylori at two ages:
$$(\beta_1 + 25\gamma) - (\beta_1 + 15\gamma) = 10\gamma$$
In terms of the graph, as age increases the distance between the two lines increases: going from age 40 to age 50, it grows by 10 × 0.00256 = 0.0256.
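The difference-of-differences arithmetic can be checked with a short Python sketch using the two coefficients from the output above ($\beta_1$ = IHpylo_1 and $\gamma$ = IHXc_a_1). The helper `hpylori_effect` is hypothetical, and the centering constant 25 follows the definition c_age = age - 25. It gives point estimates only; confidence intervals would still come from lincom.

```python
# Coefficients copied from: xi:reg fibrin i.Hpylori*c_age
BETA1 = -0.0016908   # IHpylo_1: Hpylori effect when c_age = 0 (age 25)
GAMMA = 0.0025614    # IHXc_a_1: change in the Hpylori effect per year of age


def hpylori_effect(age, centre=25):
    """E(y | Hp=1) - E(y | Hp=0) at a given age: beta1 + gamma * c_age."""
    c_age = age - centre
    return BETA1 + GAMMA * c_age


effect_40 = hpylori_effect(40)   # beta1 + 15*gamma
effect_50 = hpylori_effect(50)   # beta1 + 25*gamma
print(f"Hpylori effect at 40: {effect_40:.4f}")
print(f"Hpylori effect at 50: {effect_50:.4f}")
print(f"difference of differences: {effect_50 - effect_40:.4f}")  # equals 10*gamma
```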
Now we switch the question, using the same model: what is the effect of age on fibrinogen in the presence of interaction?
$$E(y) = \beta_0 + \beta_1\,Hp + \beta_2\,c\_age + \gamma\,Hp \times c\_age$$
First, compare age 40 vs 50 at Hpylori = 0:
Age 40: $\beta_0 + 15\beta_2$; age 50: $\beta_0 + 25\beta_2$; the difference is $10\beta_2$.
Next, compare age 40 vs 50 at Hpylori = 1:
$$(\beta_0 + \beta_1 + 25\beta_2 + 25\gamma) - (\beta_0 + \beta_1 + 15\beta_2 + 15\gamma) = 10(\beta_2 + \gamma)$$
So, to interpret the effect of age on y, we must look at the coefficients for both age and the interaction term, and multiply both by the difference in age (10 years). In terms of the plot, this difference in the age effect is the ever-widening gap between the two lines as age increases.

Moral of the story on interaction:
(1) It is easy to perform the analysis in Stata using the xi syntax.
(2) It is easy to test whether the interaction terms are statistically significant.
(3) It is much harder to interpret the results in a manner that makes clinical sense.
(4) Interpretation requires writing out the equation and working out the contrast of interest.
(5) lincom can compute the contrast and its confidence interval easily.

Another caution: interactions can appear statistically significant and yet make no clinical sense. Interactions should be pre-specified as part of the study protocol (whether observational study or RCT). Data dredging can produce spurious interactions, and with plenty of data (power), even clinically meaningless interactions can reach statistical significance. Clinical considerations should prevail, and the model should make clinical sense.

4.0.7.1 Computing fitted values simply

Most of the literature on linear regression involves interpreting slopes (parameter estimates) and their confidence intervals, to estimate differences between groups while controlling for other covariates. What is the clinical value of these estimates?
Does the audience for the study want estimates of differences between groups, or does it want fitted values for the outcome (y) and confidence intervals for those fitted values?

Stata offers a simple solution, adjust, but the user must know what the program is generating. adjust is a post-estimation command: it is run directly after the reg command and applies as long as no other regression intervenes. There are two parts of the syntax by which one can specify values of the factors (xs):
(1) After adjust, one can specify a value, x=#. This is preferred. But if a covariate is NOT specified, it takes on the mean value of that covariate within each group specified in the other part of the syntax. So if you want fitted values by gender, age is a factor to adjust for, and the command does not include a value such as age=50, then the reported fitted values will be for M and F at the mean age of each gender, not at the overall mean age.
(2) The by( ) option asks for fitted values for categorical factors or combinations of those factors.

Example: fibrinogen in two groups of patients defined by Hpylori status, controlling for patient age. In Stata: the fitted values of fibrinogen by Hpylori status, for patients aged 25.

First run the reg command for a model, with or without interaction terms. Then use adjust:

. adjust age=25, by(Hpylori) ci format(%7.4f)

     Dependent variable: fibrin     Command: fit
Covariate set to value: age = 25
----------+-----------------------------------
Hpylori   |
0/1=yes   |      xb          lb          ub
----------+-----------------------------------
        0 |  2.3851    [2.2381      2.5320]
        1 |  2.4446    [2.3002      2.5890]
----------+-----------------------------------
     Key:  xb         =  Linear Prediction
           [lb , ub]  =  [95% Confidence Interval]

But if age=25 were omitted, the resulting fitted values of fibrinogen would be at the mean age for Hpylori=0 and the mean age for Hpylori=1.
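The difference between fixing a covariate and leaving it at each group's own values can be seen in a small Python sketch. Everything here is hypothetical: the coefficients and the six (Hpylori, age) observations are made up, and the code imitates the behavior the notes describe rather than calling Stata. Because the model is linear, averaging fitted values over each group's own ages gives the fitted value at that group's mean age.

```python
from statistics import mean

# Hypothetical fitted model without interaction: E(y) = b0 + b_hp*Hp + b_age*age
B0, B_HP, B_AGE = 2.0, 0.06, 0.017   # illustrative values, not from the notes

# Hypothetical sample of (Hpylori, age); the Hpylori=1 group is older on average
sample = [(0, 30), (0, 40), (0, 50), (1, 50), (1, 60), (1, 70)]


def fitted(hp, age):
    """Linear prediction (xb) for one covariate pattern."""
    return B0 + B_HP * hp + B_AGE * age


overall_mean_age = mean(age for _, age in sample)

results = {}
for hp in (0, 1):
    group_ages = [age for h, age in sample if h == hp]
    results[hp] = {
        # analogous to: adjust age=25, by(Hpylori)
        "age=25": fitted(hp, 25),
        # age not specified: each group keeps its own ages (i.e. its own mean age)
        "group ages": mean(fitted(hp, a) for a in group_ages),
        # analogous to: adjust age, by(Hpylori), i.e. the overall mean age
        "overall mean age": fitted(hp, overall_mean_age),
    }

for key in ("age=25", "group ages", "overall mean age"):
    diff = results[1][key] - results[0][key]
    print(f"{key:>16}: Hp=0 {results[0][key]:.3f}  Hp=1 {results[1][key]:.3f}  diff {diff:.3f}")
```

With age fixed (at 25 or at the overall mean), the Hp=1 vs Hp=0 difference is the hypothetical Hpylori coefficient, 0.06; when each group keeps its own ages, the difference (0.40) also absorbs the 20-year age gap between the groups.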
To adjust to the overall mean age instead, the syntax would be:

. adjust age, by(Hpylori) ci format(%7.4f)

So the safest way to obtain fitted values with adjust is to specify a value for each variable that is not in the by( ) portion of the syntax.

4.0.7.2 Width of confidence intervals for adjusted (or fitted) values

What happens to the width of the confidence interval of an adjusted value as the covariate changes? The answer is not simple.

Example: fibrinogen, Hpylori and age. Suppose we want the expected values of fibrinogen by age, given Hpylori=0. One can estimate these with the command:

. adjust Hpylori=0, by(age) ci

     Dependent variable: fibrinogen     Command: regress
   Variable left as is: c_age
Covariate set to value: Hpylori = 0

----------+-----------------------------------
      age |      xb          lb          ub
----------+-----------------------------------
       25 |  2.38505    [2.23811     2.53199]
       26 |  2.40209    [2.25806     2.54612]
       27 |  2.41912    [2.27792     2.56033]
       28 |  2.43616    [ 2.2977     2.57462]
       29 |  2.45319    [2.31739       2.589]
        * * *
       40 |  2.64058    [2.52643     2.75473]
       41 |  2.65762    [ 2.5446     2.77064]
        * * *
       48 |  2.77687    [2.66697     2.88676]
       49 |   2.7939    [2.68374     2.90406]
       50 |  2.81094    [2.70034     2.92153]
       51 |  2.82797    [2.71676     2.93918]
        * * *
       70 |  3.15164    [3.00193     3.30136]
       71 |  3.16868    [3.01592     3.32144]
       72 |  3.18571    [3.02984     3.34159]
       73 |  3.20275    [ 3.0437      3.3618]
       74 |  3.21979    [ 3.0575     3.38207]
----------------------------------------------

Width of the CI: 0.29 at age 25, 0.22 at age 50, 0.32 at age 74.

What is happening? The CI starts wide, narrows, and then widens again. This happens even though the variance of the age coefficient does NOT change with age: there is one set of estimates and one set of variances for the entire model.
As one moves farther from the center of the data, the CI of E(y) widens somewhat when looking at linear regression and continuous factors (such as age).

[Advanced materials] Why? We can compute the confidence interval by invoking the simple mathematics of variances. If Hpylori=0, we have the simple equation (where $\beta_1$ here denotes the coefficient of c_age):
$$\operatorname{var}[E(y)] = \operatorname{var}(\hat\beta_0 + \hat\beta_1\,c\_age) = \operatorname{var}(\hat\beta_0) + \operatorname{var}(\hat\beta_1\,c\_age) + 2\operatorname{cov}(\hat\beta_0, \hat\beta_1\,c\_age) = \operatorname{var}(\hat\beta_0) + c\_age^2\,\operatorname{var}(\hat\beta_1) + 2\,c\_age\,\operatorname{cov}(\hat\beta_0, \hat\beta_1) \qquad \text{Eq(II.1.1)}$$

We can obtain the variance-covariance matrix in Stata (but you do not need to know how to do this). Stata stores it in e(V) after regress and other estimation commands:

. mat V=e(V)
. mat list V

symmetric V[3,3]
             Hpylori        c_age        _cons
Hpylori    .00462145
  c_age   -.00003252    5.087e-06
  _cons   -.00240752   -.00011209    .00559373

Then we can substitute values of c_age (= age - 25).

For age=50, for example, c_age=25, and from Eq(II.1.1) we have
0.005594 + 25×25×5.087/1,000,000 + 2×25×(-0.000112) = 0.003173
Taking the square root: se[E(y)] = 0.0563.
The confidence interval width is 2×1.96×0.0563 = 2×0.11 = 0.22. This matches the large table for age=50 (which is the same as c_age=25).

What happens when age is smaller? For age=25, c_age=0, and the calculation is very simple because only the variance of the intercept remains; all other terms drop out of var[E(y)]. Therefore:
var[E(y)] = 0.005594, se[E(y)] = 0.075, and CI width = 2×1.96×0.075 = 0.29. The CI is wider.

What happens when age is larger? For age=75, c_age=50:
var[E(y)] = 0.005594 + 50×50×5.087/1,000,000 + 2×50×(-0.000112) = 0.0071
se[E(y)] = 0....
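The hand calculation in Eq(II.1.1) can be reproduced in Python from the entries of V listed above. With a = (c_age, 1), the fitted value is $a'\hat\beta$ and its variance is $a'Va$, the same linear-combination arithmetic that lincom performs. This sketch uses z = 1.96, as the hand calculation does, so its widths differ slightly from Stata's t-based intervals in the table; the function names are hypothetical.

```python
import math

# Entries copied from "mat list V" (model with Hpylori and c_age, no interaction)
VAR_CONS = 0.00559373        # var(_cons)
VAR_AGE = 5.087e-06          # var(coefficient of c_age)
COV_CONS_AGE = -0.00011209   # cov(_cons, c_age)


def se_fitted(c_age):
    """se of E(y | Hpylori=0, c_age), following Eq(II.1.1)."""
    var = VAR_CONS + c_age ** 2 * VAR_AGE + 2 * c_age * COV_CONS_AGE
    return math.sqrt(var)


def ci_width(c_age, z=1.96):
    """Width of a normal-approximation 95% CI for the fitted value."""
    return 2 * z * se_fitted(c_age)


for age in (25, 50, 75):
    c_age = age - 25
    print(f"age {age}: se = {se_fitted(c_age):.4f}, CI width = {ci_width(c_age):.3f}")

# Eq(II.1.1) is a quadratic in c_age; its derivative 2*c_age*VAR_AGE + 2*COV_CONS_AGE
# is zero at c_age = -COV_CONS_AGE / VAR_AGE, where the CI is narrowest.
narrowest_age = 25 - COV_CONS_AGE / VAR_AGE
print(f"narrowest CI near age {narrowest_age:.1f}")
```

The minimum near age 47 is consistent with the table, where the narrowest interval shown is at age 48.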
