EXST7015 Fall 2011 Lect16 - Statistical Techniques II

Enter Logistic Regression. The big, nonlinear logistic model can be much simplified if we know that b1 = 1 and that b2 = 0. This is basically what logistic regression does: it fits a logistic curve that goes from 0 to 1 asymptotically (so it never actually reaches either one). The result of this simplification is a simple linear regression. The dependent variable is a transformation of the original probabilities called the "log odds".

What exactly are "odds"? Odds express the likelihood that some event happens compared to the likelihood that it does not happen. If the odds on a horse in a race are 30 to 1, is that horse likely to win? Or lose? If the chance of an event happening is 50:50, what does that mean? What if the likelihood of two events is 1:1? How is that different?

We will work with a number that is in some ways a little simpler. We will define odds as the ratio of the probability that an event occurs to the probability that the same event does not occur. Since the probabilities add to 1, if the probability of an event is "p", then the probability of its not occurring is "1–p". So "50:50" gives odds of 1.0, and "1:1" also gives odds of 1.0; they have the same odds. If the odds are 1, then the likelihood of something happening is exactly equal to the likelihood that it will not happen, that is, p = 1 – p.

To simplify our concepts we will think of the odds as the ratio of two probabilities. The probability that some event happens (success) will be p. The probability of failure will be 1–p. The odds are calculated as p/(1–p). What if the odds are 2? This means that p is twice as large as (1–p), so success is twice as likely as failure. If the odds are 10, success is ten times as likely as failure. If the odds are 0.5 or 0.1, then failure is twice as likely, or ten times as likely, as success.
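The odds arithmetic above can be sketched in a few lines of Python (an illustration of ours, not part of the original notes; the function name is invented):

```python
# Odds of a binary event, as defined above: p / (1 - p).
def odds(p):
    return p / (1.0 - p)

print(odds(0.5))    # 50:50 -> odds of 1.0
print(odds(2 / 3))  # success twice as likely as failure -> odds of about 2
print(odds(0.1))    # odds below 1: failure more likely than success
```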
So we start with probabilities of a binary event. We calculate the odds of one of the events by dividing the probability of that event by the probability of the alternative event. Finally, the natural logarithm of this ratio is calculated to produce the dependent variable of a linear model (which may be simple linear or multiple). Predicted values from this model are "log odds". An antilog is taken (e^(log odds)) to get the predicted odds. The odds can be transformed back into probabilities of the event by calculating "probability of event = predicted odds / (1 + predicted odds)".

Let's look at an example. What are the odds of getting an A in this course given a particular score on the first exam? Disclaimer 1: The use of this example, calculated from past grades, in no way implies any promise or commitment about the distribution of future grades. Disclaimer 2: Although I will discuss only the number of A's, the non-A's are not all B's; there have been C's and D's. Sorry.

Now, what are the odds of getting an A in this course? I have had 423 students take the course since I began giving two exams in this course (previously 3). Of those 423, there have been 212 A's; we will call this "success". The odds of getting an A are then almost 50:50, so the odds are just about 1.

James P. Geaghan - Copyright 2011

Interesting, but we can carry this one step further. What are the odds of getting an A if you had a 70 on the first exam? Or an 80? Or a 90? We will do logistic regression to determine this, using SAS PROC LOGISTIC. The structure is very similar to PROC REG. See the PROC LOGISTIC output (Appendix 10). The output is rather different because the analysis is not a least squares regression. You will be responsible only for interpreting the values specified below. Note that SAS states that the link is a LOGIT. A logit is the natural logarithm of the odds, that is, "loge(p / (1–p))".
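The full chain just described (probability, to odds, to log odds, and back) can be sketched in Python using the 212 A's out of 423 students from the notes (a sketch of ours, not the course's actual code):

```python
import math

p = 212 / 423                  # observed probability of an A
o = p / (1 - p)                # odds, here 212/211, just about 1
log_odds = math.log(o)         # the logit: dependent variable of the model

# Back-transform: antilog gives odds; odds/(1+odds) recovers the probability.
o_back = math.exp(log_odds)
p_back = o_back / (1 + o_back)

print(round(o, 4))             # about 1.0047
print(round(p_back, 4))        # recovers about 0.5012
```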
SAS gives the number of each of the binary cases; in our example there are 212 "TRUE" and 211 "FALSE". The output contains a section called the Model Fit Statistics (AIC, SC and –2LogL). These statistics are used to compare two models; we will not cover them here. Other output of interest is the "Testing Global Null Hypothesis: BETA=0". This is like the test of the model in ordinary regression. Of special interest is the "Likelihood Ratio Test", not so much the "Score" and "Wald" tests. The parameter estimates and their tests are given in the section called "Analysis of Maximum Likelihood Estimates".

So how do we interpret this output? We are interested primarily in the slope and intercept, and the test of the slope.

Intercept = –16.91 and Slope = 0.1952
Likelihood ratio test P-value < 0.0001

So we have a highly significant slope. The interpretation here is the same as for regression, though the test statistic is different. Now we have a slope and an intercept. Could we get predicted values as with regression? Absolutely! However, we entered log odds into the model, so the predicted values are log odds. We can get these predicted values and convert them to probabilities. For example, calculate the probability that a person with an 83 on the first exam will get an A in the course.

Yhat_i = –16.91 + 0.1952 X_i = –16.91 + 0.1952(83) = –0.7086

This is a log odds, so first take the antilog: exp(–0.7086) = 0.4923. So p/(1–p) = 0.4923; solving for p we get p = 0.4923/(1 + 0.4923) = 0.3299. The probability of getting an A with a grade of 83 on the first exam is 0.33; we could figure that about 33% of students with a grade of 83 on the first exam will get an A.

The predicted values, detransformed to probabilities, form a logistic curve going from 0 to 1.

[Figure: P(A) versus grade on the first exam, a logistic curve rising from 0.00 to 1.00 over grades 50 to 100.]
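The worked example can be reproduced in Python with the fitted coefficients from the SAS output (a sketch of ours; the function name is invented):

```python
import math

b0, b1 = -16.91, 0.1952        # intercept and slope from the SAS output

def prob_of_A(grade):
    log_odds = b0 + b1 * grade        # predicted log odds
    o = math.exp(log_odds)            # antilog -> predicted odds
    return o / (1 + o)                # detransform to a probability

print(round(prob_of_A(83), 2))  # about 0.33, matching the hand calculation
```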
Interpretation of the slope.

The analysis is log transformed, so to get the slope back on the original scale we need to take an antilog (e.g., exp(0.1952) = 1.2156). As with other slopes, this is the change in Y per unit X, here the change in the odds for an increase of one point in the grade. So the odds go up by about 21 percent for each additional point in grade. But don't confuse the increase in odds with the increase in probability. Also, don't confuse the change in the odds (the odds ratio) with the odds themselves. The value 1.2156 is an odds ratio; it gives the proportional increase in the odds (per unit change in Xi), not the ratio of the increase in the probability (see the calculations below).

Grade   Odds      Probability   Ratio of values of P   Odds ratio
70      0.03892   0.03746
71      0.04731   0.04517       1.2058                 1.21556
80      0.27412   0.21514
81      0.33320   0.24993       1.1617                 1.21556
90      1.93054   0.65877
91      2.34668   0.70120       1.0644                 1.21556

So the slope is an odds ratio, the proportional change in the odds (per unit change in Xi). SAS provides tests and confidence intervals for this value.

The analysis, as mentioned, is NOT a least squares regression; it is actually a weighted maximum likelihood estimation carried out on the logit values. Multiple logistic regression is perfectly feasible, and SAS also has the stepwise analyses and "best models" selection you are familiar with. The best-model selection criterion is not called RSquare; it is the SCORE option (based on chi square, not RSquare). CLASS variables can be included in the analysis. We will discuss these in ANOVA.

Summary

Regression on an indicator variable is similar in concept to ordinary least squares regression, but differs considerably in the execution. The analysis is called Logistic Regression. The slope and intercept are used in a way that is similar to ordinary least squares, but the value predicted is the log of the odds. This can be converted to probabilities. SAS provides statistics to evaluate the significance of the fit, and a confidence interval for the estimate of the slope.
SAS also provides other options for various selection techniques and the addition of class variables.

Statistics quote: If at first it doesn't fit, fit, fit again. -- John McPhee

Analysis of Covariance

Linear regression is usually done as a least squares technique applied to QUANTITATIVE VARIABLES. ANOVA is the analysis of categorical (class, indicator, group) variables; there are no quantitative "X" variables as in regression, but it is still a least squares technique. It stands to reason that if regression uses the least squares technique to fit quantitative variables, and ANOVA uses the same approach to fit qualitative variables, we should be able to put both together into a single analysis. We will call this Analysis of Covariance.

There are actually two conceptual approaches:
Multisource regression – adding class variables to a regression
Analysis of Covariance – adding quantitative variables to an ANOVA

For now, we will be primarily concerned with multisource regression. With multisource regression we start with a regression (one slope and one intercept) and ask whether the addition of an indicator or class variable would improve the model. Adding a class variable to a regression gives each group its own intercept, fitting a separate intercept to each group. Adding an interaction fits a separate slope to each group. [Figures: Y against X, first with two parallel lines, then with two lines of different slopes.]

How do they do that? For a simple linear regression we start with

Yi = b0 + b1X1i + ei

Now add an indicator variable. An indicator variable, or dummy variable, uses values of "0" and "1" to distinguish between the members of a group. For example, if the categories, or groups, were MALE and FEMALE, we could give the females a "1" and the males a "0" and distinguish between the groups.
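Coding such a two-level group as a 0/1 dummy variable takes one line of Python (a sketch of ours with made-up data, not from the notes):

```python
# Code FEMALE as 1 and MALE as 0, as described above (made-up observations).
groups = ["MALE", "FEMALE", "FEMALE", "MALE"]
dummy = [1 if g == "FEMALE" else 0 for g in groups]
print(dummy)   # [0, 1, 1, 0]
```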
If the groups were FRESHMAN, SOPHOMORE, JUNIOR and SENIOR we would need 3 variables to distinguish between them: the first would have a "1" for freshmen and a "0" otherwise; the second and third would have a "1" for SOPHOMORE and JUNIOR, respectively, and a zero otherwise. We don't need a fourth variable for SENIOR because if an observation has values of 0, 0, 0 for the three variables we know it must be a SENIOR. So we always need one less dummy variable than there are categories in the group.

Fitting separate intercepts

In our example we will add just one indicator variable, but there could be several. We will call our indicator variable X2i; it is just a variable with values of 0 or 1 that distinguishes between the two categories of the indicator variable.

Yi = b0 + b1X1i + b2X2i + ei

When X2i = 0 we get Yi = b0 + b1X1i + b2(0) + ei, which reduces to Yi = b0 + b1X1i + ei, a simple linear model for the "0" group. When X2i = 1 we have Yi = b0 + b1X1i + b2(1) + ei, which simplifies to Yi = (b0 + b2) + b1X1i + ei, a simple linear model with an intercept equal to (b0 + b2) for the "1" group.

[Figure: two parallel lines with intercepts b0 and (b0 + b2).]

So there are two lines, with intercepts b0 and (b0 + b2). Note that b2 is the difference between the two lines, an adjustment to the first line that gives the second line. It can be positive or negative, so the second line may be above or below the first. This term can be tested against zero to determine if there is a difference between the intercepts.

Fitting separate slopes

Adding an interaction (crossproduct) term between the quantitative variable (X1i) and the indicator variable (X2i) will fit separate slopes. Still using just one indicator variable for two classes, the model is

Yi = b0 + b1X1i + b2X2i + b3X1iX2i + ei

When X2i = 0 we get Yi = b0 + b1X1i + b2(0) + b3X1i(0) + ei, which reduces to Yi = b0 + b1X1i + ei, a simple linear model for the "0" group.
When X2i = 1 we have Yi = b0 + b1X1i + b2(1) + b3X1i(1) + ei, simplifying to Yi = (b0 + b2) + (b1 + b3)X1i + ei, a simple linear model with an intercept equal to (b0 + b2) and a slope equal to (b1 + b3) for the "1" group.

[Figure: two lines, one with intercept b0 and slope b1, the other with intercept (b0 + b2) and slope (b1 + b3).]

Note that b0 and b1 are the intercept and slope for one of the lines (whichever group was assigned the 0). The values of b2 and b3 are the intercept and slope adjustments (+ or – differences) for the second line. If these adjustments are not different from zero, then the intercepts and/or slopes are not different from each other. A third or fourth line could be fitted by adding additional indicator variables and interactions, with coefficients b4 and b5, etc.

Statistics quote: There are two kinds of statistics, the kind you look up and the kind you make up. -- Rex Stout

Conceptual application of Analysis of Covariance

Suppose we have a regression, and we would like to know whether the regression is different for some groups, perhaps male and female, or treated and untreated. Regression starts with a corrected sum of squares (a flat line through the mean) and we fit a slope. Viewed as extra sums of squares, we are comparing a model with no slope to a model with a slope.

(a) A test of the difference between these two models tests the hypothesis of adding a slope (H0: β1 = 0). The extra SS is SSX1 | X0, or just SSX1.

From the simple linear regression we want to test the hypotheses that the model is improved by adding separate intercepts or slopes. The concepts are pretty simple. The additional variables can be tested with extra SS or with the General Linear Hypothesis Test (GLHT, with full and reduced models). For a single indicator variable the extra SS are easy enough. For several indicator variables it may be easier to do the GLHT, where the single line is the reduced model and the two lines are the full model.
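The full model with both adjustments, whose pieces these tests compare, can be sketched numerically in Python (coefficients are invented for illustration, not fitted values):

```python
# Made-up coefficients for illustration only (not fitted to any data).
b0, b1, b2, b3 = 2.0, 0.5, 1.5, 0.25

def predict(x1, x2):
    # x2 is the 0/1 indicator; x1*x2 is the interaction (crossproduct) term.
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Group "0": intercept b0, slope b1.
# Group "1": intercept (b0 + b2), slope (b1 + b3).
print(predict(10, 0))   # 2.0 + 0.5*10 = 7.0
print(predict(10, 1))   # (2.0 + 1.5) + (0.5 + 0.25)*10 = 11.0
```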
(b) A test of the difference between these two models tests the hypothesis of adding an intercept adjustment (H0: β2 = 0); that is, it tests for the difference between the intercepts. The extra SS is SSX2 | X1.

The next test compares a model with two intercepts and one slope to a model containing two slopes and two intercepts.

(c) Here we are testing the addition of a slope adjustment, or second slope (H0: β3 = 0). This test determines if a model with 2 intercepts and 2 slopes is better than a model with 2 intercepts and one slope. The extra SS is SSX1X2 | X1, X2.

This series, a–b–c, is the usual "multisource regression" series. Note that the extra SS needed are: SSX1; SSX2 | X1; SSX1X2 | X1, X2. These are the Type I SS for the SAS model Y = X1 X2 X1*X2;. So this is another of those rare cases where we may wish to use Type I tests.

The hypotheses to be tested can be simplified to some extent if we know that for our base model we want a regression. This is multisource regression. Start with a simple linear regression, then build up:

Correction factor: 1 level = b0 (fit mean)
add b1: 1 level (b0), 1 slope (b1)
add b2: 2 levels (b0, b2), 1 slope (b1)
add b3: 2 levels (b0, b2), 2 slopes (b1 and b3)

Test to see if separate intercepts improve the model (H0: β2 = 0), and test to see if separate slopes improve the model (H0: β3 = 0). As extra SS we could test the extra SS for b2 and then the extra SS for b3. In practice we usually start with the fullest model (2 slopes and 2 intercepts), reduce to 1 slope and 2 intercepts if the extra SS for b3 is not significant, and then to 1 slope and 1 intercept if the extra SS for b2 is not significant. So interpretation usually proceeds from the bottom up: do we need 2 slopes and 2 intercepts; if not, do we need 2 intercepts; if not, is the SLR significant?
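The a–b–c sequence of extra sums of squares can be sketched with ordinary least squares in Python. This is our own illustration on simulated data (all coefficients and the seed are invented); each extra SS is the drop in residual SS between successive nested models:

```python
import numpy as np

# Simulated data: quantitative X1, 0/1 indicator X2, true model has
# separate intercepts and slopes (coefficients invented for illustration).
rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.integers(0, 2, n).astype(float)
y = 2 + 0.5 * x1 + 1.5 * x2 + 0.25 * x1 * x2 + rng.normal(0, 1, n)

def rss(X):
    # Residual SS from an ordinary least squares fit of y on the columns of X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

one = np.ones(n)
rss_mean = rss(np.column_stack([one]))                   # correction factor only
rss_slr  = rss(np.column_stack([one, x1]))               # + one slope
rss_ints = rss(np.column_stack([one, x1, x2]))           # + separate intercepts
rss_full = rss(np.column_stack([one, x1, x2, x1 * x2]))  # + separate slopes

print("SS X1           =", round(rss_mean - rss_slr, 2))
print("SS X2 | X1      =", round(rss_slr - rss_ints, 2))
print("SS X1X2 | X1,X2 =", round(rss_ints - rss_full, 2))
```

These three differences are the Type I SS for the model order X1, X2, X1*X2, matching the series described above.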
If Type I sums of squares are used there would be little if any change from deleting the higher order components of the model, so in this case we would not have to refit the model after each decision. The solution (i.e., the regression coefficients) provided with ANCOVA is the correct model for prediction, even though it is fully adjusted (Type III) while our tests of hypothesis are not (Type I).

The progression discussed above corresponds to the likely series in multisource regression (a–b–c). In Analysis of Variance (our next topic), the researcher's interest is in differences among the means of categorical variables. For that analysis the categorical variable can be put first, and the progression is the usual series for Analysis of Variance (d–e–c). Which series of tests you need depends on whether you are starting with a regression or an analysis of variance.

[Figure: lattice of full and reduced models, with the comparisons labeled a through g.]

Examine the possible full and reduced models above. What is tested in each case, what extra SS is needed for each test, and what are the initial and final models for each case?

One other test of occasional interest is the test denoted by comparison "g". If you fit a model with two slopes and two intercepts, and cannot reduce to a model with two intercepts and one slope, you may wonder if a model with two slopes and one intercept is appropriate.

Full model: Yi = b0 + b1X1i + b2X2i + b3X1iX2i + ei
Reduced model: Yi = b0 + b1X1i + b3X1iX2i + ei

The extra SS for this test is SSX2 | X1, X1*X2. This is actually a test of "intercepts fitted last" and would be available as the Type III SS for X2 with the model Y = X1 X2 X1*X2;.
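Comparison "g" can be sketched the same way, as the difference in residual SS between the full model and the reduced model that drops X2 but keeps the interaction (our own illustration on simulated data; all numbers invented):

```python
import numpy as np

# Simulated data, as in the earlier sketch (invented coefficients and seed).
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.integers(0, 2, n).astype(float)
y = 2 + 0.5 * x1 + 1.5 * x2 + 0.25 * x1 * x2 + rng.normal(0, 1, n)

def rss(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

one = np.ones(n)
full    = rss(np.column_stack([one, x1, x2, x1 * x2]))  # 2 slopes, 2 intercepts
reduced = rss(np.column_stack([one, x1, x1 * x2]))      # 2 slopes, 1 intercept

# Extra SS for the intercept adjustment fitted last: SS X2 | X1, X1*X2.
print("SS X2 | X1, X1*X2 =", round(reduced - full, 2))
```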
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang, J. during the Fall '08 term at LSU.