EXST7015 Fall2011 Lect15

We fit the Reduced model and the Full model (as two separate models).

Full model results: dfError = 524 + 960 = 1484, SSE = 304949 + 799234 = 1104183 (error d.f. and SSE summed over the separate fits for the two genders)
Reduced model results: dfError = 1487, SSE = 1206036.845

Then we set up the table below to test the difference. The F statistic is the mean square for the difference divided by the MSE of the full model, with 3 and 1484 d.f.

Source            d.f.        SSE            MSE          F         P>F
Reduced model     1487   1206036.85
Full model        1484   1104183.01
Difference           3    101853.84    33951.2791    45.6298   3.320764E-28
Full model        1484   1104183.01      744.0586

In this case we would decide that there was clearly a difference between genders. We don't know which one or more of the 3 parameters differs (different curvature or different intercepts), but some difference exists. Later, with analysis of covariance, we could determine which parameters differ. It actually turns out to be the intercept; the curvatures are the same for both genders.

So, given the curvature, there is an intermediate age that runs the 10 K race fastest, and younger and older individuals take longer. What is that age? The fitted model for females is

Time = 270.94 – 1.7668·Age + 0.02906·Age²

If we take the first derivative, set it equal to zero, and solve for Age we get

dTime/dAge = –1.7668 + 2(0.02906)·Age = 0, so Age at minimum time = 1.7668 / (2 × 0.02906) = 30.4

Using the equation to solve for the average time at age 30.4 we get a mean of 244 minutes for women, the best average time for any age. The fitted model for males is

Time = 265.60 – 2.3003·Age + 0.03392·Age²

Men had a minimum at 33.9 and a mean time of 226.6 minutes at that age.

Do the results seem worthwhile? Are they meaningful? Are they interpretable? Do they have value? Note that the R² for females was only 0.026822 and for males only 0.038138. However, the linear and quadratic terms were significant, indicating a significant fit to the means.

Polynomial Regression Summary

Polynomial regressions are treated like any other multiple regression, except that we use Type I SS for testing hypotheses. Note that the FULLY ADJUSTED regression coefficients are still used to fit the model. The ability to determine a minimum or maximum point is a useful application of polynomials (optimum performance at an age, optimum yield at a fertilizer level, etc.).

We have some new capabilities as far as what we can do with regression:
- Test for a curvilinear relationship between Y and X.
- Test whether the curvature is quadratic, cubic, quartic, ...
- Obtain a curvilinear predictive equation for Y on X.

Response surfaces

Polynomials are of interest as an extremely flexible method of curve fitting, even though there are some severe restrictions on the interpretation and predictive ability (outside the observed range) of the model. The ones we have looked at employed only one independent variable. Can you have several? YES, this is a "response surface". For example, take 2 independent variables. We would include not only quadratic terms (and maybe cubic, etc.), but also INTERACTIONS.

Yi = b0 + b1·X1i + b2·X1i² + b3·X2i + b4·X2i² + b5·X1i·X2i + ei

The shape varies with the size and sign of the coefficients (bi):
- both quadratic terms concave produces a "bowl" shape
- both convex yields a "hill" shape
- one of each gives a "saddle" shape
The interaction term allows for "twisting" of the surface.
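Before looking at examples, a minimal Python sketch (not part of the original notes) may help make the surface model concrete: it generates data from a two-variable quadratic surface with an interaction term, using invented coefficient values, and recovers the coefficients by ordinary least squares. The sign check at the end mirrors the bowl/hill/saddle description above.

import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(1, 9, n)
x2 = rng.uniform(2, 18, n)

# Assumed "true" surface: both quadratic coefficients positive -> a "bowl";
# the X1*X2 term twists the surface. Values are made up for illustration.
b_true = np.array([50.0, -6.0, 0.5, -3.0, 0.15, 0.2])   # b0..b5
X = np.column_stack([np.ones(n), x1, x1**2, x2, x2**2, x1 * x2])
y = X @ b_true + rng.normal(0, 2.0, n)

# Ordinary least squares fit of the quadratic-plus-interaction model
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated b0..b5:", np.round(b_hat, 3))

# Rough shape check from the signs of the pure quadratic terms (b2 and b4)
b2, b4 = b_hat[2], b_hat[4]
if b2 > 0 and b4 > 0:
    print("both quadratic terms curve upward: a 'bowl'")
elif b2 < 0 and b4 < 0:
    print("both quadratic terms curve downward: a 'hill'")
else:
    print("one up, one down: a 'saddle'")

Strictly speaking, when the interaction is strong the stationary point is classified by the sign of 4·b2·b4 – b5² (positive with b2 > 0 gives a bowl, positive with b2 < 0 a hill, negative a saddle); the simple sign check in the sketch matches the informal description in the notes.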
Examples

[Figure: three example response surfaces plotted against X1 and X2, panels A, B and C]
A = both dimensions concave and symmetric (no interaction)
B = both dimensions concave and asymmetric (interaction present); the surface appears "twisted"
C = one dimension concave and one convex, asymmetric (interaction present)

Larger models can be fitted but are not common, for example

Yi = b0 + b1·X1i + b2·X1i² + b3·X2i + b4·X2i² + b5·X1i·X2i + b6·X1i³ + b7·X2i³ + b8·X1i·X2i² + b9·X1i²·X2i + ei

Statistics quote: I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable. -- Mrs. Robert A. Taft

Regression on an indicator variable

What is an indicator variable? It is a variable that takes only the values 0 or 1. When we get to ANOVA we will see that class variables (categorical, group) are usually reparameterized in parametric analyses as 0/1 values. With only two levels (True-False, Up-Down, Male-Female, Marked-Unmarked, Heads-Tails) the values are easily re-coded. If there are more levels (say t levels) then we need t–1 indicator variables. But indicator variables are usually independent variables; ANOVA is all about indicator variables as independent variables.

Regression ON an indicator variable is different. Basically it is a simple linear regression where the dependent variable has a value of either 0 or 1. This is called a binary response.

[Figure: a straight line fitted to observations that are all either 0 or 1]

This is a "primitive" version of regression on an indicator variable. The predicted value (Yhat) is interpreted as the probability of getting a 1. However, this line will go below zero and above 1, which makes its properties somewhat undesirable.

The Logistic Model is a rather complicated model; it is not linear and cannot be fitted with PROC REG or PROC GLM. The Logistic Growth Equation is

Yi = b1 / (1 + e^(b2 + b3·Xi)) + ei

This equation has many applications and is often used as a "growth" model.

[Figure: "The Logistic Curve", an S-shaped curve of Y (0 to 100) against X (0 to 50)]

Wouldn't it be nice if we could fit this model to our indicator variable?

Enter Logistic Regression. The big, nonlinear logistic model can be much simplified if we know that b1 = 1 and that b2 = 0. This is basically what logistic regression does: it fits a logistic curve that goes from 0 to 1 asymptotically (so it never actually reaches either one). The result of this simplification is a simple linear regression. The dependent variable is a transformation of the original probabilities called the "log odds".

What exactly are "odds"? Odds are an expression of the likelihood that some event happens compared to the likelihood that it does not happen. If the odds on a horse in a race are 30 to 1, is that horse likely to win? Or lose? If the chance of an event happening is 50:50, what does that mean? What if the likelihood of two events is 1:1? How is that different?

We will work with a number that is in some ways a little simpler. We will define the odds as the ratio of the probability that an event occurs to the probability that the same event does not occur. Since the probabilities add to 1, if the probability of an event is "p", then the probability of it not occurring is "1–p". So "50:50" gives odds of 1.0, and 1:1 also gives odds of 1.0. They have the same odds.
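As a small numerical companion (not part of the original notes), the Python sketch below converts probabilities to odds, to log odds (the logit), and back again; the probability values shown are arbitrary examples.

import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds(p):
    """The logit: natural logarithm of the odds."""
    return math.log(odds(p))

def prob_from_odds(o):
    """Back-transform odds to a probability: o / (1 + o)."""
    return o / (1.0 + o)

for p in (0.50, 0.667, 0.909, 0.333, 0.091):
    print(f"p = {p:5.3f}  odds = {odds(p):6.3f}  log odds = {log_odds(p):7.3f}")

print(prob_from_odds(1.0))   # odds of 1 correspond to p = 0.5

A 50:50 chance gives odds of 1 and log odds of 0; probabilities above one half give odds greater than 1 and positive log odds, probabilities below one half give odds between 0 and 1 and negative log odds.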
If the odds are 1, then the likelihood of something happening is exactly equal to the likelihood that it will not happen; that is, p = 1 – p. To simplify our concepts we will think of the odds as the ratio of two probabilities. The probability that some event happens (success) will be equal to p, and the probability of failure will be 1–p. The odds are calculated as p/(1–p).

What if the odds are 2? This means that p is twice as large as (1–p), so success is twice as likely as failure. If the odds are 10, success is 10 times as likely as failure. If the odds are 0.5 or 0.1, then failure is twice as likely, or ten times as likely, as success.

So we start with the probabilities of a binary event. We calculate the odds of one of the events by dividing the probability of that event by the probability of the alternative event. Finally, the natural logarithm of this ratio is calculated to produce the dependent variable of a linear model (which may be simple linear or multiple). Predicted values from this model are "log odds". An antilog is taken (e^(log odds)) to get the predicted odds. The odds can be transformed back into the probability of the event by calculating: probability of event = predicted odds / (1 + predicted odds).

Let's look at an example. What are the odds of getting an A in this course given a particular score on the first exam?

Disclaimer 1: The use of this example, calculated from past grades, in no way implies any promise or commitment about the distribution of future grades.
Disclaimer 2: Although I will discuss only the number of A's, the non-A's are not all B's; there have been C's and D's. Sorry.

Now, what are the odds of getting an A in this course? I have had 423 students take the course since I began giving two exams in this course (previously 3). Of those 423, there have been 212 A's, which we will call "success". The odds of getting an A are then almost 50:50, so the odds are just about 1.

Interesting, but we can carry this one step further. What are the odds of getting an A if you had a 70 on the first exam? Or an 80? Or a 90? We will do logistic regression to determine this, using SAS PROC LOGISTIC. The structure is very similar to PROC REG. See the PROC LOGISTIC output (Appendix 10).

The output is rather different because the analysis is not a least squares regression. You will be responsible only for interpreting the values specified below.
- Note that SAS states that the link is a LOGIT. A logit is the natural logarithm of the odds, that is, log_e(p / (1–p)).
- SAS gives the number of each of the binary cases; in our example there are 212 "TRUE" and 211 "FALSE".
- The output contains a section called Model Fit Statistics (AIC, SC and –2LogL). These statistics are used to compare 2 models; we will not cover them here.
- Other output of interest is "Testing Global Null Hypothesis: BETA=0". This is like the test of the model in ordinary regression. Of special interest is the "Likelihood Ratio" test, not so much the "Score" and "Wald" tests.
- The parameter estimates and their tests are given in the section called "Analysis of Maximum Likelihood Estimates".

So how do we interpret this output? We are interested primarily in the slope and intercept, and the test of the slope.

Intercept = –16.91 and Slope = 0.1952
Likelihood ratio test P-value < 0.0001

So we have a highly significant slope.
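The notes use SAS PROC LOGISTIC for this fit. As a rough parallel (not part of the original notes), the sketch below fits the same kind of model in Python with statsmodels on simulated data; the data are invented, and the "true" intercept and slope simply reuse the estimates reported above rather than the actual course grades.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 423
exam1 = rng.uniform(55, 100, n)

# Simulate A / not-A outcomes from a logistic model; the intercept and slope
# are assumptions borrowed from the estimates quoted in the notes.
true_logit = -16.91 + 0.1952 * exam1
p_true = 1 / (1 + np.exp(-true_logit))
got_A = rng.binomial(1, p_true)

X = sm.add_constant(exam1)          # column of 1s plus the exam score
fit = sm.Logit(got_A, X).fit(disp=0)
print(fit.params)                   # [intercept, slope], maximum likelihood estimates
print(fit.llr_pvalue)               # likelihood ratio test of the global null (BETA = 0)

Here llr_pvalue plays the role of the "Likelihood Ratio" line under "Testing Global Null Hypothesis: BETA=0" in the SAS output.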
The interpretation here is the same as for regression, though the test statistic is different.

Now that we have a slope and an intercept, could we get predicted values as with regression? Absolutely! However, we entered log odds into the model, so the predicted values are log odds. We can get these predicted values and convert them to probabilities.

Calculate the probability that a person with an 83 on the first exam will get an A in the course.

Yhat_i = –16.91 + 0.1952·X_i = –16.91 + 0.1952(83) = –0.7086

This is a log odds, so first take the antilog: exp(–0.7086) = 0.4923.

So p/(1–p) = 0.4923, and solving for p we get p = 0.4923/(1 + 0.4923) = 0.3299.

The probability of getting an A with a grade of 83 on the first exam is 0.33; in other words, about 33% of students with a grade of 83 on the first exam will get an A.

The predicted values, detransformed to probabilities, form a logistic regression curve going from 0 to 1.

[Figure: predicted P(A) plotted against grade on the first exam, an S-shaped curve rising from near 0 to near 1 between grades of roughly 50 and 100]
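A small Python sketch (not part of the original notes) that automates the back-transformation just shown, using the reported intercept and slope; the extra scores in the loop are arbitrary examples.

import math

intercept, slope = -16.91, 0.1952      # estimates reported in the notes

def prob_of_A(score):
    log_odds = intercept + slope * score   # predicted log odds (the logit)
    odds = math.exp(log_odds)              # antilog gives the predicted odds
    return odds / (1 + odds)               # odds back to a probability

print(round(prob_of_A(83), 4))             # about 0.33, as in the hand calculation
for score in (70, 80, 90, 95):
    print(score, round(prob_of_A(score), 3))

Evaluating prob_of_A over a grid of scores traces out the same S-shaped curve of P(A) against first-exam grade described above.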