Fu_Ch11_linear_regression - Chapter 11 Regression and...


Chapter 11: Regression and Correlation Methods
EPI 809/Spring 2008

Learning Objectives
1. Describe the linear regression model
2. State the regression modeling steps
3. Explain ordinary least squares
4. Compute regression coefficients
5. Understand and check model assumptions
6. Predict the response variable
7. Comment on SAS output

Learning Objectives (continued)
1. Correlation models
2. Link between a correlation model and a regression model
3. Test of the coefficient of correlation

Models

What is a Model?
1. Representation of some phenomenon
[Illustration: a non-math/stats model]

What is a Math/Stats Model?
1. Often describes the relationship between variables
2. Types
   - Deterministic models (no randomness)
   - Probabilistic models (with randomness)

Deterministic Models
1. Hypothesize exact relationships
2. Suitable when prediction error is negligible
3. Example: body mass index (BMI) is a measure of body fat based on weight and height
   Metric formula: $\mathrm{BMI} = \dfrac{\text{weight in kilograms}}{(\text{height in meters})^2}$
   Non-metric formula: $\mathrm{BMI} = \dfrac{\text{weight in pounds} \times 703}{(\text{height in inches})^2}$

Probabilistic Models
1. Hypothesize 2 components
   • Deterministic
   • Random error
2. Example: systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error
   • $\mathrm{SBP} = 6 \times \text{age (d)} + \varepsilon$
   • Random error may be due to factors other than age in days (e.g.,
birthweight)

Types of Probabilistic Models
  Probabilistic models
    - Regression models
    - Correlation models
    - Other models

Regression Models
- Relationship between one dependent variable and explanatory variable(s)
- Use an equation to set up the relationship
  • Numerical dependent (response) variable
  • 1 or more numerical or categorical independent (explanatory) variables
- Used mainly for prediction and estimation

Regression Modeling Steps
1. Hypothesize the deterministic component
   • Estimate unknown parameters
2. Specify the probability distribution of the random error term
   • Estimate the standard deviation of the error
3. Evaluate the fitted model
4. Use the model for prediction and estimation

Model Specification

Specifying the Deterministic Component
1. Define the dependent variable and the independent variable
2. Hypothesize the nature of the relationship
   • Expected effects (i.e., coefficients' signs)
   • Functional form (linear or non-linear)
   • Interactions

Model Specification Is Based on Theory
1. Theory of the field (e.g., epidemiology)
2. Mathematical theory
3. Previous research
4. 'Common sense'

Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of CD+ counts (vertical axis) against years since seroconversion (horizontal axis)]

OB/GYN Study

Types of Regression Models
  Regression models
    - Simple (1 explanatory variable)
        - Linear
        - Non-linear
    - Multiple (2+ explanatory variables)
        - Linear
        - Non-linear

Linear Regression Model

Linear Equations
  $Y = mX + b$
  $m$ = slope = (change in Y) / (change in X)
  $b$ = Y-intercept
[Figure: straight line plotted on X-Y axes]

Linear Regression Model
1. The relationship between the variables is a linear function

   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

   $Y_i$ = dependent (response) variable (e.g., CD+ counts)
   $X_i$ = independent (explanatory) variable (e.g., years since seroconversion)
   $\beta_0$ = population Y-intercept
   $\beta_1$ = population slope
   $\varepsilon_i$ = random error

Population & Sample Regression Models
  Population (unknown relationship): $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  Random sample: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$

Population Linear Regression Model
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (observed value)
  $\varepsilon_i$ = random error
  $E(Y) = \beta_0 + \beta_1 X_i$
[Figure: population regression line on X-Y axes, with an observed value and its random error marked]

Sample Linear Regression Model
  $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
  $\hat{\varepsilon}_i$ = random error
  $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: fitted sample line on X-Y axes, with an observed value and an unsampled observation marked]

Estimating Parameters: Least Squares Method

Scatter Plot
1. Plot of all $(X_i, Y_i)$ pairs
2. Suggests how well the model will fit
[Figure: scatter plot of Y against X, both axes running from 0 to 60]

Thinking Challenge
- How would you draw a line through the points?
- How do you determine which line 'fits best'?
[Figure: the same scatter plot of Y against X]
[Figures: the same scatter plot with three candidate lines: slope changed and intercept unchanged; slope unchanged and intercept changed; slope changed and intercept changed]

Least Squares
1. 'Best fit' means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so square the errors:

   $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
2. LS minimizes the sum of the squared differences (errors), SSE

Least Squares Graphically
  LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$
  $Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$
  $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
[Figure: fitted line on X-Y axes with four observations and their residuals $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$]

Coefficient Equations
  Prediction equation: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
  Sample slope: $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
  Sample Y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Derivation of Parameters (1)
  Least squares (L-S): minimize the squared error
  $\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
  $0 = \dfrac{\partial \sum \varepsilon_i^2}{\partial \beta_0} = \dfrac{\partial \sum (y_i - \beta_0 - \beta_1 x_i)^2}{\partial \beta_0} = -2\,(n\bar{y} - n\beta_0 - n\beta_1 \bar{x})$
  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Derivation of Parameters (2)
  Least squares (L-S): minimize the squared error
  $0 = \dfrac{\partial \sum \varepsilon_i^2}{\partial \beta_1} = \dfrac{\partial \sum (y_i - \beta_0 - \beta_1 x_i)^2}{\partial \beta_1}$
  $\;\; = -2 \sum x_i (y_i - \beta_0 - \beta_1 x_i)$
  $\;\; = -2 \sum x_i (y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i)$
  $\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$
  Because $\sum (x_i - \bar{x}) = 0$ and $\sum (y_i - \bar{y}) = 0$, replacing the leading $x_i$ by $(x_i - \bar{x})$ on both sides changes neither sum:
  $\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$
  $\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}}$

Computation Table
   Xi      Yi      Xi^2      Yi^2      XiYi
   X1      Y1      X1^2      Y1^2      X1Y1
   X2      Y2      X2^2      Y2^2      X2Y2
   :       :       :         :         :
   Xn      Yn      Xn^2      Yn^2      XnYn
   ΣXi     ΣYi     ΣXi^2     ΣYi^2     ΣXiYi

Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
   Estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X
   • If $\hat{\beta}_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X
2. Y-intercept ($\hat{\beta}_0$)
   Average value of Y when X = 0
   • If $\hat{\beta}_0 = 4$, then average Y is expected to be 4 when X is 0

Parameter Estimation Example
  Obstetrics: What is the relationship between mother's estriol level and birthweight, using the following data?

   Estriol (mg/24h)    Birthweight (g/1000)
          1                    1
          2                    1
          3                    2
          4                    2
          5                    4

Scatterplot: Birthweight vs. Estriol Level
[Figure: scatter plot of birthweight (0 to 4) against estriol level (0 to 6)]

Parameter Estimation Solution Table
   Xi     Yi     Xi^2    Yi^2    XiYi
    1      1       1       1       1
    2      1       4       1       2
    3      2       9       4       6
    4      2      16       4       8
    5      4      25      16      20
   15     10      55      26      37    (column totals)

Parameter Estimation Solution
  $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{(15)^2}{5}} = 0.70$
  $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$

Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
   Birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in estriol (X)
2. Intercept ($\hat{\beta}_0$)
   Average birthweight (Y) is -0.10 units when estriol level (X) is 0
   • Difficult to explain
   • Birthweight should always be positive

SAS Code for Fitting a Simple Linear Regression
    data BW;  /* Reading data in SAS */
      input estriol birthw @@;
      cards;
    1 1 2 1 3 2 4 2 5 4
    ;
    run;

    proc reg data=BW;  /* Fitting linear regression models */
      model birthw = estriol;
    run;

Parameter Estimation: SAS Computer Output
  Parameter Estimates
   Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
   Intercept     1         -0.10000             0.63509        -0.16      0.8849     ($\hat{\beta}_0$)
   Estriol       1          0.70000             0.19149         3.66      0.0354     ($\hat{\beta}_1$)

Parameter Estimation Thinking Challenge
  You're a vet epidemiologist for the county cooperative. You gather the following data:

   Food (lb.)    Milk yield (lb.)
       4              3.0
       6              5.5
      10              6.5
      12              9.0

  What is the relationship between cows' food intake and milk yield?

Scattergram: Milk Yield vs. Food Intake*
[Figure: scatter plot of milk yield in lb. (0 to 10) against food intake in lb. (0 to 15)]

Parameter Estimation Solution Table*
   Xi     Yi      Xi^2     Yi^2     XiYi
    4     3.0      16      9.00      12
    6     5.5      36     30.25      33
   10     6.5     100     42.25      65
   12     9.0     144     81.00     108
   32    24.0     296    162.50     218    (column totals)

Parameter Estimation Solution*
  $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \dfrac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$
  $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$

Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
   Milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X)
2. Y-intercept ($\hat{\beta}_0$)
   Average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0 lb.

...
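
The two worked examples above (estriol vs. birthweight, food intake vs. milk yield) can be re-checked outside SAS. The following is a minimal Python sketch, not part of the original slides: it applies the slope and intercept formulas from the Coefficient Equations slide to the data given in the examples. The function names `fit_simple_ols` and `sse` are hypothetical helpers introduced only for this illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) using slope = SS_xy / SS_xx.

    Hypothetical helper for illustration; it mirrors the slide formulas.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SS_xy = sum of (x_i - x_bar)(y_i - y_bar); SS_xx = sum of (x_i - x_bar)^2
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Data copied from the two examples in the slides.
estriol = [1, 2, 3, 4, 5]          # mg/24h
birthweight = [1, 1, 2, 2, 4]      # g/1000
food = [4, 6, 10, 12]              # lb.
milk = [3.0, 5.5, 6.5, 9.0]        # lb.

b0_bw, b1_bw = fit_simple_ols(estriol, birthweight)
b0_milk, b1_milk = fit_simple_ols(food, milk)

print(f"birthweight = {b0_bw:.2f} + {b1_bw:.2f} * estriol")
print(f"milk yield  = {b0_milk:.2f} + {b1_milk:.2f} * food")

# Moving the slope away from the least-squares value should not lower the SSE.
sse_fit = sse(estriol, birthweight, b0_bw, b1_bw)
sse_other = sse(estriol, birthweight, b0_bw, b1_bw + 0.2)
print(f"SSE at LS fit: {sse_fit:.3f}; SSE with slope shifted by 0.2: {sse_other:.3f}")
```

The centered form $SS_{xy}/SS_{xx}$ is used here instead of the computational form with $\sum X_i Y_i$ and $\sum X_i^2$; the two are algebraically equivalent, and the centered form tends to be less sensitive to rounding when the values are large.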

This note was uploaded on 08/09/2011 for the course STATISTICS -- taught by Professor Montilla during the Spring '11 term at Universidad Iberoamericana.
