LINEAR REGRESSION: ONE REGRESSOR
Simple linear regression
Stock and Watson (Chapters 4 and 5)

LINEAR REGRESSION
• More informative than correlation or covariance.
• Recall: a positive covariance/correlation indicates a positive relationship; a negative covariance/correlation indicates a negative relationship; a zero covariance/correlation indicates no linear relationship.
• Linear regression is used when theory or common sense suggests that one variable affects the other, e.g. lot size affects price.
• Chapters 4 and 5 concentrate on linear regression models with one regressor.

LINEAR REGRESSION (Contd.)
• Linear regression is the bread and butter of econometrics.

LINEAR REGRESSION (Contd.)
• HEDONIC (PRICE) MODEL: relates the price of an underlying asset (e.g. house) to its characteristics (e.g. lot size).
• Common sense: lot size affects the price of a residential house.
• Independent variable is lot size (equivalently, lot size is the regressor).
• Dependent variable is price (equivalently, price is the regressand).

LINEAR REGRESSION SPECIFICATION: RULES
• The dependent variable (regressand) appears on the LHS of the model.
• The independent variable (regressor) appears on the RHS of the model.
• Hence the direction of causation is presumed to be from the RHS variable(s) (the regressor(s)) to the LHS variable (the regressand).

SPECIFICATION OF SIMPLE LINEAR REGRESSION MODEL

LINEAR REGRESSION: ASSUMPTIONS
Assumptions: Read textbook Chapter 4 (Section 4.4).

LINEAR REGRESSION SLOPES AND INTERCEPTS

LINEAR REGRESSION: POPULATION REGRESSION LINE
b0 + b1*xi is the population regression line or population regression function. This is the relationship between x and y on average over the population.
Thus, if you knew the value of x, according to the population regression line you could predict that the value of the dependent variable y is b0 + b1*xi.

LINEAR REGRESSION: THE ERROR TERM
• ui is the error term. The error term incorporates all the other factors BESIDES X that determine the value of the dependent variable y, for a specific observation, i.
• The error term is sometimes called the disturbance term.

ESTIMATING SIMPLE LINEAR REGRESSION MODELS
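Putting the population regression line and the error term described above together gives the model that is to be estimated. The display below is a reconstruction in the slides' own symbols (b0, b1, xi, ui), with yi standing for the dependent variable of observation i; the index range i = 1, ..., n is an assumption.

```latex
y_i = b_0 + b_1 x_i + u_i, \qquad i = 1, \dots, n
```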
• Use the method of ordinary least squares (OLS).
• OLS involves minimizing the sum of squared residuals (SSR).

ESTIMATING SIMPLE LINEAR REGRESSION MODELS (Contd.)
For the above model, ordinary least squares involves solving the following minimization problem using calculus: choose b0 and b1 to minimize the sum over all observations i of (yi - b0 - b1*xi)^2.

ORDINARY LEAST SQUARES

ORDINARY LEAST SQUARES (Contd.)
Note how the slope can also be obtained by dividing the covariance between x and y by the variance of x (i.e. the variance of the regressor).

WHY OLS??

OLS: PREDICTED VALUES

OLS: RESIDUALS

OLS: SUMS OF SQUARES
• Sums of squares are useful for constructing Analysis of Variance tables for hypothesis testing as well as for assessing the goodness-of-fit of the model.
• Three types of sums of squares can be computed, namely: (1) Total sum of squares (TSS); (2) Explained sum of squares (ESS); and (3) Sum of squared residuals (SSR).

OLS: SUMS OF SQUARES (Contd.)

PRESENTATION OF SUMS OF SQUARES (ANOVA TABLE)

ASSESSING GOODNESS OF FIT OF LINEAR MODEL

STANDARD ERROR OF A REGRESSION (SER)

LINEAR REGRESSION EXAMPLE
Covariance = 5.25; Correlation coefficient = 0.98

CAN CONFIRM USING GRETL SCATTERPLOT

COMPUTING OLS ESTIMATES: BY HAND

COMPUTING OLS ESTIMATES: GRETL OUTPUT
Model 1: OLS estimates using the 5 observations 1-5
Dependent variable: y

VARIABLE   COEFFICIENT   STDERROR    T STAT   P-VALUE
const      0.260870      0.367813    0.709    0.52932
x          0.456522      0.0547089   8.345    0.00361 ***

INTERPRETING OLS ESTIMATES

COMPUTING PREDICTED VALUES AND RESIDUALS
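As a rough check on the quantities named in the preceding slides (slope as covariance over variance, predicted values, residuals, the three sums of squares, R-squared and SER), here is a minimal Python sketch. The five (x, y) pairs are an assumption: the slides' data table is not reproduced in this text, so the values below were chosen only because they reproduce the reported covariance (5.25) and the gretl coefficients. SER is computed with n - 2 degrees of freedom, the usual convention with one regressor.

```python
import math

# Illustrative data, NOT taken from the slides: five (x, y) pairs chosen so
# that the results match the covariance and gretl coefficients reported above.
x = [1, 5, 6, 8, 10]   # regressor (e.g. income)
y = [1, 2, 3, 4, 5]    # regressand (e.g. consumption)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance of x and y, and sample variance of x (divisor n - 1).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# OLS estimates: slope = cov(x, y) / var(x); intercept = y_bar - slope * x_bar.
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# Predicted values and residuals.
y_hat = [b0 + b1 * xi for xi in x]
resid = [yi - yh for yi, yh in zip(y, y_hat)]

# The three sums of squares.
tss = sum((yi - y_bar) ** 2 for yi in y)      # total
ess = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained
ssr = sum(e ** 2 for e in resid)              # residual

r_squared = ess / tss
ser = math.sqrt(ssr / (n - 2))  # standard error of the regression

print(f"b0 = {b0:.6f}, b1 = {b1:.6f}")
print(f"sum of residuals = {sum(resid):.10f}")
print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, SSR = {ssr:.4f}")
print(f"R-squared = {r_squared:.3f}, SER = {ser:.3f}")
```

Running the sketch prints the estimates and lets you verify numerically that the residuals sum to zero and that TSS = ESS + SSR.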
Predicted values
Residuals
Confirm (by adding the values in the last column) that the residuals sum to zero, as expected.

COMPUTING THE THREE SUMS OF SQUARES
Confirm that: TSS = 10 (add up column 9); ESS = 9.59 (add up column 7); SSR = 0.4133 (add up column 5). Hence, TSS = ESS + SSR (up to rounding).

ANOVA TABLE

GOODNESS-OF-FIT (R-SQUARED)
R-squared = ESS/TSS = 9.59/10 = 0.959
Interpretation: About 96 percent of the variation in y (consumption) is ascribed to variation in x (income). The remaining 4% of variation in y is attributed to other factors or random error.
Confirm also that R-squared is the square of the correlation coefficient between x and y, i.e. 0.96 is the square of 0.98 (see slide 16).

STANDARD ERROR OF THE REGRESSION
SER = square root of (SSR/(n-2)) = square root of (0.4133/3) = 0.37
SER has the same units as the dependent variable (y = consumption).

CALIFORNIA TEST SCORE DATA REVISITED
California Test Score Data Set (n=420). See Textbook (Chapter 4).
• Previously we used the same data to examine the difference between the mean test score for large vs small classes as defined by the Student Teacher Ratios (STRs).
• We mentioned that if the confidence interval for the difference between means includes zero then the two means are not significantly different.

CALIFORNIA TEST SCORE DATA REVISITED: Continued
• Original policy question: What is the effect on test scores of reducing the Student Teacher Ratio (STR) by one student/class? Equivalently, what is the change in test score divided by the change in class size?
• This policy question cannot be answered by the confidence interval or hypothesis test for the difference between means mentioned in the previous slide.
• Fortunately, regression methods can be used to answer this policy question, as demonstrated in the following slides.

CALIFORNIA TEST SCORE DATA REVISITED: Continued
• The change in test score divided by the change in class size (as measured by STR) is the slope of the regression line relating test score (the regressand) to STR (the regressor), as explained in the following slides.
• To answer the above policy question, we need to estimate the slope.

LINEAR REGRESSION: ANOTHER EXAMPLE
California Test Score Data Set (n=420). See Textbook (Chapter 4).
Model

POPULATION REGRESSION LINE AND ERROR TERM

LINEAR REGRESSION: ANOTHER EXAMPLE (Contd.)
SPSS OUTPUT

ESTIMATED MODEL (n=420)

THE FITTED (OR ESTIMATED) REGRESSION LINE VISUALLY

INTERPRETATION OF COEFFICIENTS

PREDICTED VALUES AND RESIDUALS

ASSESSING GOODNESS-OF-FIT USING R-SQUARED

STANDARD ERRORS OF COEFFICIENTS

CONFIDENCE INTERVAL
Suppose a skeptic suggests that reducing the number of students in a class has no effect on learning or, specifically, test scores. This assertion can be tested statistically in two ways:
(1) By constructing a confidence interval for the slope coefficient
(2) By testing a hypothesis regarding the significance of the slope coefficient
BOTH APPROACHES LEAD TO IDENTICAL CONCLUSIONS

CONFIDENCE INTERVAL (Contd.)
E.G. CALIFORNIA DATA

REPORTING REGRESSION RESULTS: CONVENTION

HYPOTHESIS TESTING: STEP 1
Specify the null hypothesis and the alternative hypothesis.
H0: the null hypothesis; H1: the alternative hypothesis
Choose one of the three

HYPOTHESIS TESTING: STEP 2
• Choose the level of significance (denoted by the Greek letter alpha). For example, alpha = 5% is the most common choice.
• The level of significance is the probability of a type I error (i.e. of rejecting the null hypothesis when the null hypothesis is, in fact, correct).

HYPOTHESIS TESTING: STEP 3
Compute the test statistic t. When the hypothesized value of the coefficient is zero, t is called the t-ratio, which is reported directly in the SPSS output.

HYPOTHESIS TESTING: STEP 4
Decision rules (determine whether to accept or reject the null hypothesis H0)

HYPOTHESIS TESTING: STEP 4
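Steps 3 and 4 can be sketched with the slope estimate and standard error from the five-observation gretl output shown earlier. This is only an illustration under stated assumptions: the null hypothesis is taken to be a zero slope against a two-sided alternative, the critical value 3.182 is the standard table value for a two-sided 5% test with 3 degrees of freedom, and the p-value uses a closed-form CDF that is valid only for exactly 3 degrees of freedom (chosen to avoid a statistics library). The same figures also give a 95% confidence interval, so the agreement between the two approaches can be seen directly.

```python
import math

# Figures from the gretl output for the five-observation example.
b1_hat = 0.456522    # estimated slope
se_b1 = 0.0547089    # standard error of the slope
n = 5
df = n - 2           # degrees of freedom with one regressor

# Step 3: t-ratio for H0: slope = 0 against H1: slope != 0.
t_ratio = (b1_hat - 0.0) / se_b1

# Step 4, critical-value rule: reject H0 at the 5% level if |t| exceeds the
# two-sided critical value of the t distribution with 3 df (table value).
t_crit = 3.182
reject_by_t = abs(t_ratio) > t_crit


def t_cdf_3df(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


# Step 4, p-value rule: reject H0 if the two-sided p-value is below alpha.
alpha = 0.05
p_value = 2.0 * (1.0 - t_cdf_3df(abs(t_ratio)))
reject_by_p = p_value < alpha

# 95% confidence interval for the slope; H0 is rejected if it excludes zero.
ci_low = b1_hat - t_crit * se_b1
ci_high = b1_hat + t_crit * se_b1
reject_by_ci = not (ci_low <= 0.0 <= ci_high)

print(f"t = {t_ratio:.3f}, p-value = {p_value:.5f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(reject_by_t, reject_by_p, reject_by_ci)
```

All three decision rules give the same answer here, in line with the statement that the confidence interval and the hypothesis test lead to identical conclusions.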
Hypothesis testing using p-values
Decision rules:

HYPOTHESIS TESTING EXAMPLE (CALIFORNIA DATA)

BINARY/DUMMY VARIABLES
• Binary/dummy variables take only two possible values (0 or 1).
• Binary/dummy variables are used in simple linear regression to test the difference between two means.

BINARY VARIABLE: EXAMPLE
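To see why a dummy regressor tests a difference between two means, the following minimal sketch regresses an outcome on a 0/1 variable. The earnings figures and the helper name ols_simple are hypothetical and made up purely for illustration. With a 0/1 regressor, the OLS intercept equals the mean of the group coded 0 and the OLS slope equals the difference between the two group means.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical earnings data (made up for illustration only).
earnings = [20.0, 24.0, 22.0, 18.0, 16.0, 17.0]
dummy = [1, 1, 1, 0, 0, 0]   # 1 = first group, 0 = second group

b0, b1 = ols_simple(dummy, earnings)

# Group means computed directly, for comparison with the regression output.
group_1 = [e for e, d in zip(earnings, dummy) if d == 1]
group_0 = [e for e, d in zip(earnings, dummy) if d == 0]
mean_1 = sum(group_1) / len(group_1)
mean_0 = sum(group_0) / len(group_0)

print(f"intercept = {b0:.2f}, mean of group 0 = {mean_0:.2f}")
print(f"slope = {b1:.2f}, difference in means = {mean_1 - mean_0:.2f}")
```

Testing whether the slope is zero is therefore the same question as testing whether the two group means are equal.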
Suppose there are two groups (60 men and 40 women). Are their mean earnings significantly different?
Solutions
Method 1: Test for the difference between two means using the t test
Method 2: Specify a simple linear regression with a binary variable

SOLUTION: METHOD 1

SOLUTION: METHOD 2

GAUSS-MARKOV THEOREM
• The Gauss-Markov Theorem states that, provided the standard assumptions are met, the OLS estimator will be BLUE.
• Proof of some aspects of the Gauss-Markov Theorem is given in the appendix to chapter 4 of the textbook.

Meaning of BLUE
B means BEST (has smallest variance)
L means Linear (linear in the dep. var.)
U means Unbiased (mean of the sampling distribution is equal to the population value)
E means Estimator (an estimator is a formula)

HETEROSKEDASTICITY
• Heteroskedasticity means nonconstant variance.
• It is a violation of the constant variance (i.e. homoskedasticity) assumption.

HETEROSKEDASTICITY: CONSEQUENCES
• OLS estimator still unbiased (good!)
• OLS estimator no longer BLUE (bad!)
• OLS standard errors no longer reliable (bad!). The usual OLS standard errors are referred to as homoskedasticity-only (i.e. constant-variance-only) standard errors. They are not reliable if homoskedasticity is violated.

HETEROSKEDASTICITY: SOLUTIONS
Method 1: Adjust the standard errors for the presence of heteroskedasticity, i.e. calculate the heteroskedasticity-adjusted standard error, also called the heteroskedasticity-robust standard error. Recall that the problem with heteroskedasticity is that the OLS standard errors are no longer reliable.

HETEROSKEDASTICITY: SOLUTIONS (Continued)
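Method 1 can be illustrated with a short sketch that computes both kinds of standard error for the slope. Everything here is an assumption for illustration: the five data points are not from the slides (they were picked so that the homoskedasticity-only standard error matches the 0.0547089 in the gretl output), and the robust formula shown is one common variant (often labelled HC1), not necessarily the one your software uses. Robust standard errors are justified in large samples, so five observations serve only to show the arithmetic.

```python
import math

# Illustrative data only (not taken from the slides).
x = [1, 5, 6, 8, 10]
y = [1, 2, 3, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
dx = [xi - x_bar for xi in x]          # deviations of x from its mean
sxx = sum(d ** 2 for d in dx)

# OLS estimates and residuals.
b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Homoskedasticity-only standard error of the slope: SER / sqrt(Sxx).
ser = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
se_homo = ser / math.sqrt(sxx)

# Heteroskedasticity-robust standard error (HC1-style): each squared
# residual is weighted by the squared deviation of its own x value,
# with an n / (n - 2) degrees-of-freedom correction.
weighted = sum((d ** 2) * (e ** 2) for d, e in zip(dx, resid))
se_robust = math.sqrt((n / (n - 2)) * weighted / sxx ** 2)

print(f"slope = {b1:.6f}")
print(f"homoskedasticity-only SE = {se_homo:.6f}")
print(f"heteroskedasticity-robust SE = {se_robust:.6f}")
```

The coefficient estimates are identical under both approaches; only the standard errors, and hence the t-ratios and confidence intervals, change.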
Method 2: Use Weighted Least Squares (WLS) instead of Ordinary Least Squares (OLS).
WLS involves two steps.
Step 1: Transform the model to get rid of heteroskedasticity
Step 2: Estimate the transformed model by OLS ...
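The two WLS steps can be sketched as follows. The sketch rests on an assumption that the slides do not state: the error variance is taken to be proportional to the square of x, so dividing the whole model by x yields a transformed error with constant variance. The data and the helper name ols_simple are hypothetical. A different variance pattern would call for a different transformation.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data (made up for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]

# Step 1: transform the model. Assuming Var(u_i) is proportional to x_i^2,
# dividing y = b0 + b1*x + u by x gives
#     y/x = b0 * (1/x) + b1 + u/x,
# where the new error u/x has constant variance.
y_star = [yi / xi for xi, yi in zip(x, y)]
x_star = [1.0 / xi for xi in x]

# Step 2: estimate the transformed model by OLS. The roles swap: the
# intercept of the transformed regression estimates b1 and its slope
# estimates b0.
intercept_star, slope_star = ols_simple(x_star, y_star)
b1_wls = intercept_star
b0_wls = slope_star

print(f"WLS estimates: b0 = {b0_wls:.4f}, b1 = {b1_wls:.4f}")
```

Running OLS on the transformed variables is equivalent to minimizing the sum of squared residuals of the original model with each observation weighted by 1/x^2, which is where the name weighted least squares comes from.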
 Winter '09
 Ogwang
 Econometrics, Linear Regression, Regression Analysis, Contd
