Multiple Regression Analysis: Inference
• Statistical inference in the regression model
  • Hypothesis tests about population parameters
  • Construction of confidence intervals
• Sampling distributions of the OLS estimators
  • The OLS estimators are random variables
  • We already know their expected values and their variances
  • However, for hypothesis tests we need to know their distribution
  • In order to derive their distribution we need additional assumptions
  • Assumption about distribution of errors: normal distribution

• Assumption MLR.6 (Normality of error terms)
  $u \sim \text{Normal}(0, \sigma^2)$, independently of $x_1, x_2, \dots, x_k$
  • It is assumed that the unobserved factors are normally distributed around the population regression function.
  • The form and the variance of the distribution do not depend on any of the explanatory variables.
  • It follows that: $y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \sigma^2)$

• Discussion of the normality assumption
  • The error term is the sum of “many” different unobserved factors
  • Sums of many independent factors are approximately normally distributed (CLT)
  • Problems:
    • How many different factors? Number large enough?
    • Possibly very heterogeneous distributions of individual factors
    • How independent are the different factors?
  • The normality of the error term is an empirical question
  • At least the error distribution should be “close” to normal
  • In many cases, normality is questionable or impossible by definition
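
The CLT argument and the "number large enough?" caveat can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it assumes each unobserved factor is an independent, centered Exponential(1) draw (a deliberately skewed choice), and the helper names `simulate_error` and `skewness` are made up for the illustration. Sample skewness is used as a rough measure of distance from normality, since a normal distribution has skewness 0.

```python
import random
import statistics

def simulate_error(n_factors, rng):
    """One error draw: the sum of n_factors independent, centered, skewed factors."""
    # Assumption for illustration: each factor is Exponential(1) minus its mean of 1.
    return sum(rng.expovariate(1.0) - 1.0 for _ in range(n_factors))

def skewness(values):
    """Sample skewness; close to 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is reproducible
skew_by_count = {}
for n_factors in (1, 5, 50):
    draws = [simulate_error(n_factors, rng) for _ in range(20000)]
    skew_by_count[n_factors] = skewness(draws)
    print(f"factors={n_factors:2d}  skewness={skew_by_count[n_factors]:.3f}")
```

With only a few factors the simulated error stays visibly skewed; the skewness shrinks as more independent factors are added. The sketch does not address the slide's other two concerns: strongly dependent factors, or a few dominant ones, can slow or prevent this convergence.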

• Discussion of the normality assumption (cont.)
  • Examples where normality cannot hold:
    • Wages (nonnegative; also: minimum wage)
    • Number of arrests (takes on a small number of integer values)
    • Unemployment (indicator variable, takes on only 1 or 0)
  • In some cases, normality can be achieved through transformations of the dependent variable (e.g. use log(wage) instead of wage)
  • Under normality, OLS is the best unbiased estimator, even when compared with nonlinear estimators
  • Important: For the purposes of statistical inference, the assumption of normality can be replaced by a large sample size
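
The effect of the log transformation can be sketched on simulated data. The wage distribution below is an assumption chosen purely for illustration (lognormal, with made-up parameters), not data from the slides. Under that assumption, wage is strictly positive and right-skewed, while log(wage) is exactly normal.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness; close to 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(3)  # fixed seed so the run is reproducible
# Hypothetical wages: log(wage) ~ Normal(2.5, 0.5^2), so wage itself is lognormal.
wages = [rng.lognormvariate(2.5, 0.5) for _ in range(20000)]
log_wages = [math.log(w) for w in wages]

skew_wage = skewness(wages)
skew_log_wage = skewness(log_wages)
print(f"skewness of wage:      {skew_wage:.3f}")
print(f"skewness of log(wage): {skew_log_wage:.3f}")
```

Real wage data need not be lognormal, so in practice the transformation typically moves the distribution closer to normal rather than making it exactly normal.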

• Terminology
  • MLR.1 – MLR.5: “Gauss-Markov assumptions”
  • MLR.1 – MLR.6: “Classical linear model (CLM) assumptions”
• Theorem 4.1 (Normal sampling distributions)
  Under assumptions MLR.1 – MLR.6:
  $\hat{\beta}_j \sim \text{Normal}(\beta_j, \text{Var}(\hat{\beta}_j))$
  • The estimators are normally distributed around the true parameters with the variance that was derived earlier
  $(\hat{\beta}_j - \beta_j) / \text{sd}(\hat{\beta}_j) \sim \text{Normal}(0, 1)$
  • The standardized estimators follow a standard normal distribution
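
Theorem 4.1 can be checked by simulation. The sketch below is a simplified illustration rather than anything from the slides: it uses a single regressor (k = 1), for which the true standard deviation of the slope estimator is $\sigma / \sqrt{SST_x}$, and all population values (`beta0`, `beta1`, `sigma`, `n`) are made up. The regressor values are held fixed across replications, and only the normal errors are redrawn.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """OLS slope estimate in a simple regression of y on x with an intercept."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

rng = random.Random(1)                       # fixed seed so the run is reproducible
n, beta0, beta1, sigma = 30, 1.0, 0.5, 2.0   # hypothetical population values
x = [rng.uniform(0, 10) for _ in range(n)]   # regressor held fixed across samples
x_bar = statistics.fmean(x)
sst_x = sum((xi - x_bar) ** 2 for xi in x)
sd_beta1 = sigma / math.sqrt(sst_x)          # true sd of the slope estimator (k = 1)

z_values = []
for _ in range(5000):
    # Draw a new sample of normal errors (MLR.6) and re-estimate the slope.
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    z_values.append((ols_slope(x, y) - beta1) / sd_beta1)

z_mean = statistics.fmean(z_values)
z_sd = statistics.pstdev(z_values)
share_within_196 = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"mean={z_mean:.3f}  sd={z_sd:.3f}  share within +/-1.96: {share_within_196:.3f}")
```

If the theorem holds, the standardized slope should have mean near 0, standard deviation near 1, and roughly 95% of its draws inside ±1.96.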

• Testing hypotheses about a single population parameter
• Theorem 4.2 (t-distribution for the standardized estimators)
  Under assumptions MLR.1 – MLR.6:
  $(\hat{\beta}_j - \beta_j) / \text{se}(\hat{\beta}_j) \sim t_{n-k-1}$
  • If the standardization is done using the estimated standard deviation (= standard error), the normal distribution is replaced by a t-distribution
  • Note: The t-distribution is close to the standard normal distribution if $n-k-1$ is large.
• Null hypothesis (for more general hypotheses, see below)
  $H_0: \beta_j = 0$
  • The population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of $x_j$ on $y$
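
As a rough illustration of how the t statistic for $H_0: \beta_j = 0$ is computed, the sketch below fits a regression with two regressors using only the Python standard library. The data are simulated under made-up coefficients (x2 is given a true effect of zero), and the helper names `invert` and `ols_t_stats` are hypothetical. It uses the standard formulas $\hat{\sigma}^2 = SSR/(n-k-1)$ and $\text{se}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 [(X'X)^{-1}]_{jj}}$.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [float(i == j) for j in range(m)] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def ols_t_stats(X, y):
    """Return (coefficients, standard errors, t statistics for H0: beta_j = 0, df)."""
    n, p = len(X), len(X[0])                 # p = k + 1 columns, including the constant
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    df = n - p                               # degrees of freedom: n - k - 1
    sigma2_hat = sum(e * e for e in resid) / df
    se = [math.sqrt(sigma2_hat * xtx_inv[j][j]) for j in range(p)]
    t = [b / s for b, s in zip(beta, se)]
    return beta, se, t, df

rng = random.Random(2)                       # fixed seed so the run is reproducible
X, y = [], []
for _ in range(200):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([1.0, x1, x2])
    # Hypothetical population model: x1 has an effect of 0.8, x2 has none.
    y.append(1.0 + 0.8 * x1 + 0.0 * x2 + rng.gauss(0, 1))

beta, se, t, df = ols_t_stats(X, y)
for name, b, s, tj in zip(("const", "x1", "x2"), beta, se, t):
    print(f"{name}: coef={b:.3f}  se={s:.3f}  t={tj:.2f}  (df={df})")
```

Each t statistic would then be compared with a critical value of the t-distribution with n-k-1 degrees of freedom. With 197 degrees of freedom, that value is close to the standard normal one, in line with the note above.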