V. Regression Diagnostics

Regression analysis assumes a random sample of independent observations on the same individuals (i.e., units). What are its other basic assumptions? They all concern the residuals (e):

(1) The mean of the probability distribution of e over all possible samples is 0: i.e., the mean of e does not vary with the levels of x.
(2) The variance of the probability distribution of e is constant for all levels of x: i.e., the variance of the residuals does not vary with the levels of x.
(3) The covariance between the errors associated with any two different y observations is 0: i.e., the errors are uncorrelated; the errors associated with one value of y have no effect on the errors associated with other y values.
(4) The probability distribution of e is normal.

These assumptions are commonly summarized as i.i.d.: independent & identically distributed residuals. To the degree that the assumptions hold, the relationship between the outcome variable & the independent variables is adequately linear.

What are the implications of these assumptions? Assumption 1 ensures that the regression coefficients are unbiased. Assumptions 2 & 3 ensure that the standard errors are unbiased & as small as possible, making p-values & significance tests trustworthy. Assumption 4 ensures the validity of confidence intervals & p-values.

Assumption 4 is by far the least important: even if the distribution of a regression model's residuals departs from approximate normality, the central limit theorem makes us generally confident that confidence intervals & p-values will be trustworthy approximations as long as the sample is at least 100 to 200 observations. Problems with assumption 4, however, may be indicative of problems with the other, crucial assumptions: when might these be violated to the degree that the findings are highly biased & unreliable? Serious violations of assumption 1 result from anything that causes serious bias of the coefficients: examples?
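The assumptions above can be checked directly from a fitted model's residuals. A minimal sketch in Python, using NumPy on simulated data (the model y = 2 + 3x + e and all numbers here are illustrative assumptions, not from the notes): when the data actually satisfy the assumptions, the residuals have mean approximately 0 (assumption 1) & roughly constant spread across the range of x (assumption 2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data that satisfies the assumptions: y = 2 + 3x + e, e ~ N(0, 1)
n = 500
x = rng.uniform(0, 10, n)
e = rng.normal(0, 1, n)
y = 2.0 + 3.0 * x + e

# Fit ordinary least squares (intercept + slope) via least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(f"intercept = {beta[0]:.2f}, slope = {beta[1]:.2f}")
print(f"mean residual = {resid.mean():.4f}")  # assumption 1: should be ~0

# Crude check on assumption 2: residual spread at low vs. high x
sd_lo = resid[x < 5].std()
sd_hi = resid[x >= 5].std()
print(f"residual SD (x < 5) = {sd_lo:.2f}, (x >= 5) = {sd_hi:.2f}")
```

In practice one would also plot the residuals against the fitted values & against each predictor; the split-sample comparison here is just the simplest numerical stand-in for that visual check.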
Violations of assumption 2 occur, e.g., when the variance of income increases as income itself increases, or when the variance of body weight increases as body weight itself increases: this pattern (heteroskedasticity) is commonplace. Violations of assumption 3 occur as a result of clustered observations or time-series observations: the errors are not independent from one observation to another but rather are correlated. E.g., in a cluster sample of individuals from neighborhoods, schools, or households, the individuals within any such unit tend to be significantly more homogeneous than are individuals in the wider sample. The same holds for panel or time-series observations. In the real world, the linear model is usually no better than an approximation, & violations to one extent or another are the norm.
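The income example above can be sketched numerically. A minimal Python/NumPy illustration of a violation of assumption 2 (the simulated model & all numbers are illustrative assumptions): the error standard deviation is made to grow with x, & the residual spread in the upper half of x then comes out clearly larger than in the lower half. Note that the slope estimate itself is still roughly unbiased; heteroskedasticity damages the standard errors, not the coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Heteroskedastic data: error SD grows with x, violating assumption 2
n = 2000
x = rng.uniform(1, 10, n)
e = rng.normal(0, 0.5 * x)  # spread of e increases with the level of x
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Diagnostic: residual spread in the lower vs. upper half of x
sd_lo = resid[x < 5.5].std()
sd_hi = resid[x >= 5.5].std()
print(f"slope estimate = {beta[1]:.2f}")   # still close to the true 2.0
print(f"residual SD, low x:  {sd_lo:.2f}")
print(f"residual SD, high x: {sd_hi:.2f}")  # noticeably larger
```

Formal versions of this check (e.g., the Breusch-Pagan test) regress the squared residuals on the predictors instead of eyeballing a two-group split.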
This note was uploaded on 07/11/2011 for the course SYA 6306 taught by Professor Tardanico during the Spring '09 term at FIU.