STAT 512: Applied Regression Analysis
Topic 2
Fall 2009

Chapter 3: Diagnostics and Remedial Measures

Diagnostics: look at the data to diagnose situations where the assumptions of our model are violated.
Remedies: changes in analytic strategy to fix these problems.

What do we need to check?

Linearity Assumption: Does the linearity assumption between E(Y) and X make sense?
Error Assumption: Are errors independent, normal random variables with common variance?
Outlier Detection: The term outlier refers to a response that is vastly different from other responses (KNNL page 108).
Predictor Range: Are there outlying values for the predictor variables that could unduly influence the regression model?

How to get started?

Look at the data. Look at the data. Look at the data.

Before trying to describe the relationship between a response variable Y and an explanatory variable X, we should look at the distributions of these variables. We should always look at X as well as Y, since if Y depends on X, looking at Y alone may not be very informative.

Section 3.1: Diagnostics for Predictor Variable

We do not make any specific assumptions about X. However, understanding what is going on with X is necessary for interpreting what is going on with Y. So we look at some basic summaries of the X variables to get oriented. We are not checking our assumptions at this point.

If X has many values, use proc univariate. If X has only a few values, use proc freq or the freq option in proc univariate.

Examine the distribution of X. Is it skewed? Are there outliers? Important statistics to consider include mean, standard deviation, median, mode, and range. Box plots and stem-and-leaf plots are useful.

Do the values of X depend on time (the order in which the data were collected)? (Sequence plots)

Example (Toluca Company)

See program nknw096.sas for the code used to execute proc univariate.

    /* File: nknw096.sas */
    * Reads in the external file. A new variable seq is created, which
      represents the observation or sequence number;
    data toluca;
      infile 'ch01ta01.dat';
      input lotsize workhrs;
      seq=_n_;
    proc print data=toluca;
    run;

    * Generates summary statistics and a qqplot for both x and y;
    proc univariate data=toluca plot;
      var lotsize workhrs;
    run;

    * Generates a plot of X (lotsize) against sequence order;
    title1 'Sequence plot for X';
    symbol1 v=circle i=l;
    proc gplot data=toluca;
      plot lotsize*seq;
    run;

Normal distributions

Our model does not state that X comes from a single normal population. Nor must Y come from a single normal population (we assumed that the errors, which the residuals estimate, come from a single normal distribution, not the Y's). In some cases, X and/or Y may be normal, or their distributions may be unusual and it can be useful to know this....
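
Since the normality assumption concerns the errors rather than X or Y, one way to look at it is through the residuals of the fitted line. The following is a minimal sketch, assuming the toluca data set read in by the Toluca example; the output data set name diag and the variable names resid and pred are illustrative choices and do not come from nknw096.sas.

```sas
* Sketch only: fit the simple linear regression and save the residuals.
  The names diag, resid, and pred are assumptions made for illustration;
proc reg data=toluca;
  model workhrs = lotsize;
  output out=diag r=resid p=pred;
run;
quit;

* Examine the residuals: the normal option requests tests of normality,
  and plot adds the stem-and-leaf, box, and normal probability plots;
proc univariate data=diag normal plot;
  var resid;
run;
```

Running proc univariate on resid instead of on lotsize or workhrs keeps the check aligned with what the model actually assumes.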
 Fall '10
 Yen
