MVChap02 - STA 4107/5107 Chapter 2 Examining Your Data 1...

STA 4107/5107
Chapter 2
Examining Your Data

1 Key Terms

Please learn these terms on your own. Ask questions in class or come to office hours if you need help.

2 Getting the Data into SAS

There are various options for getting data into SAS. We will learn code for two: one for inputting text files and the other for inputting Microsoft Excel files. You can also use the Import Data option under the File menu and follow the prompts. The disadvantage of the menu option is that you lose the import when you close SAS. If you write code, you simply rerun the input portion of the code each time and the data are ready to use.

2.1 Inputting Text Files

The following code inputs the text data file called salary.txt into SAS. The first line creates a SAS dataset and names it salary. The second line tells SAS where the text file resides. You must be sure to have the path correct, and the path must be enclosed in quotes. The third line names the columns in the text file. A dollar sign after a variable name tells SAS that the variable is non-metric (it is read as character data).

    data salary;                              /* create the dataset salary */
      infile 'C:\UFL\5106\salary.txt';        /* location of the text file */
      input job$ id gender$ yrstart$ salary;  /* column names; $ = non-metric */
    run;

3 Graphical Examination of Data

Preliminary and exploratory data analysis is a must for any kind of data analysis. It becomes difficult when the data exist in more dimensions than the human mind can visualize. The following techniques are some of those that have been developed to aid the researcher in this endeavor.

3.1 The Nature of the Variable: examining the shape of the distribution

As with univariate statistical procedures, getting a sense of the overall shape of the distribution is necessary to validate model assumptions. Many, though not all, multivariate techniques assume that the data (or the residuals) are normally distributed. Plotting histograms to get an idea of the shape is probably the single most important thing one can do to assess normality. Histograms are easy when the number of variables is not unreasonably large.
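As a minimal sketch of the normality check described above, PROC UNIVARIATE can overlay a fitted normal curve on a histogram and draw a normal quantile-quantile plot. The sketch assumes the salary dataset, path, and columns from Section 2.1. The NORMAL option and the QQPLOT statement do not appear in the original notes; they are shown only as one possible way to carry out the check.

```sas
/* Assumption: same file, path, and columns as in Section 2.1 */
data salary;
  infile 'C:\UFL\5106\salary.txt';
  input job$ id gender$ yrstart$ salary;
run;

proc univariate data=salary;
  var salary;
  /* histogram with a normal curve fitted to the sample mean and std. dev. */
  histogram salary / normal;
  /* normal Q-Q plot; the reference line uses the estimated mean and std. dev. */
  qqplot salary / normal(mu=est sigma=est);
run;
```

If the histogram bars track the overlaid curve and the Q-Q points fall close to the reference line, the normality assumption is at least plausible for that variable.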
Histograms clearly will not work for variables that are not metric. If there are nominal variables in your data, it makes sense to plot histograms by category.

3.2 SAS code for Histograms

The following code produces a histogram by gender of salary in a mythical company. Try it by downloading the text file salary.txt and editing the code so that the path is correct for your computer. It is necessary to sort the data yourself first, so that SAS can find the categories. PROC PRINT simply prints to the screen so you can check to see that the data look as they should. It is not necessary to do a PROC PRINT after a procedure because SAS automatically prints the output from a procedure to the output screen.

    data salary;
      infile 'C:\UFL\5701\salary.txt';        /* edit the path for your computer */
      input job$ id gender$ yrstart$ salary;
    run;

    /* histogram of salary for the whole dataset */
    proc univariate data=salary;
      histogram salary;
    run;

    /* sort by gender so that BY-group processing works */
    proc sort data=salary;
      by gender;
    run;

    /* one histogram of salary per gender */
    proc univariate data=salary;
      by gender;
      histogram salary;
    run;

If the analysis procedure requires normality, then we are looking for smooth, approximately...
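A possible alternative to the sort-and-BY steps above is a CLASS statement in PROC UNIVARIATE, which does not require sorted data and places the histograms for the gender categories in a single comparative display. This is only a sketch under the same assumptions as the code above (same file, path, and columns); it is not the approach the notes themselves use.

```sas
/* Assumption: same file, path, and columns as in Section 3.2 */
data salary;
  infile 'C:\UFL\5701\salary.txt';
  input job$ id gender$ yrstart$ salary;
run;

/* CLASS splits the analysis by gender without a prior PROC SORT */
proc univariate data=salary;
  class gender;
  histogram salary;   /* one panel per gender level, on a common scale */
run;
```

Because the panels share a common axis scale, differences in location and spread between the categories are easier to compare than in separate BY-group plots.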
This note was uploaded on 07/14/2011 for the course STA 4702 taught by Professor Staff during the Spring '08 term at University of Florida.