Fall 2006
ORIE474: Section 4 Notes
Nikolai Blizniouk

In this recitation we shall use the DONATIONS data set to investigate the effect of income level, home ownership, and the average donation amount to date on the amount donated in the current campaign. Along the way we shall gain experience with creating new variables in SAS, sampling, and filtering the data for outliers.

1. Start SAS and import the (unedited) data set raw.xls with the DONATIONS data that we used during weeks 1 and 2.

2. Create the following nodes:
   • 1 of Input Data Source,
   • 1 of Replacement,
   • 1 of Transform Variables,
   • 2 of Filter Outliers,
   • 2 of Data Set Attributes,
   • 4 of Sampling,
   • 6 of Insight (for intermediate monitoring of results).

3. Connect the nodes as shown in the figure below:
   [Figure: diagram of the connected nodes, not reproduced here.]
   (You don't have to draw the rectangular boxes using SAS, but if you figure out how, I'll be curious to know.)

4. Let's specify the data first:
   (a) Open the Input Data Source node.
   (b) Specify the source of the DONATIONS data set.
   (c) Notice that the size of the full data set is 2988 observations (rows). By default SAS will take a random "metadata" sample of 2000 observations (or some other number of observations that you specify) in order to save computation on large data sets while the model is still being developed. In the Data tab, press the Change button and check the "Use complete data as sample" box. This tells SAS to use the full data set.
   (d) In the Variables tab, make sure that TARGET_B is not rejected from the model.
   (e) Close the Input Data Source node, saving changes.

5. It is usually not a good idea to make irreversible changes to your original data. The Data Set Attributes node allows you to modify characteristics of your variables without changing the original data set.

6. Recall that INCOME is the ordinal variable for the income level, with 7 possible levels. Because it is ordinal, we cannot perform arithmetic operations on it unless its measurement type is converted to the interval measurement type. In addition, the variable has missing values that we have to deal with.

7. Let's create the indicator variable INCOME_BIN that will aggregate low-income and high-income respondents so that INCOME_BIN=0 if INCOME ≤ 3 and INCOME_BIN=1 if INCOME > 3 ....
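The rule for the indicator variable can be illustrated outside of SAS. The following is only a minimal Python sketch, not the SAS procedure used in this recitation: the function name `income_bin` is hypothetical, the income levels are assumed to be coded as the integers 1 through 7, and a missing value is assumed to be represented by `None` and simply passed through (the notes do not say here how missing values are to be treated).

```python
from typing import Optional


def income_bin(income: Optional[int]) -> Optional[int]:
    """Return 0 for low income (INCOME <= 3) and 1 for high income (INCOME > 3).

    Assumption: a missing INCOME is represented by None and is left
    missing rather than being assigned to either group.
    """
    if income is None:
        return None
    return 0 if income <= 3 else 1


# Hypothetical INCOME values, including one missing observation.
incomes = [1, 3, 4, 7, None]
bins = [income_bin(x) for x in incomes]
print(bins)  # [0, 0, 1, 1, None]
```

Leaving missing values as missing keeps the indicator from silently placing respondents with unknown income into the low-income group; any replacement of missing values would be a separate, explicit step.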