Creating a SAS EM Decision Tree for Classification Data

We will use the data set hmeq.xls, which contains baseline and loan performance information for 5,960 recent home equity loans. The target (BAD) is a binary variable that indicates whether an applicant eventually defaulted or was seriously delinquent. This adverse outcome occurred in 1,189 cases (20%). For each applicant, 12 input variables were recorded.

We want a credit scoring model that computes the probability that a given loan applicant will default on loan repayment. A threshold is then selected such that all applicants whose probability of default exceeds the threshold are recommended for rejection.

Procedure

1. Download and save hmeq.xls from the course website. Import the data into SAS, start Enterprise Miner, and create a new project. Maximize the EM window; this will make it easier to view the charts and diagrams that we will create.

2. Drag an Input Data Source node onto the diagram, open it, and input the data set. Set the Model Role of BAD to target.

3. Connect a Data Partition node after the Input Data Source node and open it. Under the Partition tab, reset the Percentages for the data partitioning so that the Train data set gets 67%, the Validation set gets 33%, and the Test set gets 0%. This partitioning devotes more data to training than we used in previous recitations. Tree models can be more flexible than regression models, but more data may be needed to make the fit both stable and useful. Close this node (choosing to save changes).

If we were fitting a logistic regression model, we would probably add a Replacement node at this point (since the data contains missing values) and possibly also a Transform Variables node. However, we won't use either of these nodes here because a tree model does not need them.

4. Connect a Tree node after the Data Partition node and open it. You will see the same Data, Variables, and Notes tabs that you have seen before when opening a Regression node. The other tabs are different. In particular, the Basic and Advanced tabs allow you to specify parameters that are specific to building decision trees.
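For readers who want to check the arithmetic outside Enterprise Miner, the 67%/33% partition from step 3 and the rejection rule from the introduction can be sketched in plain Python. This is only an illustration under stated assumptions: it uses a simple random split (the Data Partition node's own sampling method may differ), the function names partition and recommend_rejection are hypothetical, and the threshold of 0.5 and the three example probabilities are made up for demonstration rather than taken from a fitted tree.

```python
import random


def partition(rows, train_frac=0.67, seed=0):
    """Randomly split rows into (train, validation); no test set is held out."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def recommend_rejection(p_default, threshold):
    """Return True when the predicted default probability exceeds the threshold."""
    return p_default > threshold


# Counts stated in the handout: 5,960 loans, 1,189 of them with BAD = 1.
N_LOANS, N_BAD = 5960, 1189

# Stand-in row identifiers; the real rows would come from hmeq.xls.
loan_ids = range(N_LOANS)
train_ids, valid_ids = partition(loan_ids)

# Overall rate of the adverse outcome, which the handout rounds to 20%.
bad_rate = N_BAD / N_LOANS

# Hypothetical predicted probabilities for three applicants (not model output).
example_probs = [0.05, 0.35, 0.80]
decisions = [recommend_rejection(p, threshold=0.5) for p in example_probs]

print(len(train_ids), len(valid_ids))
print(f"{bad_rate:.1%}")
print(decisions)
```

Because the threshold only converts a probability into a yes/no recommendation, it can be changed after the tree is fitted without refitting the model.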
- Spring '07