enter a higher weight in the (1, 1) cell of the decision table. Select inverse priors to bias the model toward identifying both true positive and true negative predictions, at the risk of misestimating false positive and false negative predictions.

Figure 5. Decisions and Priors Model Options
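To illustrate the idea behind a decision table, the sketch below (not SAS code; the weights, data, and threshold search are hypothetical) shows how weighting outcome cells shifts the classification cutoff a model would choose:

```python
# Illustrative sketch (not the SAS implementation): a decision matrix
# assigns a weight to each outcome cell; cell (1, 1) rewards correctly
# predicting the target event (a true positive). All values here are
# hypothetical.

def best_threshold(scores, labels, weights):
    """Pick the probability cutoff that maximizes total decision weight.

    weights: {"tp": w, "tn": w, "fp": w, "fn": w}
    """
    best, best_value = 0.5, float("-inf")
    for cut in [i / 100 for i in range(1, 100)]:
        value = 0.0
        for p, y in zip(scores, labels):
            pred = 1 if p >= cut else 0
            if pred == 1 and y == 1:
                value += weights["tp"]
            elif pred == 0 and y == 0:
                value += weights["tn"]
            elif pred == 1 and y == 0:
                value += weights["fp"]
            else:
                value += weights["fn"]
        if value > best_value:
            best, best_value = cut, value
    return best

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
# A higher weight in the true-positive cell lowers the chosen cutoff,
# so more cases are classified as events.
print(best_threshold(scores, labels,
                     {"tp": 5.0, "tn": 1.0, "fp": -1.0, "fn": -1.0}))
```

Comparing a true-positive-weighted matrix against a true-negative-weighted one on the same data shows the cutoff moving down when true positives are favored, which is the effect of entering a higher weight in the (1, 1) cell.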
The SAS Rapid Predictive Modeler automatically generates a standard report that includes lift charts, receiver operating characteristic (ROC) charts, and a scorecard. You can select additional reports in the Report panel, including a model summary, variable rankings, crosstabulations, a classification matrix, and detailed fit statistics. The reports are reviewed in the next section.
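As background on what a lift chart reports, the following minimal sketch computes cumulative lift at a given depth (hypothetical data, not SAS Rapid Predictive Modeler output):

```python
# Minimal sketch of the cumulative-lift statistic behind a lift chart
# (hypothetical scores and labels; not SAS output).

def cumulative_lift(scores, labels, depth):
    """Lift at a depth: the event rate among the top-scored fraction
    of cases divided by the overall event rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_top = max(1, int(len(ranked) * depth))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

scores = [0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(cumulative_lift(scores, labels, 0.2))  # lift in the top 20% of cases
```

A lift of 2.5 at the 20% depth, for example, means the model finds events in its top-scored fifth at 2.5 times the background rate.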
Once you are satisfied with your model settings, click Run to execute the task. Figure 6 shows the SAS Rapid Predictive Modeler task executing within a SAS Enterprise Guide process flow. You can modify the task to select a different candidate set of input variables, a different modeling method, or different report items (or any combination) and rerun it. You can also add more than one SAS Rapid Predictive Modeler task to your SAS Enterprise Guide process flow.

SAS Global Forum 2010 Customer Intelligence

Figure 6. SAS Rapid Predictive Modeler Task Running within a SAS Enterprise Guide Process
REVIEWING THE MODEL RESULTS
Business analysts and statisticians often spend a significant amount of time compiling model results for review with their coworkers. The SAS Rapid Predictive Modeler includes a concise set of core reports covering the data source and variables used for modeling, a ranking of the important predictor variables, several fit statistics for evaluating model accuracy, and a model scorecard. The SAS Rapid Predictive Modeler automatically selects the best-fitting model and displays the report. For brevity, only a subset of the reports is presented here.
Two sets of statistics are reported: training and validation. The SAS Rapid Predictive Modeler process divides the input data into training data and validation data. This partitioning is carefully controlled so that the division results in two data sets that each accurately represent the population. The training data are used to compute the parameters for each model, resulting in the training fit statistics. The validation data are then scored with each model, resulting in the validation fit statistics. The validation fit statistics are used to compare models and detect overfitting. If the training statistics are significantly better than the validation statistics, you would suspect overfitting, which occurs when the model has been trained to detect random signals in the data. Models with the best validation statistics are generally preferred unless you have some reason to believe that either the model was biased by the validation data or that the validation data are not representative, as in the case of a rapidly changing population. Absent such concerns, you continue with the model selected based on the validation fit statistics.
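The partition and overfit check described above can be sketched as follows (illustrative only; the split fraction, seed, and tolerance are hypothetical, and SAS controls its partitioning internally):

```python
# Illustrative sketch of a controlled train/validation partition and an
# overfitting check (not the SAS implementation; the split fraction,
# seed, and tolerance below are hypothetical).
import random

def stratified_split(labels, valid_frac=0.3, seed=42):
    """Split row indices so each class keeps roughly the same
    proportion in both partitions (a representative division)."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(len(idx) * valid_frac)
        valid += idx[:cut]
        train += idx[cut:]
    return sorted(train), sorted(valid)

def suspect_overfit(train_misclass, valid_misclass, tol=0.05):
    """A validation misclassification rate much worse than the training
    rate suggests the model learned random noise in the training data."""
    return (valid_misclass - train_misclass) > tol

# 20% event rate in the full data; both partitions should preserve it.
labels = [1] * 20 + [0] * 80
train, valid = stratified_split(labels)
train_rate = sum(labels[i] for i in train) / len(train)
valid_rate = sum(labels[i] for i in valid) / len(valid)
print(round(train_rate, 2), round(valid_rate, 2))
```

Because each class is split in the same proportion, the event rate in the training and validation partitions matches the full data, which is what makes validation fit statistics a fair basis for comparing models.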
Figure 7 sho...