# Chapter 6 - Solver Step by Step


Chapter 6: Hands-on Classification Example Using the Insurance Fraud Sample File

• Open the file
• Observations
  o 2 sheets: (1) Past Claims, (2) New Claims
  o What do we know from visual observation?
    ▪ At Fault % = how much of the blame the driver shared
    ▪ How old the vehicle is
    ▪ Whether the policy covers rental cars
    ▪ Comprehensive or not
    ▪ Degree of Complexity (possibly a rating from the insurance adjuster)
  o What variables do we NOT know about (or have no explanation for)? Note that this can be common in the real world of external data.
    ▪ Concept 1-6 each represent some kind of measure, possibly:
      - Age of driver
      - How early they pay their premium
      - Index of past driving history
      - Etc.
  o The more we know about our data, the better equipped we are to set up our model, but we can still create a model even if we are “blind” to some of the variables.
  o Our LAST COLUMN (Fraud) measures our outcome or result (all the preceding columns combine to determine whether a claim is fraud or not).
• Our Past Claims sheet, drawn from a large database, lets us see the characteristics of fraudulent and non-fraudulent claims
  o With Solver we build a model using the data in Past Claims and apply that model to New Claims to predict Fraud or No Fraud (note that the Fraud column is blank at first).
  o If a claim is likely to be fraudulent, then as a company we reason that it is worth spending the money to investigate it (otherwise not).
  o Note that the process of building our model is very similar no matter what data is fed into it.
• Go to the Data Mining tab and choose Classify
  o Choose a model (note that we don’t need to know how the models are programmed in the background; our objective is to know how to set them up and read the results).
  o Classification Tree (first, because it is the easiest to understand visually)
  o The generic Solver interface pops up, where you tell it what data to look at (it usually grabs the whole data sheet).
  o Choose the Selected Variables you want to include as input to the model
    ▪ A starting point can be to choose all variables.
    ▪ Note that if variables such as house address or phone number were here, we would not include them.
  o Output variable = Fraud (what we are predicting)
  o Solver goes through each row, estimates probabilities based on the Past Claims, and determines the probability of fraud for each of the New Claims.
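The workflow above (learn from Past Claims, then score New Claims) can be sketched outside of Solver. The following is a minimal illustration in plain Python, not Solver's actual algorithm: it grows a small tree using Gini impurity, which is an assumed splitting rule, and each leaf reports the share of fraudulent past claims that landed in it. The column choices, the made-up rows, the `max_depth` setting, and the 0.5 cutoff used to turn a probability into a Fraud / No Fraud label are all assumptions for the sake of the example.

```python
# Hypothetical past claims: (at_fault_pct, vehicle_age, complexity, fraud)
# The rows are invented for illustration; fraud is 1 (fraud) or 0 (no fraud).
PAST_CLAIMS = [
    (90, 12, 4, 1), (80, 10, 5, 1), (85, 11, 3, 1), (70, 9, 4, 1),
    (20, 2, 1, 0), (30, 3, 2, 0), (10, 1, 1, 0), (40, 4, 2, 0),
]
# New claims have the same input columns but no fraud label yet.
NEW_CLAIMS = [(88, 11, 4), (15, 2, 1)]


def gini(rows):
    """Gini impurity of the fraud label (last column) for a set of rows."""
    if not rows:
        return 0.0
    p = sum(r[-1] for r in rows) / len(rows)
    return 2 * p * (1 - p)


def best_split(rows):
    """Return the (feature_index, threshold) that lowers impurity most, or None."""
    best, best_score = None, gini(rows)
    for f in range(len(rows[0]) - 1):
        for t in sorted({r[f] for r in rows}):
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow the tree; a leaf is the fraction of fraud among its past claims."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return sum(r[-1] for r in rows) / len(rows)
    f, t = split
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return (f, t,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def fraud_probability(tree, claim):
    """Walk a claim down the tree and return the leaf's fraud share."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if claim[f] <= t else right
    return tree


tree = build_tree(PAST_CLAIMS)
CUTOFF = 0.5  # assumed threshold for labelling a claim as Fraud
for claim in NEW_CLAIMS:
    p = fraud_probability(tree, claim)
    print(claim, p, "Fraud" if p >= CUTOFF else "No Fraud")
```

The toy rows are deliberately easy to separate, so the leaves come out pure. With real claims data the leaves usually contain a mix of outcomes, the probabilities fall between 0 and 1, and the choice of cutoff starts to matter.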
• Note that the “Success Probability Cutoff” is set at 0.5. If we run our model, analyze the results, and find that too many cases are flagged as fraud that are not fraudulent, or too many cases are flagged as not fraud that are fraudulent, then we may have to adjust this cutoff.
• Next
  o Here we do the setup of how our algorithm is going to run, which differs for