OR474wk10.2 - Clustering Using SAS EM October 6 2003 The...

Clustering Using SAS EM
October 6, 2003

The Data and Task

The data in prospect.xls is demographic information from a company’s database on 4701 of their customers. (Names and addresses have been removed.) Preferences for different products may depend on these demographic factors, so the company would like to segment these customers based on demographic similarities as a prelude to test-marketing. The segmentation is to be achieved using a form of clustering.

The data set is in Excel spreadsheet format. Each row represents a customer and each column a particular demographic variable. Here is what the variables mean:

Variable Name   Role       Type       Description
ID              ID         Nominal    Customer identification number
AGE             Input      Interval   Age (years)
INCOME          Input      Interval   Annual income (thousands of dollars)
SEX             Input      Binary     F = female, M = male
MARRIED         Input      Binary     Indicator of marriage
OWNHOME         Input      Binary     Indicator of personal home ownership
LOC             Rejected   Nominal    Residence location code (A–H)
CLIMATE         Input      Nominal    Residence climate code (10, 20, 30)
FICO>=700       Input      Binary     Indicator of FICO score >= 700

Unlike in previous analyses, there is no “Target” variable: clustering is an unsupervised learning task.

The variable LOC gives a code for the region in which a customer resides. Such a variable could be used for clustering, but it has 8 levels (a relatively large number), which could make interpretation difficult. Understanding of the product types and the market suggested that only the climate of a region should influence the preferences of a customer in that region. Accordingly, the 8 regions were grouped into 3 climate zones, coded in the variable CLIMATE. This is why LOC has the role Rejected while CLIMATE is an Input.

FICO is an acronym for a numerical credit scoring system (devised by Fair, Isaac & Company, Inc.) used by banks and other lenders to measure the creditworthiness of an individual.
As a rule of thumb, individuals who have a score of at least 700 are considered to have good credit, while those whose score is below 700 will be treated more cautiously by prospective lenders. For convenience, this analysis will use only a binary variable indicating whether the FICO score is at least 700.

The Procedure

1. Download and save prospect.xls. Import the data into SAS, start Enterprise Miner, and create a new project. Maximize the EM window. This will make it easier to view the charts and diagrams that we will create....
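
The two coding decisions described in the data section (collapsing the 8 LOC codes into 3 CLIMATE codes, and reducing the FICO score to an indicator of a score of at least 700) can be sketched outside Enterprise Miner. The Python below is only an illustration under stated assumptions: the LOC-to-CLIMATE assignment is hypothetical (the handout does not say which regions fall in which zone), the three customer records are made up, and the names fico_indicator, climate_code and LOC_TO_CLIMATE are not part of SAS or of prospect.xls.

```python
# Hypothetical assignment of the 8 location codes (A-H) to the 3 climate
# codes (10, 20, 30). The real grouping is not given in the handout.
LOC_TO_CLIMATE = {
    "A": 10, "B": 10, "C": 10,
    "D": 20, "E": 20, "F": 20,
    "G": 30, "H": 30,
}


def fico_indicator(score: int) -> int:
    """Return 1 if the FICO score is at least 700, otherwise 0."""
    return 1 if score >= 700 else 0


def climate_code(loc: str) -> int:
    """Map a location code to its (assumed) climate code."""
    return LOC_TO_CLIMATE[loc]


# Made-up records, for illustration only; not taken from prospect.xls.
customers = [
    {"ID": 1, "LOC": "A", "FICO": 712},
    {"ID": 2, "LOC": "E", "FICO": 699},
    {"ID": 3, "LOC": "H", "FICO": 700},
]

# Keep only the coded variables that would be used as clustering inputs.
coded = [
    {
        "ID": c["ID"],
        "CLIMATE": climate_code(c["LOC"]),
        "FICO>=700": fico_indicator(c["FICO"]),
    }
    for c in customers
]

for row in coded:
    print(row)
```

Note that a score of exactly 700 is coded as 1, matching the "at least 700" rule of thumb.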