Regression by Cluster Using SAS EM

The Data and Task

The data file orthopedic.xls for this recitation can be found on the course website. It contains information compiled by a company that sells orthopedic devices to hospitals. The observations are 4703 U.S. hospitals, and the variables are described in the table below. The company would like to use demographic information about the hospitals to predict sales. It may be useful to first cluster the hospitals into market segments and then run separate regressions on selected segments.

Variable  Role      Type      Description
ZIP       Rejected  Nominal   US postal code
HID       ID        Nominal   Hospital ID
CITY      Rejected  Nominal   City name
STATE     Rejected  Nominal   State name
BEDS      Input     Interval  Number of hospital beds
RBEDS     Input     Interval  Number of rehab beds
OUT-V     Input     Interval  Number of outpatient visits
ADM       Input     Interval  Administrative cost (thousands of $ per year)
SIR       Input     Interval  Inpatient revenue
SALESY    Rejected  Interval  Sales of rehab. equipment since Jan. 1
SALES12   Target    Interval  Sales of rehab. equipment over the last 12 months
HIP95     Input     Interval  Number of hip operations in 1995
KNEE95    Input     Interval  Number of knee operations in 1995
TH        Input     Binary    Indicator of teaching hospital
TRAUMA    Input     Binary    Indicator of having a trauma unit
REHAB     Input     Binary    Indicator of having a rehab unit
HIP96     Input     Interval  Number of hip operations in 1996
KNEE96    Input     Interval  Number of knee operations in 1996
FEMUR96   Input     Interval  Number of femur operations in 1996

The Role and Type designations are meant specifically for the analysis in the following sections and are somewhat arbitrary. For instance, either SALESY or SALES12 could serve as the target variable, but only SALES12 is used here; since the two measure overlapping sales, SALESY is rejected. The information contained in the CITY, STATE, and ZIP variables could, in principle, be used in the analysis.
However, these variables have too many distinct nominal values to be useful in automatic clustering or regression, so they are rejected. If a variable grouping hospitals by region were available, it could be used instead.

The following procedure demonstrates one way to perform a by-cluster regression in SAS EM. It involves clustering the hospitals on the input variables, adding cluster membership as a new input variable, and finally fitting a regression on all inputs and their interactions with cluster membership. The results are compared with those of a standard unclustered regression...
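In SAS EM these steps are configured in the GUI rather than written as code. The sketch below is not SAS code and does not reproduce EM's defaults; it is a minimal Python analogue of the three steps, assuming standardized inputs, plain k-means with k = 2, dummy (reference) coding of cluster membership, and ordinary least squares. The data are synthetic: two made-up inputs, `beds` and `hips`, loosely stand in for BEDS and HIP96, and the helper names (`standardize`, `kmeans`, `design`, `ols`, `sse`) are hypothetical.

```python
import random

def standardize(rows):
    """Center and scale each column to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def design(rows, labels=None, k=1):
    """Intercept + inputs; if labels are given, also add cluster dummies and
    input-by-dummy interactions (cluster 0 is the reference level)."""
    X = []
    for i, x in enumerate(rows):
        r = [1.0] + list(x)
        if labels is not None:
            d = [1.0 if labels[i] == c else 0.0 for c in range(1, k)]
            r += d + [dc * xj for dc in d for xj in x]
        X.append(r)
    return X

def ols(X, y):
    """Least squares via the normal equations X'X b = X'y (Gauss-Jordan)."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(A[row][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for row in range(p):
            if row != col:
                f = A[row][col] / A[col][col]
                A[row] = [a - f * b for a, b in zip(A[row], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def sse(X, y, b):
    """Sum of squared residuals of the fitted model."""
    return sum((t - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, t in zip(X, y))

# SYNTHETIC data, not orthopedic.xls: two made-up hospital segments in which
# sales respond differently to beds and hip operations.
rng = random.Random(1)
raw, y = [], []
for _ in range(200):
    if rng.random() < 0.5:
        beds, hips = rng.gauss(100, 10), rng.gauss(20, 3)
        sales = 5 + 0.1 * beds + 2.0 * hips
    else:
        beds, hips = rng.gauss(400, 30), rng.gauss(80, 8)
        sales = 50 + 0.02 * beds + 0.5 * hips
    raw.append((beds, hips))
    y.append(sales + rng.gauss(0, 1))

z = standardize(raw)
k = 2                                  # assumed number of segments
labels = kmeans(z, k)                  # step 1: cluster on the inputs

X_pooled = design(z)                   # standard unclustered regression
X_cluster = design(z, labels, k)       # steps 2-3: inputs, cluster, interactions
b_pooled, b_cluster = ols(X_pooled, y), ols(X_cluster, y)
sse_pooled = sse(X_pooled, y, b_pooled)
sse_cluster = sse(X_cluster, y, b_cluster)
print(f"pooled SSE:     {sse_pooled:.1f}")
print(f"by-cluster SSE: {sse_cluster:.1f}")
```

Because every input is interacted with the cluster dummy, this model gives the same fitted values as running a separate regression within each cluster. The pooled design is also nested inside the by-cluster design, so its in-sample SSE can never be lower; a fair comparison of the two models would therefore rely on held-out data or a criterion that penalizes the extra parameters.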
This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell University (Engineering School).