LEC5 - Categorical Variable Recoding

Categorical Variable Recoding
Morgan C. Wang
Department of Statistics, University of Central Florida
2/28/2011

Outline
- Introduction
- Representations with Dummy Variables
- Unary Variable with Missing Values
- Binary Variable with Missing Values
- High Level Nominal or Ordinal Variables
- Nominal Scale Variables with Many Levels
- Ordinal Scale Variables with Many Levels
- Periodical Variables
- Case Study

Introduction
- Why can categorical variables with many levels be a problem for methods such as regression and neural networks?
- Transformation
  - Target-dependent transformation
  - Target-independent transformation

Representation with Dummy Variables
- Unary Variable with Missing Values
- Binary Variables with Missing Values
- Categorical Variables with Many Levels
  - Nominal Scale
  - Ordinal Scale

Unary Variable with Missing Values
A unary variable that has missing values can be treated as a binary variable with two categories, “Yes” and “Unknown” (the latter representing the missing category). The recoded variable then serves as a missing value indicator.
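The recoding of a unary variable can be illustrated with a minimal sketch. The column name `owns_home`, the use of `None` to mark a missing value, and the helper `recode_unary` are all assumptions made for illustration; they do not come from the lecture.

```python
# Hypothetical unary column: the only recorded value is "Yes";
# None marks a missing value (an assumption for this sketch).
owns_home = ["Yes", None, "Yes", None, "Yes"]


def recode_unary(values):
    """Recode a unary variable with missing values.

    Returns the two-category labels ("Yes" / "Unknown") and a 0/1
    missing-value indicator (1 where the original value was missing).
    """
    labels = ["Unknown" if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return labels, indicator


labels, missing_flag = recode_unary(owns_home)
print(labels)
print(missing_flag)
```

Either output can be passed to a model: the labels keep the variable categorical, while the 0/1 indicator is already numeric.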

Dummy Variable
A dummy variable is a numeric variable that takes the value 1 to indicate the presence of a given level of a categorical variable and the value 0 to indicate its absence. A categorical variable with n levels can be represented by n-1 dummy variables; the omitted level acts as the reference and is identified by all dummies being 0.
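As a small illustration of the n-1 rule, the sketch below codes a three-level variable with two dummy variables. The function `dummy_code`, the example column `colors`, and the choice of "blue" as the reference level are hypothetical and serve only to show the idea.

```python
def dummy_code(values, reference=None):
    """Represent a categorical variable with n levels by n-1 dummy variables.

    The reference level gets no column of its own; it is the row of all zeros.
    If no reference is given, the first level in sorted order is used.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


# Hypothetical three-level variable; "blue" is chosen as the reference level.
colors = ["red", "green", "blue", "green"]
columns, rows = dummy_code(colors, reference="blue")
print(columns)
for value, row in zip(colors, rows):
    print(value, row)
```

Dropping one level avoids the exact linear dependence that arises when all n dummies are used together with an intercept in a regression model.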

Binary Variable with Missing Values
A binary variable with missing values can be handled in three ways:
- Treat it as one binary variable by imputing the missing values into either the “Yes” or the “No” category;
- Treat it as a nominal scale variable with three categories; or
- Treat it as an ordinal scale variable that can be represented by two binary variables.

Which treatment is appropriate depends on the proportion of missing values, the predictive capability of the binary variable, and the need to limit the dimensionality of the problem.
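The three treatments can be compared side by side in a short sketch. Several choices here are assumptions rather than the lecture's method: missing values are marked with `None`, imputation uses the most frequent observed category, the nominal coding takes “No” as the reference level, and the ordinal coding assumes the order No < Unknown < Yes with cumulative ("thermometer") dummies. All function names are hypothetical.

```python
from collections import Counter


def impute_to_binary(values):
    """Option 1: one binary variable; missing values go to the modal category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [1 if (mode if v is None else v) == "Yes" else 0 for v in values]


def nominal_three_level(values):
    """Option 2: three nominal categories coded as (is_yes, is_unknown).

    "No" is the reference level, coded (0, 0).
    """
    return [(int(v == "Yes"), int(v is None)) for v in values]


def ordinal_two_binary(values):
    """Option 3: ordinal coding under the assumed order No < Unknown < Yes.

    Returns (at_least_unknown, at_least_yes) for each value.
    """
    rank = {"No": 0, None: 1, "Yes": 2}
    return [(int(rank[v] >= 1), int(rank[v] >= 2)) for v in values]


# Hypothetical responses; None marks a missing value.
responses = ["Yes", "No", None, "Yes", "Yes"]
print(impute_to_binary(responses))
print(nominal_three_level(responses))
print(ordinal_two_binary(responses))
```

Option 1 keeps a single input, while options 2 and 3 each add a second column, which is the dimensionality cost mentioned above.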

Impute the Missing Value
If the proportion of missing values for the binary input variable is small (say, < 5%) and the binary target variable has a similar frequency at both levels of the input variable, then imputing either “Yes” or “No” for the input variable will have nearly negligible consequences. If the binary target values are overwhelmingly “Yes” (say, in the 99% range; the same holds if they were overwhelmingly “No”), then it is unlikely that imputation would accomplish much.
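The two conditions in the first sentence can be checked before imputing, as in the rough sketch below. The 5% limit on missing values comes from the text; the tolerance `max_rate_gap` for calling the target frequencies "consistent", the data, and all function names are assumptions made for illustration.

```python
def missing_share(values):
    """Proportion of missing (None) entries in the input variable."""
    return sum(v is None for v in values) / len(values)


def target_rate_by_level(inputs, targets):
    """Share of target == "Yes" within each observed level of the input."""
    rates = {}
    for level in ("Yes", "No"):
        subset = [t for x, t in zip(inputs, targets) if x == level]
        rates[level] = sum(t == "Yes" for t in subset) / len(subset) if subset else None
    return rates


def imputation_is_low_risk(inputs, targets, max_missing=0.05, max_rate_gap=0.05):
    """True when few values are missing and the target rate is similar at both levels.

    max_rate_gap is a hypothetical tolerance, not a value from the lecture.
    """
    rates = target_rate_by_level(inputs, targets)
    if None in rates.values():
        return False  # one input level never occurs, so the rates cannot be compared
    few_missing = missing_share(inputs) < max_missing
    similar_rates = abs(rates["Yes"] - rates["No"]) <= max_rate_gap
    return few_missing and similar_rates


# Hypothetical data: 41 records with one missing input value.
inputs = ["Yes"] * 20 + ["No"] * 20 + [None]
targets = (["Yes"] * 10 + ["No"] * 10) * 2 + ["Yes"]
print(missing_share(inputs))
print(target_rate_by_level(inputs, targets))
print(imputation_is_low_risk(inputs, targets))
```

When the check fails, keeping the missing category as its own level is the more cautious choice.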

This note was uploaded on 09/22/2011 for the course STA 6714 taught by Professor Staff during the Spring '11 term at University of Central Florida.
