CATEGORICAL (FINISH) NORMAL (CH. 3.1) STATISTICS 10 - LECTURE 5

SOME DATA - FROM THE US GOVERNMENT HOME MORTGAGE DISCLOSURE ACT - ALLOWS THE INSPECTION OF LOAN PRACTICES
READING DATA AND EXAMINING ITS STRUCTURE read.csv(), names(), and str() are all useful for a first look at a new dataset
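A minimal sketch of that first look in R. The inline CSV text and its column names (race, approved, income) are made up for illustration and are not the actual HMDA file; with a real file, a path would replace the text= argument.

```r
# Toy stand-in for a CSV file on disk; columns and values are hypothetical
csv_text <- "race,approved,income
GroupA,yes,85
GroupB,no,42
GroupC,yes,51
GroupA,yes,97"

loans <- read.csv(text = csv_text, stringsAsFactors = TRUE)

names(loans)  # the column names
str(loans)    # the type and first few values of each column
```

str() is usually the quickest way to see which columns R treats as categorical (factor) and which as numerical.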

TABLING A SINGLE VARIABLE THESE TABLES IDENTIFY THE VALUES, COUNTS & PROPORTIONS WITHIN A CATEGORICAL VARIABLE
Contingency Tables Tabling a single categorical variable can be interesting, but creating tables involving two categorical variables (called Contingency Tables or Two Way Tables) allows us to examine the relationships between variables
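As a sketch, counts and proportions for one categorical variable come from table() and prop.table(). The approved vector below is made-up illustration data, not the HMDA values.

```r
# Hypothetical loan decisions (illustration only)
approved <- c("yes", "no", "yes", "yes", "no", "yes", "yes", "no")

counts <- table(approved)      # how many of each value
props  <- prop.table(counts)   # the same table as proportions summing to 1

counts
props
```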

Two variables: Race/Ethnicity & Loan Approval We can now study relationships. The question here is: what is the relationship between an applicant’s race and loan approval? We can see that Non Hispanic Whites have the most approvals, but they also have the most disapprovals, because they are the largest group. Raw counts alone cannot separate group size from approval rate
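A small sketch of a two way table in R. The group labels A, B, C stand in for the race/ethnicity categories, and all counts are invented to show the group-size effect; they are not the HMDA results.

```r
# Hypothetical applicants: group A is twice as large as B or C
group    <- rep(c("A", "B", "C"), times = c(10, 5, 5))
approved <- c(rep(c("yes", "no"), times = c(7, 3)),   # group A
              rep(c("yes", "no"), times = c(3, 2)),   # group B
              rep(c("yes", "no"), times = c(4, 1)))   # group C

tab <- table(group, approved)   # rows = group, columns = decision
addmargins(tab)                 # same table with row and column totals
```

In this toy table group A has both the most approvals and the most disapprovals simply because it has the most applicants.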
SOLUTION: ADJUST BY COMPUTING ROW PROPORTIONS “WITHIN EACH GROUP, WHAT PROPORTION GOT THEIR HOME LOAN APPLICATIONS APPROVED?”
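Row proportions can be sketched with prop.table() and margin = 1. The table below uses invented counts (hypothetical groups A, B, C), with groups as rows and the decision as columns.

```r
# Hypothetical counts: rows are groups, columns are the loan decision
tab <- as.table(matrix(c(3, 2, 1,    # "no" column
                         7, 3, 4),   # "yes" column
                       nrow = 3,
                       dimnames = list(group = c("A", "B", "C"),
                                       approved = c("no", "yes"))))

row_props <- prop.table(tab, margin = 1)  # each row sums to 1
row_props
# Group A has the most approvals (7), but group C has the
# highest approval rate (4 of 5 = 0.8)
```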

COMPARE: ADJUST BY COMPUTING COLUMN PROPORTIONS IT ANSWERS A SLIGHTLY DIFFERENT QUESTION: GIVEN APPROVAL, WHAT WAS THE DISTRIBUTION OF RACE?
THE MOSAIC PLOT - A GRAPHICAL CONTINGENCY TABLE THE AREA OF EACH TILE CORRESPONDS TO THE PROPORTION OF LOANS OF EACH TYPE BY BANK
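Column proportions use the same function with margin = 2. Again the counts are invented for illustration (hypothetical groups A, B, C).

```r
# Hypothetical counts: rows are groups, columns are the loan decision
tab <- as.table(matrix(c(3, 2, 1,    # "no" column
                         7, 3, 4),   # "yes" column
                       nrow = 3,
                       dimnames = list(group = c("A", "B", "C"),
                                       approved = c("no", "yes"))))

col_props <- prop.table(tab, margin = 2)  # each column sums to 1
col_props
# Among the 14 approved applications, half (7 of 14) come from group A
```

Row proportions condition on the group; column proportions condition on the decision. The two tables generally differ, so it matters which question is being asked.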
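A mosaic plot can be drawn from any contingency table with base R's mosaicplot(). The bank names, loan types, and counts below are assumptions chosen for illustration only.

```r
# Hypothetical counts of loans by bank and loan type
tab <- as.table(matrix(c(30, 10,    # "Conventional" column
                         20, 40),   # "FHA" column
                       nrow = 2,
                       dimnames = list(bank = c("Bank 1", "Bank 2"),
                                       loan_type = c("Conventional", "FHA"))))

# Tile widths reflect each bank's share of all loans;
# tile heights reflect the loan-type mix within that bank
mosaicplot(tab, main = "Loan type by bank (toy data)", color = TRUE)
```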

BOXPLOTS USE NUMERICAL & CATEGORICAL VARIABLES THE COMPARISON IS ACROSS GROUPS (RACE/ETHNICITY). WHAT CAN YOU SAY?
AN UNINFORMATIVE BOXPLOT - INCOME & RACE EXTREME VALUES ARE “SQUASHING” OUR BOXES

IMPROVED BY LOG TRANSFORMATION OF INCOME A CLEARER PICTURE OF THE RELATIONSHIP IS REVEALED WITH A TRANSFORMATION
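A sketch of the before and after, using simulated right-skewed values in place of the real income column. The group labels and the lognormal parameters are arbitrary assumptions.

```r
set.seed(1)

# Simulated data: three hypothetical groups with right-skewed "income"
group  <- factor(rep(c("A", "B", "C"), each = 50))
income <- rlnorm(150, meanlog = 4, sdlog = 1)

# Raw scale: a few extreme values stretch the axis and squash the boxes
boxplot(income ~ group, main = "Income by group", ylab = "income")

# Log scale: the extremes are pulled in, so the boxes can be compared
boxplot(log(income) ~ group, main = "Log income by group", ylab = "log(income)")
```

The log transformation requires strictly positive values, so zero or missing incomes would need to be handled before plotting.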
COMBINING NUMERICAL AND CATEGORICAL HISTOGRAM EXAMPLE
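One way to combine the two variable types is a separate histogram per category, drawn side by side on the same bins. This is only a sketch with simulated values; the groups and their parameters are arbitrary assumptions and do not represent the lecture's data.

```r
set.seed(2)

# Simulated numerical variable for two hypothetical groups
group      <- factor(rep(c("A", "B"), each = 100))
log_income <- c(rnorm(100, mean = 3.8, sd = 0.6),
                rnorm(100, mean = 4.2, sd = 0.6))

# Shared bin edges so the two panels are directly comparable
breaks <- seq(floor(min(log_income)), ceiling(max(log_income)), by = 0.5)

old_par <- par(mfrow = c(1, 2))   # two panels in one row
for (lvl in levels(group)) {
  hist(log_income[group == lvl], breaks = breaks,
       main = paste("Group", lvl), xlab = "log(income)")
}
par(old_par)                      # restore the previous layout
```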

Take Aways 1.7 We only have a few tools for categorical variables: table, contingency table, mosaic plot. What is more interesting is to examine numerical variables and categorical variables together (the box plot is one way; there are others)
CHAPTER 3 - DISTRIBUTIONS

