Relationships Between Categorical Variables

Today’s topic
• We have discussed relationships between measurement variables (e.g. GPA and SAT)
• What about relationships between categorical variables?
  – e.g. Party identification (Democrat, Independent, Republican, Other) & Vote choice (Bush, Kerry, Other)
• Contingency tables show the relationships between two variables.

Discrimination Experiment
Source: Page 1998
            Positive   Negative   Total
Label          22         68        90
No Label       63         27        90
Total          85         95       180

• Data are displayed in cells
• The right hand column and the bottom row display marginal totals
• The lower right hand cell displays the grand total
• Columns usually (but not always) used for outcome variable

Does sexual orientation affect whether or not someone received a positive response? Hard to tell from the raw numbers.

Row Percentages

• E.g. what percentage of cases in the “label” category received a positive response? What percentage of cases in the “no label” category received a positive response?

Contingency table

            Positive          Negative          Total
Label       22 (22/90=24%)    68 (68/90=76%)      90
No Label    63 (63/90=70%)    27 (27/90=30%)      90
Total       85                95                 180

A simpler way to display the data:
            Percent Positive   Total N
Label             24%             90
No Label          70%             90
Total                            180

Notes
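As a note on the arithmetic, the row percentages in the discrimination experiment table can be reproduced with a short Python sketch. The dictionary layout and the function name `row_percentages` are illustrative assumptions, not part of the original slides.

```python
# Observed counts from the discrimination experiment table (Source: Page 1998).
# Rows hold the explanatory variable; columns hold the outcome.
counts = {
    "Label": {"Positive": 22, "Negative": 68},
    "No Label": {"Positive": 63, "Negative": 27},
}


def row_percentages(table):
    """Return each cell as a percentage of its own row total."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())
        result[row_name] = {
            outcome: 100 * count / row_total for outcome, count in cells.items()
        }
    return result


percentages = row_percentages(counts)
for row_name, cells in percentages.items():
    summary = ", ".join(f"{outcome}: {pct:.0f}%" for outcome, pct in cells.items())
    print(f"{row_name}: {summary}")
```

Dividing by the row total (90 in both rows here) is what makes these row percentages; dividing by the column totals (85 and 95) would answer a different question.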
• In the previous example, because there are two mutually exclusive categories, it is simpler to display just one column
• But with 3 or more outcome categories, it might be better to display all 3
• It is important to make sure you are taking row, and not column, percentages if your outcome variables are in the columns

Some useful terms
• Percentage with trait = (number with trait / total) × 100%
• Proportion with trait = number with trait / total
• Probability of having the trait = number with trait / total

A proposal has been made…to require the US government to bring home all US troops before the end of the year. Would you like to have your congressman vote for or against this proposal?

                        Grade school   High school   College      Total
                        ed.            education     education    adults
% for withdrawal                                                   73%
% against withdrawal                                               27%
Total                   100%           100%          100%

Try to fill in the table…
• About a third of the sample falls into each education category, so the “Total adults” percentage must be roughly the mean of the three education-level percentages
• Columns must sum to 100%
• Sample of adults
• The education levels are the highest level of education achieved

A proposal has been made…to require the US government to bring home all US troops before the end of the year. Would you like to have your congressman vote for or against this proposal?

                        Grade school   High school   College      Total
                        ed.            education     education    adults
% for withdrawal        80%            75%           60%          73%
% against withdrawal    20%            25%           40%          27%
Total                   100%           100%          100%

(Check: (80% + 75% + 60%) / 3 ≈ 72%, close to the 73% for all adults.)

Most people think opposition to the war increased with educational attainment, but in this table support for withdrawal falls as education rises.

Testing Hypotheses
• After we display the data, we can test a hypothesis about the relationship between 2 categorical variables
• Does the relationship in our sample hold for the population?
• We will talk about relationships between 2 variables, each with 2 outcomes (e.g. male and female, drink before driving or not): a 2 x 2 table
  – Although it is possible to test, say, 3 x 3 or 3 x 2 tables, we will not cover this

Statistical Significance
If a relationship is statistically significant, the relationship observed in the sample is unlikely to have occurred unless there was a real relationship in the population.

Example
• The Supreme Court considered a case involving an Oklahoma law that allowed the sale of 3.2% beer to females under 21 but not to males (see case study 6.3)
• Collect a roadside SAMPLE of young drivers.
  – 481 males in the survey; 77 (16%) had been drinking in the past 2 hrs.
  – 138 females in the survey; 16 (11.6%) had been drinking in the past 2 hrs.
• Is there a difference in the POPULATION?

Basic Steps in Hypothesis Testing
1. Determine null & alternative hypotheses
2. Collect data & summarize in a test statistic
3. Determine how unlikely the test statistic would be if the null hypothesis were true
4. Make a decision

Null & Alternative Hypotheses
• Null hypothesis: There is no relationship between the 2 variables in the population
• Alternative hypothesis: There is a relationship between the 2 variables in the population

Hypotheses Example
• Null hypothesis: Males and females in the population are equally likely to drive within 2 hours of drinking alcohol
• Alternative hypothesis: Males and females in the population are not equally likely to drive within 2 hours of drinking alcohol

Collect Data
• Next, collect the data. Display in a contingency table.
• We don’t need percentages at this point (although if we are presenting the table for a reader, it would be nice)

Drinking While Driving & Gender
          Yes     No      Total
Male       77     404      481
Female     16     122      138
Total      93     526      619

Test Statistic
• Now we will calculate a test statistic
• Test statistics differ for the type of data used
• We are looking at categorical data in a 2 by 2 table
• We will use the chi-squared statistic, which is appropriate for testing for a relationship between two categorical variables

Expected counts
• The first step is to calculate the expected counts for each cell
• For each cell, multiply the column total by the row total & divide by the overall total number of cases

          Yes            No              Total
Male      481*93/619     481*526/619     481
Female    138*93/619     138*526/619     138
Total     93             526             619

Expected counts are the lower number in each cell…

          Yes       No        Total
Male      77        404       481
          72.27     408.73
Female    16        122       138
          20.73     117.27
Total     93        526       619

Chi-squared statistic
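The expected counts from the previous slide are the input to the chi-squared statistic, so it is worth being able to reproduce them. The following is a minimal Python sketch of the row total × column total / grand total rule; the function name `expected_counts` and the dictionary layout are assumptions made for illustration.

```python
# Observed counts from the roadside survey.
# Rows: gender; columns: had been drinking in the past 2 hours (Yes / No).
observed = {
    "Male": {"Yes": 77, "No": 404},
    "Female": {"Yes": 16, "No": 122},
}


def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total for col in cells}
        for row, cells in table.items()
    }


expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, {col: round(value, 2) for col, value in cells.items()})
```

The expected counts are what each cell would hold if the row and column variables were unrelated, which is why only the marginal totals enter the calculation.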
• For each cell, find:

    (observed count - expected count)^2 / expected count

• Where the observed counts are the data that we collected (the top number in each cell in the previous slide)
• Then take the sum of these values
• This sum is the chi-squared statistic

(77 - 72.27)^2 / 72.27 = .31 for the first cell; do this for each cell & add the results

Chi sq. = .31 + (404 - 408.73)^2/408.73 + (16 - 20.73)^2/20.73 + (122 - 117.27)^2/117.27
Chi sq. = .31 + .055 + 1.081 + .191 = 1.637

          Yes       No        Total
Male      77        404       481
          72.27     408.73
Female    16        122       138
          20.73     117.27
Total     93        526       619

Make a Decision
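Before making a decision, it can help to recompute the statistic from the raw counts. This is a minimal Python sketch of the cell-by-cell sum described on the previous slide; the variable names are illustrative, and the expected counts are computed unrounded from the marginal totals rather than taken from the two-decimal values shown in the table.

```python
# Observed counts: rows are Male, Female; columns are Yes, No.
observed = [
    [77, 404],
    [16, 122],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under "no relationship" between the two variables.
        exp = row_totals[i] * col_totals[j] / grand_total
        # Contribution of this cell: (observed - expected)^2 / expected
        contribution = (obs - exp) ** 2 / exp
        chi_squared += contribution
        print(f"observed={obs:>3}  expected={exp:7.2f}  contribution={contribution:.3f}")

print(f"chi-squared statistic = {chi_squared:.3f}")
```

All four cells share the same squared difference because the table has only one degree of freedom; the contributions differ only through the size of the expected count in the denominator.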
• Do we reject the null hypothesis or not?
• We use the test statistic to determine the p-value, or how unlikely the sample values would be if the null hypothesis were true
• Usually, if the p-value is below .05, we reject the null hypothesis (that is, we conclude there IS a relationship in the population)
  – This .05 value is arbitrary

How to find a p-value?
1. Calculate the chi-squared statistic
2. Degrees of freedom
  – Df = (# of rows - 1) * (# of columns - 1)
  – If 2 by 2 table, df = (2 - 1) * (2 - 1) = 1
3. Use Excel to calculate the p-value
  – In Excel: CHIDIST(x, df)

Rule of Thumb
• For a df of 1 and p-value of .05:
  – If our chi-squared statistic is over 3.84, reject the null
  – If under 3.84, do not reject the null
• Why 3.84? Because the p-value for 3.84 with 1 degree of freedom is about .05
• Note that this rule only works for df=1 and p-value=.05

Chi Squared Applet
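Besides Excel or an applet, the p-value for a 2 by 2 table can be sketched in a few lines of Python. This sketch relies on an identity that holds only for 1 degree of freedom: a chi-squared variable with df = 1 is the square of a standard normal variable, so its upper-tail probability is `erfc(sqrt(x / 2))`. The function name `chi2_p_value_df1` is an assumption made for illustration, and the shortcut should not be used for larger tables.

```python
import math


def chi2_p_value_df1(x):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    Uses the df = 1 identity P(chi-squared > x) = erfc(sqrt(x / 2)).
    This does not apply to other degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2))


cutoff_p = chi2_p_value_df1(3.84)     # the rule-of-thumb cutoff
example_p = chi2_p_value_df1(1.637)   # statistic from the drinking example

print(f"p-value at 3.84:  {cutoff_p:.3f}")
print(f"p-value at 1.637: {example_p:.3f}")
print("reject the null" if example_p < 0.05 else "do not reject the null")
```

Comparing the statistic with 3.84 and comparing the p-value with .05 lead to the same decision for df = 1; the p-value simply carries more information about how far the statistic is from the cutoff.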
• http://math.hws.edu/javamath/ryan/ChiSquare.html

Returning to our example…
• Our chi-squared statistic is 1.637
• Using the rule of thumb, 1.637 is below 3.84
• We do not reject the null
• That is, we do not detect a statistically significant difference in the population between males and females
• We were able to determine this from a sample
• Questions? ...
 Fall '09
 TAMBORINI
