Handout 4. Descriptive study of bivariate data

- Categorical data (Chapter 3, section 2).
- Numerical data (Chapter 3, sections 4, 5, 6).

Bivariate data: for each unit in the sample, measure two values.

Example: Data were collected to measure the effect of body weight on blood pressure in individuals aged between 15 and 30.
- Sample unit: an individual aged between 15 and 30.
- Variables measured on each sample unit: body weight and blood pressure.

When we analyze data on two variables, our first step is to distinguish between the response variable and the explanatory variable. The response variable is the outcome variable on which comparisons are made. The explanatory variable defines the groups to be compared with respect to values of the response variable.

Example 1: Is smoking actually beneficial to your health? No, but one study found conflicting evidence.

The study surveyed 1314 women in the United Kingdom (1972-1974), asking each woman whether she was a smoker. Twenty years later, a follow-up survey recorded whether each woman was deceased or still alive. During that period, 24% of the smokers died and 31% of the nonsmokers died.
- Survival status (whether a woman is alive after 20 years) is the response variable.
- Smoking status is the explanatory variable.

Smoker    Dead   Alive   Total
Yes        139     443     582
No         230     502     732
Total      369     945    1314

Some studies regard either or both variables as response variables.

• An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
• Example: What is the relationship between the daily amount of gasoline used by automobiles and the amount of air pollution?

Bivariate data fall into three cases:
- Both categorical: (1) described by a contingency table; (2) summaries via frequencies.
- Both numerical: (1) described by a scatter plot; (2) summaries via the correlation coefficient and the regression line.
- One categorical and one numerical: we will not consider this case.

Example 4.1
• The manager of a company wants to investigate the association between the type of defect found on furniture and the production shift.
• A sample of 309 furniture defects produced the following contingency table.

          Type of defect
Shift      A     B     C     D
1         15    21    45    13
2         26    31    34     5
3         33    17    49    20

(For example, 15 defects of type A were produced in shift 1.)

Example 4.1 cont'd
• To analyze how the frequencies are distributed across the two categorical variables, it is best to complete the table with the row and column totals, from which one can compute the row and column conditional frequency distributions or the joint frequency distribution.
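
As an illustration of completing a contingency table, the following Python sketch computes the row and column totals, the joint relative frequencies, and the row and column conditional distributions for the Example 4.1 counts. It assumes the table is laid out with shifts 1 to 3 as rows and defect types A to D as columns (so that 15 type-A defects fall in shift 1 and the counts sum to 309). The function name `summarize` and the dictionary layout are hypothetical choices made for this sketch and are not part of the handout. The same function is applied to the smoking table from Example 1 to check the quoted death rates.

```python
# Counts from Example 4.1. Assumed layout: rows are shifts 1-3,
# columns are defect types A-D (e.g. 15 type-A defects in shift 1).
defects = {
    1: {"A": 15, "B": 21, "C": 45, "D": 13},
    2: {"A": 26, "B": 31, "C": 34, "D": 5},
    3: {"A": 33, "B": 17, "C": 49, "D": 20},
}

# Counts from Example 1: smoking status (rows) by survival status (columns).
smoking = {
    "Yes": {"Dead": 139, "Alive": 443},
    "No": {"Dead": 230, "Alive": 502},
}


def summarize(table):
    """Return totals and relative frequency distributions for a two-way table.

    `table` maps each row label to a dict of {column label: count}.
    (Hypothetical helper written for this sketch.)
    """
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {c: sum(table[r][c] for r in table) for c in cols}
    grand_total = sum(row_totals.values())

    # Joint distribution: each cell divided by the grand total.
    joint = {r: {c: table[r][c] / grand_total for c in cols} for r in table}
    # Row conditional distribution: each cell divided by its row total.
    row_cond = {r: {c: table[r][c] / row_totals[r] for c in cols} for r in table}
    # Column conditional distribution: each cell divided by its column total.
    col_cond = {c: {r: table[r][c] / col_totals[c] for r in table} for c in cols}

    return {
        "row_totals": row_totals,
        "col_totals": col_totals,
        "grand_total": grand_total,
        "joint": joint,
        "row_cond": row_cond,
        "col_cond": col_cond,
    }


summary = summarize(defects)
smoking_summary = summarize(smoking)

print("Row totals:", summary["row_totals"])
print("Column totals:", summary["col_totals"])
print("Grand total:", summary["grand_total"])
for shift, dist in summary["row_cond"].items():
    shares = ", ".join(f"{d}: {p:.1%}" for d, p in dist.items())
    print(f"Shift {shift} defect-type distribution: {shares}")
for status, dist in smoking_summary["row_cond"].items():
    print(f"Smoker = {status}: proportion dead = {dist['Dead']:.2f}")
```

The row conditional distributions compare the mix of defect types across shifts, which is the comparison relevant when shift is treated as the explanatory variable. For the smoking table, the row conditional proportions reproduce the 24% and 31% death rates quoted in Example 1.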
This note was uploaded on 04/09/2008 for the course STAT 240 taught by Professor Jeneralczuk during the Spring '08 term at UMass (Amherst).