Handout 4. Descriptive study of bivariate data
Categorical data (Chapter 3, section 2).
Numerical data (Chapter 3, sections 4, 5, 6).
Bivariate data

For each unit in the sample, two variables are measured.
Example:
Data were collected to measure the effect of body weight
on blood pressure in individuals aged between 15 and 30.
Sample unit: an individual aged between 15 and 30.
Variables measured on each sample unit: body weight and
blood pressure.
Bivariate data
• Both discrete
  1. Described by a contingency table
  2. Summaries via frequencies
• Both numerical
  1. Described by a scatter plot
  2. Summaries via correlation coefficient, regression line
• Discrete and numerical
  Do not consider
Example 4.1
• The manager of a company wants to investigate the
association between type of defects found on furniture
and the production shift.
• A sample of 309 furniture defects produced the following
contingency table
              Type of defect
Shift       A     B     C     D
1          15    21    45    13
2          26    31    34     5
3          33    17    49    20

Reading the table: 15 defects of type A were produced in shift 1.
Example 4.1 cont’d
• To analyze how the frequencies are distributed across the
two categorical variables, it is best to complete the table
with the row and column totals. From these totals one can
compute the row and column conditional frequency
distributions or the joint frequency distribution.
                   Type of defect
Shift              A     B     C     D    Total (marginal)
1                 15    21    45    13       94
2                 26    31    34     5       96
3                 33    17    49    20      119
Total (marginal)  74    69   128    38      309

Reading the table: 74 defects of type A were produced across all shifts;
96 defects were produced in shift 2.
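
As a minimal sketch of how the table is completed, the row and column totals can be computed from the cell counts of Example 4.1. The variable names (`counts`, `row_totals`, `col_totals`) are illustrative choices, not part of the handout.

```python
# Cell counts from Example 4.1: one row per shift, columns in the order A, B, C, D.
defect_types = ["A", "B", "C", "D"]
counts = {
    1: [15, 21, 45, 13],
    2: [26, 31, 34, 5],
    3: [33, 17, 49, 20],
}

# Row (marginal) totals: number of defects produced in each shift.
row_totals = {shift: sum(row) for shift, row in counts.items()}

# Column (marginal) totals: number of defects of each type over all shifts.
col_totals = {
    t: sum(row[j] for row in counts.values())
    for j, t in enumerate(defect_types)
}

# Grand total: the sample size.
grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Grand total:", grand_total)
```

The row totals and the column totals must each add up to the same grand total, which is a quick check that the table was completed correctly.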
Relative frequencies
• Divide each cell frequency by the total frequency (here 309)
to get the relative frequencies.
              Type of defect
Shift       A      B      C      D     Total
1         0.05   0.07   0.15   0.04    0.31
2         0.08   0.10   0.11   0.01    0.30
3         0.11   0.05   0.16   0.07    0.39
Total     0.24   0.22   0.42   0.12    1.00
Which is the most common type of defect?
Which is the most common combination of shift/type of defect?
Which is the most common type of defect in the first shift?
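
One possible way to check answers to these questions is to compute the joint relative frequencies from the counts of Example 4.1 and look for the largest entries. This is only a sketch; the names `joint` and `type_marginal` are illustrative. Because the code divides exact counts and does not round, its values may differ in the last digit from a table that was rounded by hand.

```python
# Cell counts from Example 4.1: one row per shift, columns in the order A, B, C, D.
defect_types = ["A", "B", "C", "D"]
counts = {
    1: [15, 21, 45, 13],
    2: [26, 31, 34, 5],
    3: [33, 17, 49, 20],
}

# Total number of defects in the sample.
n = sum(sum(row) for row in counts.values())

# Joint relative frequency of each (shift, defect type) combination.
joint = {
    (shift, t): row[j] / n
    for shift, row in counts.items()
    for j, t in enumerate(defect_types)
}

# Marginal relative frequency of each defect type (summed over shifts).
type_marginal = {
    t: sum(joint[(shift, t)] for shift in counts) for t in defect_types
}

# Largest marginal frequency: most common type of defect overall.
most_common_type = max(type_marginal, key=type_marginal.get)

# Largest joint frequency: most common shift/type combination.
most_common_cell = max(joint, key=joint.get)

# Largest entry within the first row: most common type in shift 1.
most_common_in_shift_1 = max(defect_types, key=lambda t: joint[(1, t)])

print(most_common_type, most_common_cell, most_common_in_shift_1)
```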
Row relative frequencies
• Divide each cell frequency by its row total to get the
relative frequencies per row. These numbers are now
comparable across rows, because each row sums to 1
regardless of how many defects that shift produced.
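
The row relative frequencies can be sketched in the same way, using the counts of Example 4.1. The name `row_rel` is an illustrative choice, and rounding to two decimals is only for display.

```python
# Cell counts from Example 4.1: one row per shift, columns in the order A, B, C, D.
defect_types = ["A", "B", "C", "D"]
counts = {
    1: [15, 21, 45, 13],
    2: [26, 31, 34, 5],
    3: [33, 17, 49, 20],
}

# Divide every cell by the total of its own row (its shift).
row_rel = {
    shift: [c / sum(row) for c in row]
    for shift, row in counts.items()
}

# Display each row rounded to two decimals, labelled by defect type.
for shift, freqs in row_rel.items():
    rounded = {t: round(f, 2) for t, f in zip(defect_types, freqs)}
    print(f"Shift {shift}: {rounded}")
```

Each row is the distribution of defect type within one shift, so comparing a column across rows shows whether a type of defect is relatively more frequent in one shift than in another.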