Ch. 3 – Intro to Correlation and Regression
Ch. 2 deals with univariate data. This chapter, however, considers bivariate data and how two numerical variables are related. Methods of description are introduced here and formalized in Ch. 11.
Terminology:
x                       y
Explanatory variable    Response variable
Independent variable    Dependent variable
Predictor variable      Predicted variable
Notation:
• bivariate sample of size $n$: $\{ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \}$
• sample means: $\bar{x}$, $\bar{y}$
• sample std dev.: $s_x$, $s_y$
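As a small illustration of this notation, the sketch below computes n, the sample means, and the sample standard deviations for a bivariate sample. The data values are invented for illustration and do not come from the notes; the names `sample`, `xs`, and `ys` are likewise hypothetical.

```python
from statistics import mean, stdev

# Hypothetical bivariate sample of size n = 5 (values invented for illustration).
sample = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

xs = [x for x, _ in sample]   # the x observations
ys = [y for _, y in sample]   # the y observations

n = len(sample)
x_bar, y_bar = mean(xs), mean(ys)   # sample means
s_x, s_y = stdev(xs), stdev(ys)     # sample standard deviations (divisor n - 1)

print(f"n = {n}, x_bar = {x_bar}, y_bar = {y_bar}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
```

Note that `statistics.stdev` uses the divisor n - 1, which is the sample version of the standard deviation.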
Displaying relationships:
Def’n: An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
A scatterplot is a graphical display of two quantitative variables.
• x variable goes on the x-axis, y variable on the y-axis
• origin (0,0) may be included
Look for:
• form of relationship (i.e. any obvious pattern)
• strength of relationship (i.e. closeness of fit to a line)
• direction of relationship (i.e. positive or negative association)
• any unusual observations or outliers
x    y
1    1
2    2
4    1
3    2
(graph of above data used to discuss scatterplot traits further)
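Since the graph itself is not reproduced here, the following is a rough text-only sketch of a scatterplot. It assumes the small data table reads row by row as the pairs (1,1), (2,2), (4,1), (3,2); that reading, and the helper name `ascii_scatter`, are assumptions rather than part of the notes. The helper only handles small non-negative integers and always includes the origin.

```python
def ascii_scatter(xs, ys):
    """Return a text scatterplot: '*' marks an observed (x, y) pair, '.' an empty cell.

    Sketch only: assumes small non-negative integer values (single digits).
    """
    x_max, y_max = max(xs), max(ys)
    points = set(zip(xs, ys))
    rows = []
    for y in range(y_max, -1, -1):            # top row shows the largest y
        cells = "".join("*" if (x, y) in points else "."
                        for x in range(x_max + 1))
        rows.append(f"{y} {cells}")           # y value labels the row
    rows.append("  " + "".join(str(x) for x in range(x_max + 1)))  # x-axis labels
    return "\n".join(rows)

# Assumed row-by-row reading of the data table.
xs = [1, 2, 4, 3]
ys = [1, 2, 1, 2]
plot = ascii_scatter(xs, ys)
print(plot)
```

With only four points there is no clear upward or downward trend, which is the kind of trait (form, strength, direction, outliers) the list above asks the reader to look for.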
Correlation:
Def’n: Pearson’s Sample Correlation Coefficient $r$ is given by
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i} $$
where $z_{x_i}$ is the “standardized” observation for $x_i$ and $z_{y_i}$ is the “standardized” observation for $y_i$, for $i = 1, \ldots, n$.
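The definition can be followed step by step in code. The sketch below standardizes each observation and then divides the sum of the products by n - 1. The function name `pearson_r` and both data sets are illustrative assumptions: the first is an invented exact line, and the second is one reading (row by row) of the small data table earlier in the notes.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient computed from z-scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)          # sample std devs (divisor n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]    # standardized x observations
    z_y = [(y - y_bar) / s_y for y in ys]    # standardized y observations
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Invented data lying exactly on an increasing line.
r_line = pearson_r([1, 2, 3], [2, 4, 6])

# Assumed row-by-row reading of the four-point data table.
r_table = pearson_r([1, 2, 4, 3], [1, 2, 1, 2])

print(r_line, r_table)
```

Up to floating-point rounding, the exact line gives r = 1 and the four table points give r = 0. The function fails with a division by zero if either variable is constant, since r is undefined when a standard deviation is 0.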
(example graphs of correlation drawn in class: 1. strong positive linear; 2. weak positive
linear; 3. strong negative linear; 4. no pattern; 5. parabola; 6. exponential)
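To get a feel for a few of these patterns numerically, the sketch below computes r for invented data following an exact increasing line, an exact decreasing line, a parabola, and an exponential curve. All data values and names are hypothetical. The point it illustrates is that r measures linear association only, so a perfect curved relationship need not give r near 1 or -1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of standardized observations, divided by n - 1."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = list(range(-3, 4))                                 # hypothetical x values -3, ..., 3

r_line = pearson_r(xs, [2 * x + 1 for x in xs])         # exact increasing line
r_neg_line = pearson_r(xs, [-x for x in xs])            # exact decreasing line
r_parabola = pearson_r(xs, [x ** 2 for x in xs])        # symmetric parabola
r_exponential = pearson_r(xs, [2 ** x for x in xs])     # exponential growth

for name, r in [("line", r_line), ("negative line", r_neg_line),
                ("parabola", r_parabola), ("exponential", r_exponential)]:
    print(f"{name}: r = {r:.3f}")
```

The two lines give r of 1 and -1 (up to rounding), the symmetric parabola gives r of essentially 0 even though y is completely determined by x, and the exponential curve gives a positive r below 1.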
r:
This note was uploaded on 07/31/2011 for the course STAT 151 taught by Professor Henryk Kolacz during the Winter '07 term at University of Alberta.