The correlation, r, is a pure, unitless number.
Interchange Variables
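r = the average of the products of the data in standard units:
$$r = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\mathrm{sd}_x} \times \frac{y_i - \bar{y}}{\mathrm{sd}_y} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i - \bar{y}}{\mathrm{sd}_y} \times \frac{x_i - \bar{x}}{\mathrm{sd}_x}$$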
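Another way to see it: the correlation measures how tightly the points cluster around a line. Swapping x and y reflects the scatter plot across the line y = x, which leaves that clustering unchanged.
The correlation between x and y is the same as the correlation between y and x.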
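A minimal Python sketch of the formula above, assuming "sd" is the population SD (dividing by n), which is what makes the 1/n average equal the usual correlation coefficient. The function names `standard_units` and `correlation` and the data are hypothetical, chosen only for illustration. Because each term is a product of two numbers, swapping the roles of x and y gives the same r.

```python
import math

def standard_units(values):
    """Convert a list of numbers to standard units: (value - mean) / sd."""
    n = len(values)
    mean = sum(values) / n
    # Population SD (divide by n), matching the 1/n average in the formula for r.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """r = the average of the products of x and y in standard units."""
    zx = standard_units(x)
    zy = standard_units(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(correlation(x, y))
print(correlation(y, x))  # same value: each product a * b equals b * a
```

Both calls print the same number, about 0.775 for this data.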
Change of Location
Recall that if each entry in a list is increased by a number b, the mean increases by b and the sd does not change:
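$$
\begin{aligned}
\text{new } x_i &= \text{old } x_i + b \\
\text{new } \bar{x} &= \text{old } \bar{x} + b \\
\text{new } \mathrm{sd}_x &= \text{old } \mathrm{sd}_x
\end{aligned}
$$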
The same applies to y.
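$$\text{new } x_i \text{ in s.u.} = \frac{\text{new } x_i - \text{new } \bar{x}}{\text{new } \mathrm{sd}_x} = \frac{(\text{old } x_i + b) - (\text{old } \bar{x} + b)}{\text{old } \mathrm{sd}_x} = \frac{\text{old } x_i - \text{old } \bar{x}}{\text{old } \mathrm{sd}_x} = \text{old } x_i \text{ in s.u.}$$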
The values in standard units don't change, so the correlation doesn't change.
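A quick numerical check of the argument above, as a Python sketch. The shifts `b_x` and `b_y` and the data are hypothetical; x and y are deliberately shifted by different amounts to show that each variable's shift cancels on its own. As before, sd is assumed to be the population SD.

```python
import math

def standard_units(values):
    """(value - mean) / sd, using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    zx, zy = standard_units(x), standard_units(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_x, b_y = 100, -7  # hypothetical shifts for x and y

shifted_x = [v + b_x for v in x]
shifted_y = [v + b_y for v in y]

# The shift cancels in (x_i - mean), so the standard units are unchanged...
print(standard_units(x))
print(standard_units(shifted_x))
# ...and so is the correlation.
print(correlation(x, y), correlation(shifted_x, shifted_y))
```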
Change of Scale
Recall that if each entry in a list is multiplied by a nonzero number d, the mean is multiplied by d and the sd is multiplied by |d|:
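$$
\begin{aligned}
\text{new } x_i &= \text{old } x_i \times d \\
\text{new } \bar{x} &= \text{old } \bar{x} \times d \\
\text{new } \mathrm{sd}_x &= \text{old } \mathrm{sd}_x \times |d|
\end{aligned}
$$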
The same applies to y.
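$$\text{new } x_i \text{ in s.u.} = \frac{\text{new } x_i - \text{new } \bar{x}}{\text{new } \mathrm{sd}_x} = \frac{\text{old } x_i \times d - \text{old } \bar{x} \times d}{\text{old } \mathrm{sd}_x \times |d|} = (\text{sign of } d) \times \frac{\text{old } x_i - \text{old } \bar{x}}{\text{old } \mathrm{sd}_x} = (\text{sign of } d) \times \text{old } x_i \text{ in s.u.}$$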
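Values in standard units keep their size but flip sign when d is negative, so the size of the correlation doesn't change, but its sign might: it flips if exactly one of x and y is multiplied by a negative number.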
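A Python sketch of the three cases, with hypothetical data and multipliers. Multiplying x by 2.54 (for example, converting inches to centimeters) is a positive d, which is one way to see why r is unitless. Multiplying only x by a negative number flips the sign of r; multiplying both x and y by negative numbers flips it twice, so r is back where it started.

```python
import math

def standard_units(values):
    """(value - mean) / sd, using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    zx, zy = standard_units(x), standard_units(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
# d > 0 (e.g. inches -> centimeters): r is unchanged.
r_pos = correlation([v * 2.54 for v in x], y)
# d < 0 on x only: every x value in standard units flips sign, so r flips sign.
r_neg = correlation([v * -3 for v in x], y)
# Negative multipliers on both x and y: the two sign flips cancel.
r_both = correlation([v * -3 for v in x], [v * -0.5 for v in y])

print(r, r_pos, r_neg, r_both)
```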
Effect of Outliers
Outliers can ruin a linear pattern, making the correlation drop.
[Figure: two scatter plots of y against x, titled "Correlation = 0.95" and "Correlation = 0.69".]
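A small Python sketch of the same effect. The points are hypothetical and are not the data behind the figure: five points lie exactly on the line y = 2x + 5, so r is 1, and adding a single point far below the line pulls r down to roughly 0.57.

```python
import math

def standard_units(values):
    """(value - mean) / sd, using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    zx, zy = standard_units(x), standard_units(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical points lying exactly on the line y = 2x + 5.
x = [-2, -1, 0, 1, 2]
y = [1, 3, 5, 7, 9]

# The same points plus one outlier far below the line.
x_out = x + [0]
y_out = y + [-5]

r_clean = correlation(x, y)
r_outlier = correlation(x_out, y_out)
print(r_clean, r_outlier)
```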
Ecological Correlation
Calculating a correlation between group averages (instead of the raw data) is called an ecological correlation.
[Figure: scatter plot of Math Score against Verbal Score, titled "Correlation = 0.631".]
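A Python sketch contrasting the two calculations on hypothetical data (not the scores in the figure). Each of three groups has a lot of scatter within it, but the group averages fall exactly on a line. In this example the correlation of the individual points is 8/11, about 0.73, while the ecological correlation computed from the three (mean x, mean y) pairs is 1.

```python
import math

def standard_units(values):
    """(value - mean) / sd, using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    zx, zy = standard_units(x), standard_units(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical individual-level data: three groups, each with its own scatter.
groups = {
    "A": [(0, 0), (2, 2), (0, 2), (2, 0)],
    "B": [(2, 2), (4, 4), (2, 4), (4, 2)],
    "C": [(4, 4), (6, 6), (4, 6), (6, 4)],
}

# Correlation on the raw (individual) data.
all_points = [p for pts in groups.values() for p in pts]
r_individual = correlation([p[0] for p in all_points], [p[1] for p in all_points])

# Ecological correlation: one (mean x, mean y) pair per group.
group_means = [
    (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    for pts in groups.values()
]
r_ecological = correlation([m[0] for m in group_means], [m[1] for m in group_means])

print(r_individual, r_ecological)
```

Averaging within each group removes the within-group scatter, which is why the ecological correlation here is stronger than the correlation of the raw data.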
