MN1025 – Business Statistics
Lecture 7, Friday 22/2/2008

CORRELATION

Reference: Lind et al., Chapter 13.

7.1 Correlated data: examples

We suppose we have a sample of people (or things) with two “measurements” for each. For example:

i) People: their weights and heights;
ii) Employed persons: their salaries and their educational qualifications (measured on some scale);
iii) Factories: rent per square foot and distance to the nearest motorway junction;
iv) People: amount of hair and intelligence.

For each of these cases, we might expect:

i) A positive correlation: as height increases, weight tends to increase.
ii) Again a positive correlation, though perhaps not as strong: on average, we would expect those with higher qualifications to earn more.
iii) A strong negative correlation: we expect rents to decrease as the distance to the nearest motorway junction increases.
iv) No correlation (at least according to the lecturer’s prejudice...).

This lecture is organised as follows. After a brief discussion of causation, we look at examples like those above graphically. Then we show how a numerical estimate, the sample correlation coefficient, can be obtained. Next we show how to test statistically whether the correlation in the underlying population is non-zero. Finally, we discuss best-fit lines for correlated data.

7.2 Correlation versus Causation

The correlation coefficient is a number that lies between −1 (perfect negative correlation) and +1 (perfect positive correlation). It is a statistical quantity: a correlation coefficient different from zero may suggest a causal relationship, but it does not prove one.

Here is a drastic example taken from Lind et al.: in the last 100 years, as the population of donkeys has decreased, the number of PhDs has increased. Despite the strong negative correlation between the two quantities, the increase in PhD numbers is not caused by the fall in donkey numbers.
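The donkey example can be mimicked with a short Python sketch. The helper `sample_correlation` is our own hypothetical name, and it assumes the standard Pearson definition r = Sxy / √(Sxx·Syy), which we take to be the coefficient the lecture develops later. The two series are invented numbers, not real data: each is built from the year alone, so neither can cause the other, yet their shared trend still produces a correlation close to −1.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation coefficient r for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either sample is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented figures, NOT real data: each series depends only on the year,
# never on the other series.
years = range(10)
donkeys = [1000 - 60 * t + (5 if t % 2 else -5) for t in years]
phds = [50 + 12 * t + (3 if t % 3 == 0 else -1) for t in years]

r = sample_correlation(donkeys, phds)
print(f"r = {r:.3f}")  # strongly negative, yet neither series causes the other
```

Because the shared driver (time) moves both series, r comes out near −1; the value alone says nothing about which way, if any, the influence runs.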
Another example: we may find a correlation between movements of the share price indexes in London and New York. Does this mean that a movement in one index causes a movement in the other? Not necessarily: the same underlying causes may be driving both sets of prices.

On the other hand, if there is a plausible connection between the data, correlation may be taken as evidence for a causal influence. For example, it is plausible that smoking leads to lung cancer, so if the data show a correlation, this must be taken seriously.

7.3 Example: weight vs. height

Ten female employees are chosen at random, and their heights and weights are listed.

Data Display

employee   height (in)   weight (lb)
    1          68            119
    2          67            118
    3          65            129
    4          68            135
    5          64            123
    6          67            140
    7          66            125
    8          65            132
    9          64            118
   10          66            130

We then draw a scatter plot (GRAPH → SCATTER PLOT). Note that one height can correspond to several weights: for example, a height of 68 inches corresponds to weights of both 119 and 135 pounds. We label the plot as shown...
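As a sketch of where the analysis of this table is heading, the Python below computes the sample correlation coefficient for the ten employees. It also computes two quantities the lecture promises for later, under our own assumption that they take their standard textbook form: the t statistic t = r√(n−2) / √(1−r²) for testing whether the population correlation is zero (with n − 2 degrees of freedom), and the least-squares best-fit line weight = a + b·height, with b = Sxy/Sxx and a = ȳ − b·x̄. All variable names are our own.

```python
import math

# Heights (inches) and weights (pounds) of the ten employees in the table
heights = [68, 67, 65, 68, 64, 67, 66, 65, 64, 66]
weights = [119, 118, 129, 135, 123, 140, 125, 132, 118, 130]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Sums of squared deviations and of cross-products of deviations
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

# Sample (Pearson) correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# Assumed standard t statistic for H0: population correlation = 0, n - 2 d.f.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Least-squares best-fit line: weight = a + b * height
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"r = {r:.3f}, t = {t:.3f} on {n - 2} d.f.")
print(f"best-fit line: weight = {a:.1f} + {b:.2f} * height")
```

For these ten points, Sxx = 20, Syy = 516.9 and Sxy = 23, so the sketch gives r ≈ 0.23 (a weak positive correlation), t ≈ 0.66 on 8 degrees of freedom, and the best-fit line weight ≈ 51.0 + 1.15 × height.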
This note was uploaded on 04/17/2008 for the course MN 1025 taught by Professor Schack during the Spring '08 term at Royal Holloway.