MN1025 – Business Statistics
Lecture 7, Friday 22/2/2008

CORRELATION

Reference: Lind et al., Chapter 13.

7.1 Correlated data: examples

Suppose we have a sample of people (or things) with two "measurements" for each. For example:

i) A sample of people: their weights and heights;
ii) Employed persons: their salaries and their educational qualifications (measured on some scale);
iii) Factories: rent per square foot and distance to the nearest motorway junction;
iv) People: amount of hair and intelligence.

For each of these cases, we might expect:

i) A positive correlation: as height increases, weight tends to increase.
ii) Again a positive correlation, though perhaps not as strong: on average, we would expect those with higher qualifications to earn more.
iii) A strong negative correlation: we expect rents to decrease as the distance to the nearest motorway junction increases.
iv) No correlation (at least according to the lecturer's prejudice...).

This lecture is organised as follows. After a brief discussion of causation, we look at examples like those above graphically. We then show how a numerical estimate, the sample correlation coefficient, can be obtained, and how to test statistically whether the correlation in the underlying population is non-zero. Finally, we discuss best-fit lines for correlated data.

7.2 Correlation versus Causation

The correlation coefficient is a number that lies between −1 (perfect negative correlation) and +1 (perfect positive correlation). It is a statistical quantity: a correlation coefficient different from zero may suggest a causal relationship, but it does not prove one.

Here is a striking example taken from Lind et al.: in the last 100 years, as the population of donkeys has decreased, the number of PhDs has increased. Despite the strong negative correlation between the two quantities, the increase in PhD numbers is not caused by the fall in donkey numbers.
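To make the range from −1 to +1 concrete, here is a minimal Python sketch. It assumes that the sample correlation coefficient introduced later in the lecture is the usual Pearson formula, r = S_xy / sqrt(S_xx * S_yy), where S_xx and S_yy are the sums of squared deviations from the means and S_xy is the sum of cross-products of deviations. The function name `pearson_r` and the toy data are illustrative only. Points lying exactly on a rising straight line give r = +1, points on a falling line give r = −1, and the Cauchy-Schwarz inequality guarantees that |r| never exceeds 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson sample correlation coefficient (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on an increasing line
falling = [10 - 3 * x for x in xs]  # points exactly on a decreasing line

print(pearson_r(xs, rising))   # 1.0
print(pearson_r(xs, falling))  # -1.0
```

Real data rarely lie exactly on a line, so in practice r falls strictly between these extremes; a value near zero indicates little linear association.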
Another example: we may find a correlation between movements of share price indexes in London and New York. Does this mean that movement in one index causes movement in the other? Not necessarily: the same underlying causes may be driving both sets of prices.

On the other hand, if there is a plausible connection between the data, correlation may be taken as evidence of a causal influence. For example, it is plausible that smoking leads to lung cancer, so if the data show a correlation, this must be taken seriously.

7.3 Example: weight vs. height

Ten female employees are chosen at random and their heights and weights are recorded.

Data Display

employee   height (in)   weight (lb)
    1          68            119
    2          67            118
    3          65            129
    4          68            135
    5          64            123
    6          67            140
    7          66            125
    8          65            132
    9          64            118
   10          66            130

We then draw a scatter plot (GRAPH → SCATTER PLOT). Note that, for example, a height of 68 inches corresponds to weights of both 119 and 135 pounds (employees 1 and 4). We label the plot as shown....
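As a numerical companion to the scatter plot, the following Python sketch computes the sample correlation coefficient for the ten employees in the table. It again assumes the usual Pearson formula, since the notes' own calculation is not part of this preview, and the helper name `pearson_r` is illustrative. With these data the mean height is 66 in and the mean weight is 126.9 lb, with S_xx = 20, S_yy = 516.9 and S_xy = 23, so r = 23 / sqrt(20 × 516.9) ≈ 0.23: a weak positive correlation, consistent with the plot, where equal heights can pair with quite different weights.

```python
from math import sqrt

# Data from the table in 7.3: height in inches, weight in pounds
heights = [68, 67, 65, 68, 64, 67, 66, 65, 64, 66]
weights = [119, 118, 129, 135, 123, 140, 125, 132, 118, 130]


def pearson_r(xs, ys):
    """Pearson sample correlation coefficient (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(heights, weights)
print(f"r = {r:.4f}")  # r = 0.2262
```

Computing the deviation sums explicitly, rather than calling a library routine, mirrors how the coefficient is usually worked by hand and makes the intermediate quantities S_xx, S_yy and S_xy easy to check.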