Correlation & Regression
Correlation analysis:
determines whether a relationship exists between two variables;
assesses the strength of the relationship between the two variables.
Correlation may be:
positive: as one variable increases, the other also increases;
negative: as one variable increases, the other decreases.
Correlation is measured by the correlation coefficient r (also called Pearson’s correlation coefficient).
The correlation coefficient r lies between −1 and +1; r = 0 indicates no linear correlation.
The closer the value of r is to ±1, the stronger the correlation between the two variables.
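As an illustration of how r is obtained, the following is a minimal sketch of the standard Pearson formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the sample data are hypothetical and are not taken from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements, invented purely for illustration
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [9.0, 7.5, 6.9, 4.8, 4.1]
r = pearson_r(x_values, y_values)
print(f"r = {r:.3f}")  # negative: y falls as x rises
```

The sign of the result gives the direction of the correlation and its magnitude gives the strength.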
The initial investigation should be a scatter diagram.
For scientific analysis, a linear trend between two variables is usually desirable. A linear trend of values indicates some relationship between the two variables.
The closer the value of r is to +1 or −1, the stronger the relationship between the variables. For example, r = ±0.7 indicates correlation, r = ±0.9 indicates strong correlation, r = ±0.97 or higher indicates very strong correlation, and r = ±1 is perfect correlation.
Ex 14.1.5 page 175: The data indicate a linear relationship, with negative correlation between drug concentration and height.
r = −0.777, indicating negative correlation between drug concentration and tree height. Drug concentration in the leaves does not increase with the height of the leaf on the tree; in fact the opposite happens: drug concentration is lower the higher the position of the leaf.
Regression
While correlation assesses whether a relationship exists between the two variables, regression determines how they are related. Regression analysis produces an equation by which the value of the dependent variable can be predicted from the independent variable.
A drug precursor molecule is extracted from a nut. The nuts are subject to contamination by a fungal toxin that is difficult to remove during the purification process. It is suspected that the amount of fungus, and hence toxin, depends on rainfall. Is it possible to predict toxin concentration from rainfall at the growing site?
Toxin levels are plotted against rainfall and checked for a possible linear trend.
A linear trend is obvious, so a trend line is fitted. Excel fits the ‘line of best fit’ such that the vertical distances (deviations) of the points from the line are, taken together, at a minimum. Some points deviate above the line, some deviate below it. The line gives the average trend of all the points plotted.
All the deviations are taken, squared (otherwise positives and negatives would cancel out), and added up to give the sum of squares. The lower this value, the better the fit. Excel generates the line of best fit (the trendline) and states the regression equation: y = a + bx. The intercept is a, and the gradient (slope) is b.
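The least-squares idea described above can be sketched outside Excel. The following is a minimal illustration using the standard closed-form solution for a straight line y = a + bx; it is not a description of Excel's internals. The function names and the rainfall/toxin values are hypothetical and are not the data behind the lecture's regression equation.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # gradient (slope)
    a = mean_y - b * mean_x  # intercept
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented purely for illustration
rainfall = [0.5, 1.0, 1.5, 2.0, 2.5]     # cm/week
toxin = [14.2, 17.0, 21.1, 23.8, 27.6]   # g/100g

a, b = fit_line(rainfall, toxin)
print(f"y = {a:.2f} + {b:.2f}x")
print("sum of squares:", sum_of_squares(rainfall, toxin, a, b))
```

Any other choice of a and b gives a sum of squares at least as large as the fitted line's, which is what "best fit" means here.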
The relationship between toxin concentration and rainfall is represented by the regression equation: Toxin concentration (g/100g) = 10.6 + 6.73 × rainfall (cm/week)
The gradient is +6.73, which means that an increase in rainfall of 1 cm/week is associated with an increase in toxin concentration of 6.73 g/100g. The intercept is 10.6 g/100g.
Predictions using regression:
To investigate the toxin levels at two sites:
Site A has 2.05 cm/week of rainfall, while site B has 1.25 cm/week.
Site A: Toxin = 10.6 + 6.73(2.05) = 24.4 g/100g
Site B: Toxin = 10.6 + 6.73(1.25) = 19.0 g/100g
The lower rainfall at site B is predicted to yield a slightly better crop.
Predictions within the given range are reliable, but predictions beyond the given range must be treated with care, as the linear trend may not continue in either direction. ...
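The two site predictions can be reproduced with a short sketch. The intercept and gradient are taken from the regression equation in the text; the function name `predict_toxin` and the optional `fitted_range` argument are hypothetical additions, included only to show one way of guarding against predictions outside the observed rainfall range (the range itself is not stated in the text).

```python
INTERCEPT = 10.6  # g/100g, from the regression equation in the text
SLOPE = 6.73      # g/100g per cm/week, from the regression equation in the text


def predict_toxin(rainfall, fitted_range=None):
    """Predict toxin concentration (g/100g) from rainfall (cm/week).

    fitted_range is an optional (low, high) pair of rainfall values.
    It is a hypothetical guard: if given, rainfall outside it is rejected
    because the linear trend may not hold beyond the observed data.
    """
    if fitted_range is not None:
        low, high = fitted_range
        if not (low <= rainfall <= high):
            raise ValueError("rainfall is outside the range used to fit the line")
    return INTERCEPT + SLOPE * rainfall


site_a = predict_toxin(2.05)
site_b = predict_toxin(1.25)
print(f"Site A: {site_a:.1f} g/100g")
print(f"Site B: {site_b:.1f} g/100g")
```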
- Spring '14