Linear Regression
The correlation coefficient, $r$, of a line of best fit is a value between –1 and 1, inclusive, that indicates the strength and direction of the linear relationship between the two variables, that is, how closely the data points follow the line.
- An $r$-value of 1 indicates that the line has a positive slope and all the points lie on the line.
- A value of $r$ close to 1 indicates a strong positive correlation.
- A positive $r$-value closer to zero than to 1 indicates a weak positive correlation.
- An $r$-value of zero indicates no linear correlation.
- A negative $r$-value closer to zero than to –1 indicates a weak negative correlation.
- A value of $r$ close to –1 indicates a strong negative correlation.
- An $r$-value of –1 indicates that the line has a negative slope and all the points lie on the line.
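The three reference cases in the list ($r = 1$, $r = 0$, and $r = -1$) can be checked numerically. The sketch below assumes the standard Pearson formula for $r$; the function name `pearson_r` and the sample points are made up for illustration and do not come from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point lies on a line with positive slope
falling = [-3 * x + 5 for x in xs]  # every point lies on a line with negative slope
scattered = [2, 5, 1, 5, 2]         # made-up values with no linear trend

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)

print(r_rising, r_falling, r_scattered)
```

The first two values come out at 1 and –1 because every point lies exactly on a line; the third comes out at 0 because the deviations of the made-up values cancel.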
Interpreting Lines of Best Fit
Employees at a company start with an average salary of $40,500 at year zero. The table shows the average salaries of the company's employees for selected years of service. Graph and interpret the line of best fit.
Year | Average Salary |
---|---|
0 | $40,500 |
1 | $42,000 |
2 | $43,500 |
3 | $45,000 |
4 | $45,500 |
5 | $47,000 |
9 | $52,000 |
10 | $54,000 |
11 | $55,500 |
12 | $56,000 |
13 | $56,500 |
14 | $56,000 |
15 | $56,500 |
16 | $57,000 |
17 | $58,000 |
- The correlation coefficient of $r\approx 0.98$ means that there is a very strong positive correlation between years of service and average salary: the longer employees have worked at the company, the higher their average salaries tend to be.
- The slope of the line is about 1,061, which means that average salaries increase by about $1,061 for each additional year of service.
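The slope and correlation coefficient quoted above can be reproduced from the table. The following is a minimal sketch that assumes an ordinary least-squares fit; the helper name `fit_line` is hypothetical, and a graphing calculator or spreadsheet regression tool should give the same values.

```python
import math

# Data copied from the table: years of service and average salary in dollars
years = [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17]
salaries = [40500, 42000, 43500, 45000, 45500, 47000, 52000, 54000,
            55500, 56000, 56500, 56000, 56500, 57000, 58000]

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = fit_line(years, salaries)
print(f"slope: {slope:.0f}, r: {r:.2f}")
```

With the table's data, the printed slope rounds to 1061 and $r$ rounds to 0.98, in agreement with the interpretation above.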
Making Predictions Using Lines of Best Fit
The line of best fit can be used to predict values that are not in the data set.
- Interpolation is predicting a data value between given data points.
- Extrapolation is predicting a data value outside the set of given data points.
Predictions made with the line of best fit are not always accurate. Extrapolation assumes that the trend continues beyond the measured data, which may not be the case, so it carries a greater degree of uncertainty than interpolation and is more likely to produce inaccurate results. By contrast, interpolation is generally more reliable because the predicted value falls between measured values.
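To make the distinction concrete, the sketch below fits a line to the salary table from the earlier example and labels each prediction by whether the input falls inside the range of the data. It assumes an ordinary least-squares fit; the names `fit_line` and `predict`, and the choice of years 7 and 25, are illustrative only.

```python
# Data copied from the salary table in the earlier example
years = [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17]
salaries = [40500, 42000, 43500, 45000, 45500, 47000, 52000, 54000,
            55500, 56000, 56500, 56000, 56500, 57000, 58000]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(years, salaries)

def predict(x):
    """Predict a salary and say whether x lies inside the observed range of years."""
    inside = min(years) <= x <= max(years)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x + intercept, kind

for year in (7, 25):  # 7 is a gap inside the data; 25 is beyond the last year
    value, kind = predict(year)
    print(f"year {year}: about ${value:,.0f} ({kind})")
```

Year 7 is not in the table but lies between measured years, so its prediction is an interpolation. Year 25 lies well past the last measured year, so its prediction depends on the trend continuing and should be treated with more caution.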
The line of best fit for the average salaries at a company, where $y$ is the average salary in dollars and $x$ is the number of years the company has been in operation, is: