# Lines of Best Fit

### Linear Regression

Linear regression is the process used to find the equation of a line of best fit, the line that most closely approximates the linear relationship between two variables. The correlation coefficient indicates the strength of the linear fit to the data.

Linear regression is a statistical method that calculates a line of best fit for a given set of data points. The line of best fit, also called the regression line for the data in a scatterplot, is the line with the minimum sum of the squares of the vertical distances from the data points to the line. For each point, the vertical distance between the point and the line is squared, and linear regression finds the line for which the total of these squares is as small as possible.

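
The least-squares idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the method any particular calculator uses: the function names `least_squares_line` and `sum_of_squares` and the sample points are assumptions made up for this example. It uses the standard closed-form result that the slope equals the sum of the products of the deviations of $x$ and $y$ from their means, divided by the sum of the squared deviations of $x$.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, slope, intercept)

# Nudging the slope or intercept away from the fitted values increases the sum
worse_slope = sum_of_squares(xs, ys, slope + 0.1, intercept)
worse_intercept = sum_of_squares(xs, ys, slope, intercept + 0.5)

print(f"y = {slope:.2f}x + {intercept:.2f}, sum of squares = {best:.2f}")
print(f"after nudging: {worse_slope:.2f}, {worse_intercept:.2f}")
```

Comparing `best` with the two nudged values shows the defining property of the regression line: any other line gives a larger sum of squares for the same points.
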
A correlation is a relationship between two variables. When the points in a scatterplot are very close to a line of best fit, there is a strong correlation. When they show a general linear pattern, but are not close to the line, there is a weak correlation. Both positive and negative trends may exhibit strong or weak correlations. When the points show no linear pattern at all, there is no linear correlation.

The correlation coefficient, $r$, is a value between –1 and 1, inclusive, that indicates the strength and direction of the linear correlation between the two variables.

• An $r$-value of 1 indicates that the line has a positive slope and all the points lie on the line.
• A value of $r$ close to 1 indicates a strong positive correlation.
• A positive $r$-value closer to zero than to 1 indicates a weak positive correlation.
• An $r$-value of zero indicates no linear correlation.
• A negative $r$-value closer to zero than to –1 indicates a weak negative correlation.
• A value of $r$ close to –1 indicates a strong negative correlation.
• An $r$-value of –1 indicates that the line has a negative slope and all the points lie on the line.

Scatterplots show how the correlation coefficient indicates the strength and direction of a linear correlation. The sign of $r$ indicates whether the correlation is positive or negative, and the absolute value of $r$ indicates the strength of the correlation. The greater the absolute value, the stronger the correlation.
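
The text does not give a formula for $r$, so the following is a sketch using the standard Pearson correlation formula. The function name `correlation_coefficient` and the three small data sets are hypothetical, chosen only to show the extreme and near-zero cases from the list above.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets, chosen only for illustration
xs = [1, 2, 3, 4, 5]
on_rising_line = [3, 5, 7, 9, 11]   # every point lies on y = 2x + 1
on_falling_line = [9, 7, 5, 3, 1]   # every point lies on y = -2x + 11
scattered = [4, 1, 5, 2, 4]         # no clear linear pattern

r_rising = correlation_coefficient(xs, on_rising_line)
r_falling = correlation_coefficient(xs, on_falling_line)
r_scattered = correlation_coefficient(xs, scattered)

print(f"rising: {r_rising:.2f}, falling: {r_falling:.2f}, scattered: {r_scattered:.2f}")
```

The first two data sets lie exactly on a line, so $r$ comes out at 1 and –1. The third gives a value close to zero, which matches the description of a weak correlation.
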

### Interpreting Lines of Best Fit

Technology can be used to generate a line of best fit.

For very small data sets, a line of best fit can be calculated by hand. Most often, technology such as graphing calculators, spreadsheets, or online tools is used to determine the equation of the line.
Step-By-Step Example
Determining the Equation of a Line of Best Fit

Employees at a company start with an average salary of $40,500 at year zero. The table shows the average salaries of the company's employees for selected years of service. Graph and interpret the line of best fit.

| Year | Average Salary |
|------|----------------|
| 0 | $40,500 |
| 1 | $42,000 |
| 2 | $43,500 |
| 3 | $45,000 |
| 4 | $45,500 |
| 5 | $47,000 |
| 9 | $52,000 |
| 10 | $54,000 |
| 11 | $55,500 |
| 12 | $56,000 |
| 13 | $56,500 |
| 14 | $56,000 |
| 15 | $56,500 |
| 16 | $57,000 |
| 17 | $58,000 |

Step 1
Create a scatterplot of the data.
Step 2
Calculate the line of best fit by using the linear regression function of a graphing calculator.
The line of best fit is:
$y \approx 1{,}061x + 41{,}660$
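
The calculator result can be cross-checked without a graphing calculator. The sketch below uses the standard sums form of the least-squares formulas, which is the form usually used for hand calculation; it is not necessarily how a particular calculator computes the fit. The variable names are made up for this example, and the data values are copied from the table.

```python
# Years of service and average salaries from the table in the example
years = [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17]
salaries = [40500, 42000, 43500, 45000, 45500, 47000, 52000, 54000,
            55500, 56000, 56500, 56000, 56500, 57000, 58000]

n = len(years)
sum_x = sum(years)
sum_y = sum(salaries)
sum_xy = sum(x * y for x, y in zip(years, salaries))
sum_x2 = sum(x * x for x in years)

# Least-squares slope and intercept written in terms of the sums
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"y = {slope:.0f}x + {intercept:.0f}")
```

Rounded to the nearest whole number, the slope is 1,061 and the intercept is 41,660, which agrees with the equation above.
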
Solution
Graph the line of best fit on the scatterplot.
Next, interpret the line of best fit.
• The correlation coefficient of $r \approx 0.98$ means that there is a very strong positive correlation between years of service and average salary. As employees' years at the company increase, their average salaries tend to increase.
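
The value of $r$ for the salary data can be checked the same way a calculator would report it. This is a sketch under the assumption that $r$ is the standard Pearson correlation coefficient, written here in its sums form; the variable names are illustrative only, and the data values are copied from the table.

```python
import math

# Years of service and average salaries from the table in the example
years = [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17]
salaries = [40500, 42000, 43500, 45000, 45500, 47000, 52000, 54000,
            55500, 56000, 56500, 56000, 56500, 57000, 58000]

n = len(years)
sum_x = sum(years)
sum_y = sum(salaries)
sum_xy = sum(x * y for x, y in zip(years, salaries))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in salaries)

# Pearson correlation coefficient in terms of the sums
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.2f}")
```

The result rounds to 0.98. Its positive sign matches the positive slope of the line, and its closeness to 1 reflects how tightly the points cluster around the line.
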