Data for two variables can be represented using a scatterplot.
A scatterplot is a data display consisting of the graph of a set of ordered pairs. A scatterplot shows data for two variables and helps to indicate whether there is a relationship between the variables. The data points of a scatterplot, unlike those of a line graph, are not connected by line segments.
A scatterplot can be used in the real world to analyze data and make informed decisions about a situation. For example, a forester might use a scatterplot to determine how fast maple trees in a reforested area are growing based on their age. Before the data are plotted, a table can be created to organize the data as ordered pairs. Plotting the data on a scatterplot can then reveal the relationship between the x- and y-values of the data set.
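Maple Tree Age and Height

| Age (years) | Height (meters) |
|-------------|-----------------|
| 2 | 1.5 |
| 2 | 1.6 |
| 3 | 1.75 |
| 3 | 2.1 |
| 4 | 1.55 |
| 4 | 1.75 |
| 4 | 1.95 |
| 5 | 2 |
| 5 | 2.15 |
| 5 | 2.4 |
| 5 | 2.7 |
| 6 | 2.25 |
| 6 | 3.1 |
| 7 | 2.6 |
| 8 | 2.5 |
| 8 | 2.75 |
| 8 | 3.05 |
| 9 | 3.2 |
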
Using a table can help organize data before plotting points on a scatterplot. For instance, a forester might use a table to record the age and height of maple trees in a reforested area.
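As a minimal sketch of how the table can be turned into ordered pairs before plotting, the following Python snippet pairs each age with its height. The variable names (`ages`, `heights`, `points`, `heights_by_age`) are illustrative choices, not part of the original example.

```python
from collections import defaultdict

# Columns of the maple tree table: age in years, height in meters.
ages = [2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 8, 8, 8, 9]
heights = [1.5, 1.6, 1.75, 2.1, 1.55, 1.75, 1.95, 2, 2.15, 2.4, 2.7,
           2.25, 3.1, 2.6, 2.5, 2.75, 3.05, 3.2]

# Each row of the table becomes one ordered pair (x, y) = (age, height).
points = list(zip(ages, heights))

# Group the heights by age to see how much the y-values vary for each x-value.
heights_by_age = defaultdict(list)
for age, height in points:
    heights_by_age[age].append(height)

for age in sorted(heights_by_age):
    print(age, heights_by_age[age])
```

Each tuple in `points` corresponds to one point on the scatterplot. Grouping by age also shows that several trees of the same age have different heights, which is why the points do not fall on a single line.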
Scatterplot of Maple Tree Age and Height
Scatterplots can help with visualizing patterns that indicate relationships between x- and y-values in a data set. For example, in a scatterplot showing the age and height of maple trees in a reforested area, the clusters of points, their area of concentration, and their general direction reveal that the height of the trees (the y-values) increases as their age (the x-values) increases.
Trends in Data
A scatterplot can be used to visualize trends in data that indicate a positive, negative, or no linear relationship between two variables. The type of relationship between the variables is indicated by the slope of the line of best fit.
Scatterplots can show linear patterns in data, even if there is no single line that goes through all the data points. To gauge the general pattern of data in a scatterplot, a line of best fit is calculated and drawn on the scatterplot to indicate the general direction of the points. The line of best fit, or regression line, minimizes the sum of the squared vertical distances between the line and the points in the scatterplot. When the line of best fit for a data set is graphed on a scatterplot, the data points fall on both sides of it, typically with roughly half above the line and roughly half below. Any line that is far away from the data points and does not follow their general direction is not a line of best fit.
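The calculation described above can be sketched in Python using the standard least-squares formulas: the slope is the sum of the products of the x- and y-deviations from their means divided by the sum of the squared x-deviations, and the intercept makes the line pass through the point of means. The function name `fit_line` is a hypothetical helper chosen for this illustration, and the data are the maple tree measurements from the table.

```python
# Maple tree data from the table: (age in years, height in meters).
points = [
    (2, 1.5), (2, 1.6), (3, 1.75), (3, 2.1), (4, 1.55), (4, 1.75),
    (4, 1.95), (5, 2), (5, 2.15), (5, 2.4), (5, 2.7), (6, 2.25),
    (6, 3.1), (7, 2.6), (8, 2.5), (8, 2.75), (8, 3.05), (9, 3.2),
]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points.

    Assumes at least two distinct x-values; otherwise the slope is undefined.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through the means
    return slope, intercept


slope, intercept = fit_line(points)

# Count how many points lie above and below the fitted line.
above = sum(1 for x, y in points if y > slope * x + intercept)
below = sum(1 for x, y in points if y < slope * x + intercept)

print(f"height = {slope:.3f} * age + {intercept:.3f}")
print(f"{above} points above the line, {below} points below")
```

For the maple tree data the fitted slope is positive, roughly 0.22 meters of height per year of age. The above/below counts are a rough check only: a least-squares line does not guarantee an exactly even split.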
Scatterplot with a Positive Trend
The line of best fit is an approximation of the general trend of a data set. On a scatterplot, the line of best fit generally runs through the middle of the data points. If the line of best fit has a positive slope and is close to the points in the scatterplot, then the scatterplot has a positive trend.
The slope of the line of best fit, or the change in y-values over the change in x-values, must match the general trend of the data on a scatterplot. Otherwise, the line is not considered a line of best fit. For a scatterplot with a positive trend, the line of best fit will not only be close to the data points on the scatterplot, but it will also have a positive slope. A positive slope indicates that the y-values increase as the x-values increase.
Some scatterplots show a negative trend. The line of best fit for a scatterplot with a negative trend will be close to the points in the scatterplot and will have a negative slope. So, the y-values decrease as the x-values increase. Note that the word negative refers only to the direction of the trend, not its strength.
Scatterplot with a Negative Trend
A scatterplot with a negative trend has a line of best fit that has a negative slope and is close to the data points on the scatterplot. For instance, the declining annual revenue of a business over a period of 10 years can be displayed in a scatterplot with a negative trend.
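The link between the sign of the slope and the direction of the trend can be sketched as follows. The helper names `best_fit_slope` and `trend_direction`, and the two small data sets, are made up for illustration and do not come from any real measurements. The sketch reports direction only; it says nothing about how closely the points follow the line.

```python
def best_fit_slope(points):
    """Least-squares slope for a list of (x, y) pairs.

    Assumes at least two distinct x-values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def trend_direction(points):
    """Classify the direction of the trend from the sign of the slope."""
    slope = best_fit_slope(points)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "flat"


# Hypothetical example data, invented for this sketch.
rising = [(1, 2), (2, 4), (3, 5), (4, 8)]    # y tends to increase with x
falling = [(1, 9), (2, 7), (3, 6), (4, 2)]   # y tends to decrease with x

print(trend_direction(rising))
print(trend_direction(falling))
```

A sign-based check like this is only meaningful when the points actually lie close to the line; for data without a clear pattern, the computed slope can be positive or negative without indicating a real relationship.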
Not all scatterplots show a clear trend. If the distribution of the data points does not show an obvious pattern, it is more difficult to draw a representative line of best fit through the data points. For scatterplots without a clear trend, there may not be a relationship between the two variables displayed in the scatterplot.
Scatterplot Without a Linear Trend
Scatterplots without a clear linear trend indicate that there might not be a linear relationship between the values in the data set. A representative line of best fit cannot be drawn because the data points have no general direction or position for the line to follow.
Other scatterplots may show a trend, but not one that is linear. If the data points of a scatterplot lie close to a curve, or the pattern of the data points resembles a curve, the data set displays a nonlinear trend.