Histograms, Frequency Polygons, and Time Series Graphs

Learning Outcomes

  • Display data graphically and interpret graphs: stemplots, histograms, and box plots.
  • Recognize, describe, and calculate the measures of location of data: quartiles and percentiles.


For most of the work you do in this book, you will use a histogram to display the data. One advantage of a histogram is that it can readily display large data sets. A rule of thumb is to use a histogram when the data set consists of 100 values or more.

A histogram consists of contiguous (adjoining) boxes. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represent (for instance, distance from your home to school). The vertical axis is labeled either frequency or relative frequency (or percent frequency or probability). The graph will have the same shape with either label. The histogram (like the stemplot) can give you the shape of the data, the center, and the spread of the data.

The relative frequency is equal to the frequency for an observed value of the data divided by the total number of data values in the sample. (Remember, frequency is defined as the number of times an answer occurs.) If:

  • f = frequency,
  • n = total number of data values (or the sum of the individual frequencies), and
  • RF = relative frequency,

then RF = f/n.


For example, if three students in Mr. Ahab's English class of 40 students received from 90% to 100%, then f = 3, n = 40, and RF = f/n = 3/40 = 0.075. So 7.5% of the students received 90–100%. The scores 90–100% are quantitative measures.
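The relative frequency calculation can be sketched in a few lines of Python. This is only an illustration; the function name `relative_frequency` is an assumption made for this sketch and does not come from the text.

```python
from fractions import Fraction

def relative_frequency(f, n):
    """Return the relative frequency RF = f / n as an exact fraction."""
    if n <= 0:
        raise ValueError("n must be a positive count")
    return Fraction(f, n)

# Mr. Ahab's class: 3 of the 40 students scored 90-100%.
rf = relative_frequency(3, 40)
print(float(rf))           # 0.075
print(f"{float(rf):.1%}")  # 7.5%
```

Using `Fraction` keeps the ratio exact until it is converted for display, which avoids rounding surprises when many relative frequencies are added together.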

To construct a histogram, first decide how many bars or intervals, also called classes, represent the data. Many histograms consist of five to 15 bars or classes for clarity. The number of bars needs to be chosen. Choose a starting point for the first interval to be less than the smallest data value. A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places. For example, if the value with the most decimal places is 6.1 and this is the smallest value, a convenient starting point is 6.05 (6.1 – 0.05 = 6.05). We say that 6.05 has more precision. If the value with the most decimal places is 2.23 and the lowest value is 1.5, a convenient starting point is 1.495 (1.5 – 0.005 = 1.495). If the value with the most decimal places is 3.234 and the lowest value is 1.0, a convenient starting point is 0.9995 (1.0 – 0.0005 = 0.9995). If all the data happen to be integers and the smallest value is two, then a convenient starting point is 1.5 (2 – 0.5 = 1.5). Also, when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary. The next two examples go into detail about how to construct a histogram using continuous data and how to create a histogram using discrete data.
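The starting-point rule above can be written as a short Python sketch. The helper name `convenient_start` and its parameters are assumptions introduced here for illustration; values are passed as strings so that `Decimal` keeps them exact.

```python
from decimal import Decimal

def convenient_start(smallest, max_decimals):
    """Subtract half a unit of the last recorded decimal place.

    smallest     -- smallest data value, as a string such as "6.1"
    max_decimals -- most decimal places found in any data value
    """
    # Decimal(5).scaleb(-k) is 5 x 10^-k: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5).scaleb(-(max_decimals + 1))
    return Decimal(smallest) - half_unit

print(convenient_start("6.1", 1))  # 6.05
print(convenient_start("1.5", 2))  # 1.495
print(convenient_start("1.0", 3))  # 0.9995
print(convenient_start("2", 0))    # 1.5
```

The four calls reproduce the four cases worked in the paragraph above.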



Example

The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional soccer players. The heights are continuous data, since height is measured.

60; 60.5; 61; 61; 61.5

63.5; 63.5; 63.5

64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5

66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5

68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5

70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71

72; 72; 72; 72.5; 72.5; 73; 73.5; 74

The smallest data value is 60. Since the data with the most decimal places has one decimal (for instance, 61.5), we want our starting point to have two decimal places. Since the numbers 0.5, 0.05, 0.005, etc. are convenient numbers, use 0.05 and subtract it from 60, the smallest value, for the convenient starting point.

60 – 0.05 = 59.95, which is more precise than, say, 61.5 by one decimal place. The starting point is, then, 59.95.

The largest value is 74, so 74 + 0.05 = 74.05 is the ending value.

Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire). Suppose you choose eight bars.

(74.05 – 59.95) / 8 ≈ 1.76

Note

We will round up to two and make each bar or class interval two units wide. Rounding up to two is one way to prevent a value from falling on a boundary. Rounding to the next number is often necessary even if it goes against the standard rules of rounding. For this example, using 1.76 as the width would also work. A guideline that some follow for the number of bars or class intervals is to take the square root of the number of data values and then round to the nearest whole number, if necessary. For example, if there are 150 values of data, take the square root of 150 and round to 12 bars or intervals.
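The width calculation and the square-root guideline can be checked with a small Python sketch. The names `class_width` and `sqrt_rule` are assumptions made for illustration, not part of the text.

```python
import math
from decimal import Decimal

def class_width(start, end, bars):
    """Return (raw width, width rounded up to the next whole number)."""
    raw = (Decimal(end) - Decimal(start)) / bars
    return raw, math.ceil(raw)

raw, width = class_width("59.95", "74.05", 8)
print(raw)    # 1.7625
print(width)  # 2

def sqrt_rule(n_values):
    """Square-root guideline: round the square root of the data count."""
    return round(math.sqrt(n_values))

print(sqrt_rule(150))  # 12
```

The raw width 1.7625 is the value the text reports as 1.76; rounding it up gives the two-unit width used in the example.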

The boundaries are:

  • 59.95
  • 59.95 + 2 = 61.95
  • 61.95 + 2 = 63.95
  • 63.95 + 2 = 65.95
  • 65.95 + 2 = 67.95
  • 67.95 + 2 = 69.95
  • 69.95 + 2 = 71.95
  • 71.95 + 2 = 73.95
  • 73.95 + 2 = 75.95


The heights 60 through 61.5 inches are in the interval 59.95–61.95. The heights that are 63.5 are in the interval 61.95–63.95. The heights that are 64 through 64.5 are in the interval 63.95–65.95. The heights 66 through 67.5 are in the interval 65.95–67.95. The heights 68 through 69.5 are in the interval 67.95–69.95. The heights 70 through 71 are in the interval 69.95–71.95. The heights 72 through 73.5 are in the interval 71.95–73.95. The height 74 is in the interval 73.95–75.95.
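The interval assignment can be reproduced with a short Python sketch that builds the boundaries and tallies the heights. The data are stored as value-to-count pairs taken from the list above; the variable names are assumptions made for this illustration.

```python
from bisect import bisect_right
from decimal import Decimal

# Heights from the example, stored as value -> number of players.
heights = {
    "60": 1, "60.5": 1, "61": 2, "61.5": 1, "63.5": 3, "64": 7, "64.5": 8,
    "66": 10, "66.5": 11, "67": 12, "67.5": 7, "68": 2, "69": 10, "69.5": 5,
    "70": 6, "70.5": 3, "71": 3, "72": 3, "72.5": 2, "73": 1, "73.5": 1, "74": 1,
}

start, width, bars = Decimal("59.95"), 2, 8
boundaries = [start + width * i for i in range(bars + 1)]

counts = [0] * bars
for value, count in heights.items():
    # bisect_right returns the position of the first boundary above the value.
    index = bisect_right(boundaries, Decimal(value)) - 1
    counts[index] += count

n = sum(counts)
for low, high, f in zip(boundaries, boundaries[1:], counts):
    print(f"{low}-{high}: f = {f}, RF = {f / n:.2f}")
```

Because every boundary carries one more decimal place than the data, no height can equal a boundary, so each value falls in exactly one interval. The tallest bar is the interval 65.95–67.95, which holds 40 of the 100 heights.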

The following histogram displays the heights on the x-axis and relative frequency on the y-axis.

[Figure: histogram of 8 bars, with the y-axis in increments of 0.05 from 0 to 0.4 and the x-axis in intervals of 2 from 59.95 to 75.95.]

Try It

The following data are the shoe sizes of 50 male students. The sizes are continuous data since shoe size is measured. Construct a histogram and calculate the width of each bar or class interval. Suppose you choose six bars.

9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5

11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5

12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14





Example

The following data are the number of books bought by 50 part-time college students at ABC College. The number of books is discrete data, since books are counted.

1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1

2; 2; 2; 2; 2; 2; 2; 2; 2; 2

3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3

4; 4; 4; 4; 4; 4

5; 5; 5; 5; 5

6; 6

Eleven students buy one book. Ten students buy two books. Sixteen students buy three books. Six students buy four books. Five students buy five books. Two students buy six books.

Because the data are integers, subtract 0.5 from 1, the smallest data value, and add 0.5 to 6, the largest data value. Then the starting point is 0.5 and the ending value is 6.5.

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too many different values, a width that places the data values in the middle of the bar or class interval is the most convenient. Since the data consist of the numbers 1, 2, 3, 4, 5, 6, and the starting point is 0.5, a width of one places the 1 in the middle of the interval from 0.5 to 1.5, the 2 in the middle of the interval from 1.5 to 2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the middle of the interval from _______ to _______, the 5 in the middle of the interval from _______ to _______, and the _______ in the middle of the interval from _______ to _______ .





Calculate the number of bars as follows:

(6.5 – 0.5) / (number of bars) = 1, where 1 is the width of a bar. Therefore, the number of bars = 6.
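For discrete data like these, the frequency table and the number of bars can be computed directly. The following Python sketch uses only the counts stated in the example; the variable names are assumptions made for illustration.

```python
from collections import Counter

# Number of books bought by each of the 50 students in the example.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

frequency = Counter(books)  # value -> how many students bought that many books
start = min(books) - 0.5    # 0.5
end = max(books) + 0.5      # 6.5
width = 1                   # centers each integer in its bar
bars = int((end - start) / width)

print(sorted(frequency.items()))
print(start, end, bars)     # 0.5 6.5 6
```

With a width of one, each bar is centered on a single integer value, so the bar heights are simply the counts in `frequency`.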

The following histogram displays the number of books on the x-axis and the frequency on the y-axis.

[Figure: histogram of 6 bars, with the y-axis in increments of 2 from 0 to 16 and the x-axis in intervals of 1 from 0.5 to 6.5.]

USING THE TI-83, 83+, 84, 84+ CALCULATOR

Create the histogram for Example 2.

  • Press Y=. Press CLEAR to delete any equations.
  • Press STAT 1:EDIT. If L1 has data in it, arrow up into the name L1, press CLEAR and then arrow down. If necessary, do the same for L2.
  • Into L1, enter 1, 2, 3, 4, 5, 6.
  • Into L2, enter 11, 10, 16, 6, 5, 2.
  • Press WINDOW. Set Xmin = .5, Xscl = (6.5 – .5)/6, Ymin = –1, Ymax = 20, Yscl = 1, Xres = 1.
  • Press 2nd Y=. Start by pressing 4:Plotsoff ENTER.
  • Press 2nd Y=. Press 1:Plot1. Press ENTER. Arrow down to TYPE. Arrow to the 3rd picture (histogram). Press ENTER.
  • Arrow down to Xlist: Enter L1 (2nd 1). Arrow down to Freq. Enter L2 (2nd 2).
  • Press GRAPH.
  • Use the TRACE key and the arrow keys to examine the histogram.
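For readers without a calculator at hand, the same two lists can be turned into a rough text histogram in Python. This is only a sketch for checking bar heights, not a replacement for the calculator steps; the function name `text_histogram` is an assumption made for illustration.

```python
values = [1, 2, 3, 4, 5, 6]          # list L1: number of books
frequencies = [11, 10, 16, 6, 5, 2]  # list L2: number of students

def text_histogram(values, frequencies):
    """Return one line per bar, drawing '#' once per observation."""
    lines = []
    for value, f in zip(values, frequencies):
        low, high = value - 0.5, value + 0.5
        lines.append(f"{low:.1f}-{high:.1f} | {'#' * f} {f}")
    return lines

for line in text_histogram(values, frequencies):
    print(line)
```

Each printed row corresponds to one bar of width one, running from 0.5 to 6.5 as in the WINDOW settings.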


Try It

The following data are the number of sports played by 50 student athletes. The number of sports is discrete data since sports are counted.

11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11
;
11


22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22
;
22


33
;
33
;
33
;
33
;
33
;
33
;
33
;
33


Twenty student athletes play one sport. Twenty-two student athletes play two sports. Eight student athletes play three sports.

Fill in the blanks for the following sentence. Since the data consist of the numbers 1, 2, 3, and the starting point is 0.5, a width of one places the 1 in the middle of the interval from 0.5 to _____, the 2 in the middle of the interval from _____ to _____, and the 3 in the middle of the interval from _____ to _____.





Example

Using this data set, construct a histogram.

Number of Hours My Classmates Spent Playing Video Games on Weekends
9.95    10      2.25    16.75   0
19.5    22.5    7.5     15      12.75
5.5     11      10      20.75   17.5
23      21.9    24      23.75   18
20      15      22.9    18.8    20.5




Try It

The following data represent the number of employees at various restaurants in New York City. Using this data, create a histogram.

22; 35; 15; 26; 40; 28; 18; 20; 25; 34; 39; 42; 24; 22; 19; 27; 22; 34; 40; 20; 38; and 28

Use 10–19 as the first interval.

COLLABORATIVE EXERCISE

Count the money (bills and change) in your pocket or purse. Your instructor will record the amounts. As a class, construct a histogram displaying the data. Discuss how many intervals you think is appropriate. You may want to experiment with the number of intervals.

Frequency Polygons

Frequency polygons are analogous to line graphs, and just as line graphs make continuous data visually easy to interpret, so too do frequency polygons.

To construct a frequency polygon, first examine the data and decide on the number of intervals, or class intervals, to use on the x-axis and y-axis. After choosing the appropriate ranges, begin plotting the data points. After all the points are plotted, draw line segments to connect them.

Example

A frequency polygon was constructed from the frequency table below.

Frequency Distribution for Calculus Final Test Scores
Lower Bound   Upper Bound   Frequency   Cumulative Frequency
49.5          59.5          5           5
59.5          69.5          10          15
69.5          79.5          30          45
79.5          89.5          40          85
89.5          99.5          15          100

The first label on the x-axis is 44.5. This represents an interval extending from 39.5 to 49.5. Since the table contains no scores below 49.5, this interval is used only to allow the graph to touch the x-axis. The point labeled 54.5 represents the next interval, or the first “real” interval from the table, and contains five scores. This reasoning is followed for each of the remaining intervals with the point 104.5 representing the interval from 99.5 to 109.5. Again, this interval contains no data and is only used so that the graph will touch the x-axis. Looking at the graph, we say that this distribution is skewed because one side of the graph does not mirror the other side.
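The points of this frequency polygon can be derived from the table with a short Python sketch. It plots each class at its midpoint and adds one empty class on each side, as described above; the variable names are assumptions made for illustration.

```python
# Class lower bounds and frequencies from the test-score table.
lower_bounds = [49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [5, 10, 30, 40, 15]
width = 10

# Each class is plotted at its midpoint.
midpoints = [low + width / 2 for low in lower_bounds]

# Add an empty class on each side so the polygon touches the x-axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

print(list(zip(xs, ys)))
```

The first and last points, (44.5, 0) and (104.5, 0), are the two padding intervals; connecting all seven points in order with line segments gives the polygon.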

Try It

Construct a frequency polygon of U.S. Presidents’ ages at inauguration shown in the table.

Age at Inauguration   Frequency
41.5–46.5             4
46.5–51.5             11
51.5–56.5             14
56.5–61.5             9
61.5–66.5             4
66.5–71.5             2

Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency polygons drawn for different data sets.

Example

We will construct an overlay frequency polygon comparing the scores with the students’ final numeric grade.

Frequency Distribution for Calculus Final Test Scores
Lower Bound   Upper Bound   Frequency   Cumulative Frequency
49.5          59.5          5           5
59.5          69.5          10          15
69.5          79.5          30          45
79.5          89.5          40          85
89.5          99.5          15          100

Frequency Distribution for Calculus Final Grades
Lower Bound   Upper Bound   Frequency   Cumulative Frequency
49.5          59.5          10          10
59.5          69.5          10          20
69.5          79.5          30          50
79.5          89.5          45          95
89.5          99.5          5           100

[Figure: overlay frequency polygon that matches the supplied data. The x-axis shows the grades, and the y-axis shows the frequency.]

Suppose that we want to study the temperature range of a region for an entire month. Every day at noon we note the temperature and write this down in a log. A variety of statistical studies could be done with this data. We could find the mean or the median temperature for the month. We could construct a histogram displaying the number of days that temperatures reach a certain range of values. However, all of these methods ignore a portion of the data that we have collected.

One feature of the data that we may want to consider is that of time. Since each date is paired with the temperature reading for the day, we don’t have to think of the data as being random. We can instead use the times given to impose a chronological order on the data. A graph that recognizes this ordering and displays the changing temperature as the month progresses is called a time series graph.

Constructing a Time Series Graph

To construct a time series graph, we must look at both pieces of our paired data set. We start with a standard Cartesian coordinate system. The horizontal axis is used to plot the date or time increments, and the vertical axis is used to plot the values of the variable that we are measuring. By doing this, we make each point on the graph correspond to a date and a measured quantity. The points on the graph are typically connected by straight lines in the order in which they occur.
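The pairing and ordering described above can be sketched in Python. The values are the Annual column of the Consumer Price Index table in the example that follows, entered out of order on purpose; the variable names are assumptions made for illustration.

```python
# Annual CPI values (year -> value), deliberately not in time order.
annual_cpi = {
    2008: 215.303, 2003: 184.0, 2011: 224.939, 2005: 195.3, 2010: 218.056,
    2004: 188.9, 2012: 229.594, 2006: 201.6, 2009: 214.537, 2007: 207.342,
}

# Impose chronological order: x is the year, y is the measured value.
points = sorted(annual_cpi.items())

# Consecutive points are joined by straight line segments.
segments = list(zip(points, points[1:]))

print(points[0], points[-1])
print(len(segments))
```

Sorting by the time variable is the step that distinguishes a time series graph from a histogram of the same values: ten points in order give nine connecting segments.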

Example

The following data show the Consumer Price Index for each month, together with the annual value, for ten years. Construct a time series graph for the Annual Consumer Price Index data only.

Year   Jan       Feb       Mar       Apr       May       Jun       Jul
2003   181.7     183.1     184.2     183.8     183.5     183.7     183.9
2004   185.2     186.2     187.4     188.0     189.1     189.7     189.4
2005   190.7     191.8     193.3     194.6     194.4     194.5     195.4
2006   198.3     198.7     199.8     201.5     202.5     202.9     203.5
2007   202.416   203.499   205.352   206.686   207.949   208.352   208.299
2008   211.080   211.693   213.528   214.823   216.632   218.815   219.964
2009   211.143   212.193   212.709   213.240   213.856   215.693   215.351
2010   216.687   216.741   217.631   218.009   218.178   217.965   218.011
2011   220.223   221.309   223.467   224.906   225.964   225.722   225.922
2012   226.665   227.663   229.392   230.085   229.815   229.478   229.104

Year   Aug       Sep       Oct       Nov       Dec       Annual
2003   184.6     185.2     185.0     184.5     184.3     184.0
2004   189.5     189.9     190.9     191.0     190.3     188.9
2005   196.4     198.8     199.2     197.6     196.8     195.3
2006   203.9     202.9     201.8     201.5     201.8     201.6
2007   207.917   208.490   208.936   210.177   210.036   207.342
2008   219.086   218.783   216.573   212.425   210.228   215.303
2009   215.834   215.969   216.177   216.330   215.949   214.537
2010   218.312   218.439   218.711   218.803   219.179   218.056
2011   226.545   226.889   226.421   226.230   225.672   224.939
2012   230.379   231.407   231.317   230.221   229.601   229.594




Try It

The following table is a portion of a data set from www.worldbank.org. Use the table to construct a time series graph for CO2 emissions for the United States.

CO2 Emissions
Year   Ukraine   United Kingdom   United States
2003   352,259   540,640          5,681,664
2004   343,121   540,409          5,790,761
2005   339,029   541,990          5,826,394
2006   327,797   542,045          5,737,615
2007   328,357   528,631          5,828,697
2008   323,657   522,247          5,656,839
2009   272,176   474,579          5,299,563

Uses of a Time Series Graph

Time series graphs are important tools in various applications of statistics. When recording values of the same variable over an extended period of time, sometimes it is difficult to discern any trend or pattern. However, once the same data points are displayed graphically, some features jump out. Time series graphs make trends easy to spot.
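A graph makes a trend visible at a glance; the same check can be sketched numerically in Python by computing year-over-year changes. The data are the Annual Consumer Price Index values from the example in this section, and the variable names are assumptions made for illustration.

```python
annual_cpi = [
    (2003, 184.0), (2004, 188.9), (2005, 195.3), (2006, 201.6), (2007, 207.342),
    (2008, 215.303), (2009, 214.537), (2010, 218.056), (2011, 224.939), (2012, 229.594),
]

# Change from each year to the next, rounded to three decimal places.
changes = [
    (year, round(value - previous, 3))
    for (_, previous), (year, value) in zip(annual_cpi, annual_cpi[1:])
]
decreases = [year for year, change in changes if change < 0]

print(changes)
print(decreases)  # [2009]
```

The annual index rises every year except 2009, which is the kind of feature that is hard to pick out of a table but stands out as a dip on a time series graph.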

Concept Review

A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large, continuous, quantitative data sets. A frequency polygon can also be used when graphing large data sets with data points that repeat. The data usually go on the x-axis, with the frequency being graphed on the y-axis. Time series graphs can be helpful when looking at large amounts of data for one variable over a period of time.

References

Data on annual homicides in Detroit, 1961–73, from Gunst & Mason’s book ‘Regression Analysis and its Application’, Marcel Dekker

“Timeline: Guide to the U.S. Presidents: Information on every president’s birthplace, political party, term of office, and more.” Scholastic, 2013. Available online at http://www.scholastic.com/teachers/article/timeline-guide-us-presidents (accessed April 3, 2013).

“Presidents.” Fact Monster. Pearson Education, 2007. Available online at http://www.factmonster.com/ipka/A0194030.html (accessed April 3, 2013).

“Food Security Statistics.” Food and Agriculture Organization of the United Nations. Available online at http://www.fao.org/economic/ess/ess-fs/en/ (accessed April 3, 2013).

“Consumer Price Index.” United States Department of Labor: Bureau of Labor Statistics. Available online at http://data.bls.gov/pdq/SurveyOutputServlet (accessed April 3, 2013).

“CO2 emissions (kt).” The World Bank, 2013. Available online at http://databank.worldbank.org/data/home.aspx (accessed April 3, 2013).

“Births Time Series Data.” General Register Office For Scotland, 2013. Available online at http://www.gro-scotland.gov.uk/statistics/theme/vital-events/births/time-series.html (accessed April 3, 2013).

“Demographics: Children under the age of 5 years underweight.” Indexmundi. Available online at http://www.indexmundi.com/g/r.aspx?t=50&v=2224&aml=en (accessed April 3, 2013).

Gunst, Richard, Robert Mason. Regression Analysis and Its Application: A Data-Oriented Approach. CRC Press: 1980.

“Overweight and Obesity: Adult Obesity Facts.” Centers for Disease Control and Prevention. Available online at http://www.cdc.gov/obesity/data/adult.html (accessed September 13, 2013).
