©2009 by L. Lagerstrom
Statistics and Histograms
• Frequency distributions
• Absolute vs. relative frequencies
• Insights into data
• The hist function
• Creating relative frequency distributions
• More options: the bar function, bin edges
• Mean, median, standard deviation, and variance
Frequency Distributions
Often in science and engineering we have sets of data from
experiments or observations and need to figure out the key
characteristics of the data. To do so, we of course calculate
quantities such as the mean, the median, and the standard
deviation.
More generally, we construct a "frequency distribution." A frequency
distribution divides the range of the data into intervals (or "bins") of a
certain size, and then counts how many data points are in each
interval. A classic example is a set of exam scores. To get an idea of
the exam results, we might count how many scores there were in the
50s, the 60s, the 70s, the 80s, and the 90s (assuming the lowest
score was in the 50s and the highest in the 90s). In other words, we
are counting the frequency of a result in the 50s, 60s, 70s, etc.
We often take the counts and plot them in a bar plot, giving a plot of
the distribution of frequencies. This type of frequency distribution plot
is called a histogram.
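The binning procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_distribution` is hypothetical, the exam scores are invented for the example, and Python is used here for convenience rather than because the slides prescribe it.

```python
def frequency_distribution(data, bin_size):
    """Return {lower edge of bin: count} for bins of width bin_size."""
    counts = {}
    for x in data:
        # A score of 68 with bin_size 10 belongs to the bin starting at 60.
        lower = int(x // bin_size) * bin_size
        counts[lower] = counts.get(lower, 0) + 1
    # Sort by bin edge so the bins read from lowest to highest.
    return dict(sorted(counts.items()))


# Hypothetical exam scores, made up purely for illustration.
scores = [55, 62, 68, 71, 74, 77, 83, 85, 88, 94]

dist = frequency_distribution(scores, 10)
for lower, count in dist.items():
    print(f"{lower}s: {count}")
```

Each key is the lower edge of an interval (50 stands for "the 50s"), and each value is the frequency of scores in that interval. Plotting these counts as bars would give the histogram.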
A Temperature Example
Imagine that we have collected data on noon-time temperatures over
a 10-day period for a certain city. The results are shown below:
T = 74, 78, 83, 79, 72, 67, 69, 85, 91, 86
We want to plot a frequency distribution, so we decide to count how many
temperatures were in the 60s, the 70s, the 80s, and the 90s. In other
words, we have an interval or bin size of 10. (We could choose something
else; for example, we might divide the range into intervals of 70-74,
75-79, 80-84, etc.) The histogram then looks as shown on the right, with
four bins.
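As a rough check on the four bins, the counts can be reproduced with a short Python sketch. The variable names (`T`, `bin_size`, `counts`) and the text-based bar display are assumptions made for this illustration; they stand in for the plotted histogram and are not the plotting approach used in the slides.

```python
from collections import Counter

# Noon-time temperatures from the example above.
T = [74, 78, 83, 79, 72, 67, 69, 85, 91, 86]

bin_size = 10  # interval width chosen in the example

# Map each temperature to the lower edge of its bin (67 -> 60, 74 -> 70, ...)
# and count how many temperatures land in each bin.
counts = Counter((t // bin_size) * bin_size for t in T)

# Print a simple text histogram, one row per bin, lowest bin first.
for lower in sorted(counts):
    print(f"{lower}s: {'#' * counts[lower]} ({counts[lower]})")
```

Changing `bin_size` to 5 would give the narrower intervals mentioned in the parenthetical (70-74, 75-79, and so on).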
Absolute vs. Relative Frequencies
The frequency distribution of temperature data on the previous slide
is known as an "absolute frequency distribution," because we are
counting the absolute number of temperatures that fall within each
interval.
We can also create a "relative frequency distribution." In this case,
we calculate the fraction of temperatures that fall within each
interval. To do so, we count the absolute number in each interval
and then simply take the results and divide each interval's number
by the total number of data points.
So, for example, on the previous slide there are three temperatures
that fall within the 80s interval in the absolute frequency distribution
of the temperatures. Since there are 10 temperature data points
total, the relative frequency for the 80s interval is its absolute
frequency divided by 10, i.e., 3/10 = 0.3. This tells us that 30% of the
measured temperatures fall within the 80s interval (or bin).
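The division step can be illustrated with a minimal Python sketch that reuses the temperature data. The dictionary names `absolute` and `relative` are assumptions introduced for this example only.

```python
# Noon-time temperatures from the earlier example.
T = [74, 78, 83, 79, 72, 67, 69, 85, 91, 86]
bin_size = 10

# Absolute frequencies: number of temperatures in each 10-degree bin,
# keyed by the lower edge of the bin (60, 70, 80, 90).
absolute = {}
for t in T:
    lower = (t // bin_size) * bin_size
    absolute[lower] = absolute.get(lower, 0) + 1

# Relative frequencies: each bin's count divided by the total number of points.
total = len(T)
relative = {lower: n / total for lower, n in absolute.items()}

for lower in sorted(relative):
    print(f"{lower}s: absolute = {absolute[lower]}, relative = {relative[lower]:.1f}")
```

The relative frequencies always sum to 1 (up to floating-point rounding), which makes distributions from data sets of different sizes directly comparable.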