Frequency and Relative Frequency Tables
Bar Charts and Pie Charts
Mode and Median
Xinghua Zheng
Categorical Data Description
Example 1. Amazon Hosts
- Every day, more than 100 million shoppers visit Amazon.com,
  either by typing the address into their browser or by clicking
  through from another web site (host).
- Motivating Question:
  Which hosts send the most visitors to Amazon’s website?
  (So that Amazon can further decide whether it’s worthwhile to
  pay more to keep links on certain busy sites.)
Looking At Data
Date        Purchase   Amount ($)   Host          Region
21Sep2002   No               0      yahoo.com     Northeast
21Sep2002   Yes          65.97      msn.com       West
21Sep2002                10.00      google.com    South
22Sep2002   No               0                    Midwest
22Sep2002                25.10      yahoo.com     Northeast
22Sep2002               125.08                    Midwest
...         ...            ...      ...           ...
- Data set consists of 188,996 visits
- To answer the motivating question we must describe the
  distribution of Hosts
  - Host is a categorical variable
Note: The Host field is left blank when the customer typed amazon.com into a
browser.
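Because a blank Host carries meaning (the visitor typed the address directly), it may help to recode it as its own category before summarizing. The following is a minimal Python sketch, assuming the hosts are held as strings with an empty string for a blank field. The label "(typed address)" and the function name recode_host are illustrative choices, not part of the data set.

```python
# Hypothetical label for visits with a blank Host field (not from the source data).
DIRECT_LABEL = "(typed address)"

def recode_host(host):
    """Return the host name, or DIRECT_LABEL when the field is blank."""
    cleaned = host.strip()
    return cleaned if cleaned else DIRECT_LABEL

# Host values from the six rows displayed above; "" marks a blank field.
sample_hosts = ["yahoo.com", "msn.com", "google.com", "", "yahoo.com", ""]
recoded = [recode_host(h) for h in sample_hosts]
print(recoded)
```

Recoding this way keeps direct visits as one of the categories of Host instead of silently dropping them from the summary.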
Frequency and Relative Frequency Tables
- For categorical data, names identify the different categories
  - Such data can be summarized using a frequency table
    - A table that summarizes the number of items (i.e., the frequency)
      in each of several nonoverlapping classes
  - or a relative frequency table
    - A table that summarizes the proportion of items in each class
    - For each class, proportion = relative frequency =
      (frequency of the class) / (total number of observations)
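The two definitions can be sketched in a few lines of Python. This is only an illustration using the six sample rows shown earlier, not the full 188,996 visits. The function name frequency_tables and the "(blank)" label for an empty Host field are assumptions made for the example.

```python
from collections import Counter

def frequency_tables(values):
    """Return (frequency, relative frequency) dicts for a categorical variable."""
    counts = Counter(values)              # frequency: number of items per class
    total = sum(counts.values())          # total number of observations
    relative = {category: n / total for category, n in counts.items()}
    return dict(counts), relative

# Hosts from the six sample rows; "(blank)" is a hypothetical label for an empty field.
sample_hosts = ["yahoo.com", "msn.com", "google.com", "(blank)", "yahoo.com", "(blank)"]
freq, rel = frequency_tables(sample_hosts)

# Print both tables side by side, most frequent class first.
print(f"{'Host':<12}{'Frequency':>10}{'Relative':>10}")
for host, n in sorted(freq.items(), key=lambda item: -item[1]):
    print(f"{host:<12}{n:>10}{rel[host]:>10.3f}")
```

Since the classes do not overlap, the frequencies add up to the number of observations and the relative frequencies add up to 1, which makes a quick sanity check on any table built this way.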
