Box plots
Side-by-side histograms are hard to interpret. Box plots display the
summary information graphically.
Figure: Box plot of birth weight (axis from 1500 to 5000).
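As a rough sketch of what a box plot summarises, the snippet below computes the quartiles, the 1.5 times IQR fences, the whisker ends and the points drawn individually. The function name `box_plot_stats` and the toy weights are assumptions made up for illustration; they are not the lecture's data, and plotting software may compute quartiles slightly differently.

```python
import statistics

def box_plot_stats(values):
    """Return the quantities a box plot draws, using the common 1.5 * IQR rule."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Points beyond the fences are plotted individually.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Made-up weights for illustration only (not the lecture's data).
toy_weights = [1600, 2900, 3100, 3300, 3400, 3500, 3600, 3800, 4000]
stats = box_plot_stats(toy_weights)
print(stats)
```

The box spans `q1` to `q3` with a line at the median; only the values in `outliers` appear as separate points.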
301114 Nature of Data
27 / 48

Box plots
We can also draw a separate box plot for each group.
Figure: Box plot of birth weight given maternal smoking status (groups: no, yes; axis from 1500 to 5000).
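A grouped box plot is one box per subgroup, so the first step is splitting the measurements by group label. A minimal sketch, assuming the data arrive as (status, weight) pairs; the records and the helper `split_by_group` are hypothetical and are not the lecture's data.

```python
from collections import defaultdict
import statistics

# Hypothetical (status, weight) records for illustration; not the lecture's data.
records = [
    ("no", 3500), ("yes", 3200), ("no", 3700), ("yes", 3000),
    ("no", 3300), ("yes", 3400), ("no", 3900), ("yes", 2800),
]

def split_by_group(pairs):
    """Collect the values belonging to each group label."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return dict(groups)

groups = split_by_group(records)
# Each group then gets its own summary (here just the median) and its own box.
medians = {label: statistics.median(values) for label, values in groups.items()}
print(medians)
```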
Maternal Smoking
What can we see?
The “no” group seems to have a higher median birth weight.
The spread of the two groups seems about the same.
But the “no” group seems to have more points lying more than 1.5 times the IQR beyond the box.
Does maternal smoking affect birth weight?
To answer this, we need to ask a specific question about a measurement.
Outline
1. Maternal Smoking Data
2. Summarising data
3. Histograms
4. Box plots
5. Maternal Smoking - Simple comparison of two groups
6. Sales vs. Office Location - Another comparison of two groups
Does smoking affect the birth weight of infants?
What is the question we are asking? We might mean “Do the infants of smoking mothers have lower average birth weight?”
We must choose between the mean and the median.
       Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
no     1571   3229      3514     3516   3829      5029
yes    1657   2914      3286     3260   3600      4657
Both the median and the mean are lower in the “yes” group (by 228 and 256 respectively). But is this an artifact of the sample?
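The comparison can be checked by hand from the summary values on this slide. The sketch below computes each group's IQR, the 1.5 times IQR fences and the gaps in median and mean. The dictionary layout and the helper `iqr_fences` are assumptions for illustration, and plotting software may compute quartiles slightly differently.

```python
# Summary values copied from this slide (units as in the lecture's data).
summary = {
    "no":  {"min": 1571, "q1": 3229, "median": 3514, "mean": 3516, "q3": 3829, "max": 5029},
    "yes": {"min": 1657, "q1": 2914, "median": 3286, "mean": 3260, "q3": 3600, "max": 4657},
}

def iqr_fences(row):
    """Return (IQR, lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = row["q3"] - row["q1"]
    return iqr, row["q1"] - 1.5 * iqr, row["q3"] + 1.5 * iqr

for group, row in summary.items():
    iqr, low, high = iqr_fences(row)
    print(group, "IQR:", iqr, "fences:", low, high)

# Differences between the groups ("no" minus "yes").
median_gap = summary["no"]["median"] - summary["yes"]["median"]
mean_gap = summary["no"]["mean"] - summary["yes"]["mean"]
print("median gap:", median_gap, "mean gap:", mean_gap)
```

The IQRs (600 and 686) are similar, which matches the reading that the spreads are about the same, and in both groups the minimum and maximum fall outside the fences. The gaps themselves do not tell us whether the difference is an artifact of the sample.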