# Box plots


301114 Nature of Data

The side-by-side histograms are hard to interpret. Box plots display the summary information in a graphical way.

Figure: Box plot of birth weight.
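The summary information behind a box plot is the five-number summary: minimum, first quartile, median, third quartile and maximum. A minimal sketch of that calculation follows, assuming Python's standard `statistics` module and made-up weights rather than the course data. The function name `five_number_summary` is hypothetical, and the "inclusive" quartile method is an assumption; other software may use a slightly different quartile rule.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" interpolates between observed values (an assumed choice).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical birth weights (assumed to be in grams), for illustration only.
weights = [2500, 2900, 3100, 3300, 3500, 3600, 3800, 4100, 4400]
print(five_number_summary(weights))
# prints (2500, 3100.0, 3500.0, 3800.0, 4400)
```

The box spans the first to the third quartile, with a line at the median.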
We can also draw box plots by group.

Figure: Box plot of birth weight given maternal smoking status (groups: no, yes).
Maternal Smoking

What can we see?

- The "no" group seems to have a higher median birth weight.
- The spread of the two groups seems about the same.
- But the "no" group seems to have more points lying more than 1.5 times the IQR beyond the box.

Does maternal smoking affect birth weight? To answer this, we need to ask a specific question about a measurement.
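The points drawn individually on a box plot are those lying more than 1.5 times the IQR below the first quartile or above the third quartile. The sketch below applies that rule to each group. It assumes Python's standard `statistics` module, the "inclusive" quartile method, and invented group data; the function name `outliers_1_5_iqr` is hypothetical. Plotting software may compute the quartiles slightly differently, so its flagged points can differ at the margins.

```python
import statistics


def outliers_1_5_iqr(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical birth weights by smoking status, NOT the course data.
groups = {
    "no": [1600, 3200, 3300, 3500, 3600, 3800, 3900, 5000],
    "yes": [2700, 2900, 3100, 3300, 3400, 3600, 3800],
}

for label, weights in groups.items():
    print(label, outliers_1_5_iqr(weights))
# prints:
# no [1600, 5000]
# yes []
```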
Outline

1. Maternal Smoking Data
2. Summarising data
3. Histograms
4. Box plots
5. Maternal Smoking - Simple comparison of two groups
6. Sales vs. Office Location - Another comparison of two groups
Does smoking affect the birth weight of infants?

What is the question we are asking? We might mean "Do the infants of smoking mothers have lower average birth weight?" We must choose between the mean and the median.

| Maternal smoking | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| no | 1571 | 3229 | 3514 | 3516 | 3829 | 5029 |
| yes | 1657 | 2914 | 3286 | 3260 | 3600 | 4657 |

The median and mean are both lower in the "yes" group (by 228 and 256, respectively). But is this an artifact of the sample?
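One common way to check whether a difference in means could be an artifact of the sample is a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one arises by chance. The slides shown here do not say which method the course uses, so the following is only a sketch under that assumption. The function name `permutation_p_value`, its parameters, and the data are all hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns the fraction of label shuffles whose difference in means is
    at least as large as the observed difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical birth weights, NOT the course data.
non_smokers = [3600, 3400, 3900, 3500, 3700, 3300, 3800]
smokers = [3200, 3100, 3500, 2900, 3300, 3000, 3400]
p = permutation_p_value(non_smokers, smokers)
print(f"one-sided p-value: {p:.3f}")
```

A small p-value suggests the observed gap would rarely appear if smoking status were unrelated to birth weight. Replacing `statistics.mean` with `statistics.median` would ask the same question about the median.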