
(i) After this step, do we have a more symmetric distribution? What is a downside to subsetting the dataset in this way?

Let’s use this as our final dataset and use technology to calculate some helpful numerical summaries of the distribution, namely describing the center and variability.

Technology Detour: Numerical Summaries

In R
> attach(births3)    Or use births3$birthweight
> iscamsummary(birthweight, digits=4)
Entering “digits = ” specifies the number of significant digits you want displayed. Default is 3.

In Minitab
• Choose Stat > Basic Statistics > Display Descriptive Statistics.
• Specify the birthweight variable in the Variables box.
• Optional: Under Statistics you can choose which numbers are calculated.

(j) Calculate and interpret the mean and the standard deviation of these data.
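For readers without the ISCAM workspace loaded, the same kind of summary can be sketched in base R. The block below is only an illustration under stated assumptions: the data are simulated (the vector birthweight_sim is a hypothetical stand-in for births3$birthweight), the 9999 values are assumed to be codes for a weight that was not recorded, and summary_stats is a helper invented here, not an ISCAM function.

```r
# Hypothetical stand-in for births3$birthweight: simulated weights in grams.
set.seed(1)
birthweight_sim <- round(rnorm(1000, mean = 3300, sd = 550))

# Append a few 9999 codes to mimic "weight not recorded" entries (assumed coding).
with_codes <- c(birthweight_sim, rep(9999, 20))

# Subset to drop the 9999 codes, mirroring the cleaning step.
cleaned <- with_codes[with_codes != 9999]

# Illustrative helper (not part of ISCAM): center and variability in base R.
summary_stats <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), IQR = IQR(x))
}

# Side-by-side summaries, shown to 4 significant digits.
signif(rbind(with_codes = summary_stats(with_codes),
             cleaned    = summary_stats(cleaned)), 4)
```

Comparing the two rows shows how a handful of extreme codes can shift the mean and standard deviation while leaving the median and IQR nearly unchanged.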

Chance/Rossman, 2015 ISCAM III Investigation 2.1 140

(k) How would the mean and standard deviation compare if we had not removed the 9999 values?

Recall from Chapter 1 that the normal probability distribution has some very nice properties and allows us to predict the behavior of our variable. For example, we might want to assume birth weights follow a normal distribution and estimate how often a baby will be of low birth weight (under 2500 grams according to the International Statistical Classification of Diseases, 10th revision, World Health Organization, 2011) based on these data.

(l) Do these birth weight data appear to behave like a normal distribution? How are you deciding?

Assessing Model Fit

Many software programs will allow you to overlay a theoretical normal distribution to see how well it matches your data, using the mean and standard deviation from the observed data.

(m) Overlay a normal model on the distribution of birthweight data and comment on how well the model fits the data.
• In R
> iscamaddnorm(birthweight)    Will take a few seconds to appear
• In Minitab
Right click on the graph and choose Add > Distribution Fit. Because the default distribution is Normal, press OK.
Discuss any deviations from the pattern of a normal distribution.

Another method for comparing your data to a theoretical normal distribution is to see whether certain properties of a normal distribution hold true. In Chapter 1, you learned that about 95% of observations in a normal distribution should fall within two standard deviations of the mean.

(n) Calculate the percentage of the birthweights that fall within 2 standard deviations of the mean by creating a Boolean (true/false) variable:
• In R
> within2sd = (birthweight > mean(birthweight) - 2*sd(birthweight)) & (birthweight < mean(birthweight) + 2*sd(birthweight))
> table(within2sd)
• In Minitab
Choose Calc > Calculator, store result in within2sd, expression: c2 > (mean(c2)-2*stdev(c2)) & c2 < (mean(c2)+2*stdev(c2))
Press OK.
A “1” indicates the statement is true, a “0” indicates the statement is false.
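The two-standard-deviation check and the normal-model estimate of low birth weight can be sketched together in base R. This is a minimal illustration, not the investigation's own solution: the vector bw is simulated and hypothetical (it stands in for the cleaned birthweight variable), and the only functions used are base R's mean, sd, table, and pnorm.

```r
# Hypothetical stand-in for the cleaned birthweight variable (grams).
set.seed(2)
bw <- round(rnorm(5000, mean = 3300, sd = 550))

m <- mean(bw)
s <- sd(bw)

# Boolean variable: TRUE when a weight is within 2 SDs of the mean.
within2sd <- (bw > m - 2 * s) & (bw < m + 2 * s)
table(within2sd)

# The mean of a logical vector is the proportion of TRUE values.
prop_within <- mean(within2sd)

# Normal-model estimate of P(weight < 2500 g), using the observed mean and SD,
# next to the proportion actually observed in the data.
p_model    <- pnorm(2500, mean = m, sd = s)
p_observed <- mean(bw < 2500)

c(within2sd = prop_within, model = p_model, observed = p_observed)
```

Because the simulated data are normal by construction, the model and observed proportions agree closely here; with real birth weights, a gap between the two is itself evidence about how well the normal model fits.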
