M1 Solutions and Remarks

# 14. Less than half of the employees took 11 or more sick days

14. Less than half of the employees took 11 or more sick days.
a. TRUE
b. FALSE (11 is the median, so at least half of the employees took 11 or more sick days.)

15. The label on the vertical axis of the above boxplot should be:
a. Number of sick days (Correct. The boxplot displays the values of the variable being measured, which here is the number of sick days.)
b. Number of employees (Incorrect. We have no idea how many employees are in this data set.)
c. Percentage of employees
d. Can't tell without more information
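The reasoning for question 14 rests on the definition of the median: at least half of the values are greater than or equal to it. The short Python sketch below checks this on a hypothetical list of sick-day counts whose median is 11. The actual data behind the boxplot is not given, and the helper name `share_at_or_above_median` is made up for illustration.

```python
from statistics import median

def share_at_or_above_median(values):
    """Fraction of values that are greater than or equal to the median."""
    m = median(values)
    return sum(1 for v in values if v >= m) / len(values)

# Hypothetical sick-day counts (not the data from the exam's boxplot).
sick_days = [2, 5, 7, 9, 11, 11, 12, 14, 15, 20, 22]

print(median(sick_days))                    # the median of this made-up list
print(share_at_or_above_median(sick_days))  # always at least 0.5
```

The same check works for any list. Even when the median is the average of the two middle values, the upper half of the sorted data still sits at or above it.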

16. Suppose the standard deviation of a data set is 5. If you add 10 to every number in the data set, the new standard deviation will still be 5.
TRUE (Adding 10 to every number just moves the distribution of values 10 units to the right. The spread (variability) is not affected. For example, if you have the data set 1, 2, 3 and add 10 to each number, you get 11, 12, 13. The standard deviation remains 1.)

17. A survey of 100 randomly selected students found 40 males and 60 females. Of the 40 males, 30 stayed up late the night before an exam (75%). Of the 60 females, 50 stayed up late (83%). However, since you didn't use a stratified random sample to collect this data, the results are biased toward females.
a. TRUE
b. FALSE (There is no bias because we have a random sample. Sampling more females than males by chance does not make a sample biased toward females. Stratified samples help in situations where a group you want to compare is so small that you might not get enough of its members using a simple random sample, for example a political affiliation other than Democrat, Republican, or Independent. More importantly, stratified samples are useful for making comparisons, but they are not required in order to make comparisons.)

18. If a residual is negative, the actual data value for Y was BELOW the regression line.
a. TRUE (The residual is observed – expected (predicted). If it is negative, the observed value is smaller than the value predicted by the line, so the Y value is below the line.)

19. A double-blind study is one where the subjects do not know what treatment they are getting, and they don't know all the possible outcomes that could occur as a result of that treatment.
a. TRUE
b. FALSE (In a double-blind study, neither the subjects nor the researchers who evaluate them know which treatment each subject is getting. See notes for 3.2.)

20. The following boxplots show that data set #1 has more data than data set #2.
[Figure: Boxplot of data set 1, data set 2]
a. TRUE
b. FALSE (Boxplots do not show sample size.)
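Two of the answers above can be checked numerically. Question 16 claims that a constant shift leaves the standard deviation unchanged. Question 18 claims that a negative residual means the point lies below the line. The Python sketch below uses the standard `statistics` module for the first. For the second, it uses a hypothetical regression line y-hat = 2 + 3x and a hypothetical point (4, 11), neither of which comes from the exam. The function name `residual` is also just for illustration.

```python
from statistics import stdev

# Question 16: adding a constant moves every value but leaves the spread alone.
data = [1, 2, 3]
shifted = [x + 10 for x in data]  # add 10 to every number

print(shifted)         # [11, 12, 13]
print(stdev(data))     # 1.0
print(stdev(shifted))  # 1.0

# Question 18: residual = observed - predicted.
def residual(observed_y, predicted_y):
    return observed_y - predicted_y

# Hypothetical regression line y-hat = 2 + 3x and hypothetical point (4, 11).
x, y = 4, 11
y_hat = 2 + 3 * x       # predicted value on the line: 14
r = residual(y, y_hat)  # observed 11 minus predicted 14
print(r)                # -3: negative, so the point lies below the line
```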

21. Which data set from the histograms below has the highest standard deviation?
[Figure: Histogram of Data Set #1 (X from 1 to 7 vs. Frequency)]
[Figure: Histogram of Data Set #2 (X from 1 to 7 vs. Frequency)]
Data set 1. It polarizes the data as much as possible, making the average distance from the mean (4) as large as possible. Data set 2 has more values closer to the mean (4), which results in a smaller overall SD. (See 1.1 notes)

22. A large data set from a simple random sample contains less bias than a small data set from a simple random sample. (Assume data collection quality is the same for both.)
a. TRUE
b. FALSE (Random samples are NOT BIASED, no matter what size they are. A larger random sample gives more precise, consistent results than a small sample does, but that is not related to bias. A biased sample is one collected in a way that systematically favors some group of individuals in the population. That does not happen with random samples. (See 3.1 notes))
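For question 21, the exact histogram frequencies did not survive extraction, so the Python sketch below uses two hypothetical data sets on the same 1 to 7 scale, both with mean 4. One pushes every value to the extremes and the other bunches values near the center. The numbers are invented to mimic the shapes described in the answer. They are not the exam's data.

```python
from statistics import mean, stdev

# Hypothetical data sets on the scale 1-7, both with mean 4.
data_set_1 = [1] * 10 + [7] * 10                  # every value as far from 4 as possible
data_set_2 = [1, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]  # most values near 4

for name, values in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    print(name, "mean:", mean(values), "SD:", round(stdev(values), 2))
```

Because every value in the polarized set is 3 units from the mean, its standard deviation comes out larger than that of the bunched set.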

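The bias-versus-precision distinction in question 22 can be illustrated with a small simulation. The Python sketch below assumes a made-up population of 10,000 uniform values and draws many simple random samples of size 10 and of size 200. Bias is read as how far the average of the sample means sits from the true population mean. Precision is read as how much the sample means vary from one sample to the next. The population, sample sizes, repetition count, and the helper `sample_means` are all illustrative choices, not part of the original problem.

```python
import random
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population (not from the exam).
population = [random.uniform(0, 100) for _ in range(10_000)]
pop_mean = fmean(population)

def sample_means(n, reps=2000):
    """Means of `reps` simple random samples of size n, drawn without replacement."""
    return [fmean(random.sample(population, n)) for _ in range(reps)]

small = sample_means(10)
large = sample_means(200)

# Bias: both sample sizes should center on the true mean.
print("bias, n=10:   ", fmean(small) - pop_mean)
print("bias, n=200:  ", fmean(large) - pop_mean)
# Precision: the larger samples should vary much less from sample to sample.
print("spread, n=10: ", stdev(small))
print("spread, n=200:", stdev(large))
```

Both bias figures should be close to zero, while the spread for n=200 should be clearly smaller. Sample size changes precision, not bias.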
