1. Which of the following statistical measures is affected by outliers in the data?
(a) the standard deviation.
(b) the mean.
(c) the correlation coefficient.
(d) the median.
(e) all of the above.
(f) only (a), (b), and (c).
Explain why you did, or did not, pick (a).
Explain why you did, or did not, pick (b).
Explain why you did, or did not, pick (c).
Explain why you did, or did not, pick (d).

2. (10 points) Suppose a social scientist, investigating death certificates, records the age at which fifteen people died. The ages are displayed in the following stemplot:
[Stemplot: leaf unit = 1, N = 15]
(a) Describe the shape [mode, skewness/symmetry, and outliers] of the distribution.
(b) Find the median and the mean. Show your calculations to derive the mean.

3. (10 points) The histogram and some descriptive statistics for a dataset are shown:
Descriptive Statistics: C1
N    Mean    Standard Deviation  Minimum  Q1      Median  Q3      Maximum
520  1.2975  1.6834              0.00189  0.3187  0.7326  1.5733  8.0000
(a) Report (or compute as necessary) the value of the most appropriate measure of center, and the value of the most appropriate measure of spread. If you compute, show your calculations.
(b) There are some data values that are clear outliers in the plot. If these outlying values were removed, the effect on mean and standard deviation would be which one of the following:
(i) Mean and standard deviation would both increase.
(ii) Mean would increase, but standard deviation would decrease.
(iii) Mean and standard deviation would both decrease.
(iv) Mean would decrease, but standard deviation would increase.
(v) Neither mean nor standard deviation would be affected.
Explain why you did, or did not, pick (i).
Explain why you did, or did not, pick (ii).
Explain why you did, or did not, pick (iii).
Explain why you did, or did not, pick (iv).
Explain why you did, or did not, pick (v).

4. (10 points) The boxplot below displays the delivery time for 1,000 deliveries from a certain pizza shop.
(a) 50% of the delivery times were (choose the correct answer and EXPLAIN why you DID pick that answer and DID NOT pick the other possible answers, for a total of 5 explanations):
(i) Between 5 and 50 minutes.
(ii) Greater than 50 minutes.
(iii) Between 10 and 25 minutes.
(iv) Below 10 minutes.
(v) Greater than 15 minutes.
(b) 25% of the delivery times were (choose the BEST answer and EXPLAIN why you DID pick that answer and DID NOT pick the other possible answers, for a total of 5 explanations):
(i) Below 5 minutes.
(ii) Between 15 and 30 minutes.
(iii) Between 30 and 45 minutes.
(iv) Greater than 50 minutes.
(v) Greater than 35 minutes.
(c) Based on the plot, if THERE WERE an outlier at 60, the mean delivery time of the 1,000 deliveries is most likely (choose the correct answer):
(i) 15 minutes.
(ii) Greater than 15 minutes.
(iii) Less than 15 minutes.
(iv) 15/1000 minutes.
Explain why you chose the answer you picked.

5. (5 points) Suppose that the distribution of lengths of time for connection between a student's dorm computer and the remotely located University server is normal in shape, with a mean of 5 seconds and a standard deviation of 1.2 seconds. The middle 95% of all connections will occur between what times?

6. (10 points) What are the effects of exposure to an advertising message? The answer may depend both on the length of the ad and on how often it is repeated. A study investigated this question using 80 undergraduate students as subjects. In order to make sure that the sample was balanced in terms of gender, the researchers randomly selected 40 male students and randomly selected 40 female students. All the students saw a 40-minute television program that included ads for a digital camera. The length of the commercial was either 30 seconds or 90 seconds, and it was repeated either 1, 3, or 5 times during the program.
The subjects were randomized to the different treatments, and at the end of the show rated their intention of purchasing the camera on a scale from 1 to 10.
(a) This study is (pick one): controlled experiment / observational study
(b) What are the explanatory and response variables?
(c) How many treatments are there altogether in this study?
(d) Can you draw causal conclusions from this study? If not, explain why. If yes, what feature of this study allows you to do so? (one sentence is enough!!)
(e) The sampling method that was used in this study is (indicate the correct answer): simple random sampling / stratified sampling / convenience sampling / voluntary response / cluster sampling. Explain why you picked the answer you picked.

7. (10 points) A random sample of 26 people was selected. Each person was asked about their daily consumption of olive oil and their cholesterol value. The results are represented in the following plot:
(a) For the plot shown, the most reasonable correlation coefficient r is (pick one):
(i) -2.1
(ii) -0.8
(iii) -0.03
(iv) 0
(v) 0.5
(vi) -1
Explain why you chose the answer you picked.
Suppose the equation of the least-squares regression line for predicting cholesterol point value from grams of oil consumed is:
cholesterol = 202 - 4.5 × oil
(b) Complete the following: For each extra gram of olive oil consumed, cholesterol value is predicted to increase / decrease (pick one) by _____________ points (fill in the blank).
(c) What is the predicted cholesterol level for a person who consumes 3 grams of olive oil daily?
(d) True or false: We can conclude from this study that consuming olive oil causes changes in cholesterol level. Briefly explain.

8. (10 points) The question of whether juvenile delinquency is related to birth order was examined in a large study. A total of 1,060 boys attending public school were given a questionnaire that measures delinquent behavior, and each boy also indicated his birth order.
The results are summarized in the following two-way table:
             Delinquent   Not Delinquent
Oldest       77           56
In-Between   59           37
Youngest     52           53
Total: 334
(a) What is the explanatory variable, and what is the response variable?
(b) Based on your answer to (a), use the empty table below to supplement the two-way table with the appropriate conditional percentages.
             Delinquent   Not Delinquent
Oldest
In-Between
Youngest
(c) Interpret your results in the context of the question. That is, based on part (b), write a brief description of the relationship between birth order and juvenile delinquency as it appears from the data.
(d) Complete the following sentence: Since this is _________________________ (choose: a randomized experiment or an observational study), we ______________________ (choose: can or cannot) conclude that birth order is the cause for delinquent behavior.

9. (5 points) When conducting a survey, it is important to use a random sample in order to:
(a) get a sample that represents the population well.
(b) reduce bias resulting from poorly worded questions.
(c) reduce bias resulting from poorly ordered questions.
(d) reduce bias resulting from sensitive questions.
(e) None of the above.
Explain why you chose the answer you picked.

10. (10 points) Consider the following types of displays:
(1) histogram
(2) pie chart
(3) scatterplot
(4) two-way table
(5) side-by-side boxplots
In each of the following situations data is recorded. Indicate which of the five choices above is the most appropriate display for the data and why it is the most appropriate.
(a) A social scientist studying racial bias in the court system records the race and guilty verdict for 300 people on trial. [At least some people were found guilty and some people not guilty.]
(b) A health scientist investigates how well we can predict an athlete's maximum bench press weight (a measure of his/her strength) from knowing the number of 60-pound bench presses that he/she can perform (before fatigue).
The relevant data was collected from 125 athletes.
(c) A USA Today poll asked a sample of single men (ages 18-44) the following question: If I had an "X-rated" bachelor party, I'd… The possible answers were: (i) tell fiancé all (ii) edit details (iii) say nothing. Data was recorded.
(d) Does cell phone use while driving impair reaction times? A recent experiment compared the reaction times (in milliseconds) of drivers who were engaged in a conversation on a cell phone to drivers who were not.