M1 solns and REMARKS


22. A random sample of 1000 people is less biased than a random sample of 100 people from a simple random sample. (Assume data collection quality is the same for both.)
    a. TRUE
    b. FALSE (Random samples are NOT BIASED, no matter what size they are. A larger random sample gives more precise/consistent results than a small one, but that is not related to bias. A biased sample is one collected with systematic favoritism toward a group of individuals in the population. That does not happen with random samples. (See 3.1 notes))

23. Which of the following is NOT in the same units as the original data?
    a. Standard deviation
    b. Q1
    c. Y-intercept of the regression line
    d. All of the above are in the same units as the original data.

24. The residuals from a regression line are in the same units as the Y values.
    a. TRUE (residual = observed y – expected y from the line; both are in the same units.)

25. It is possible to have a data set where Q1 is equal to the median.
    a. TRUE
    b. FALSE
    Example: 1, 1, 1, 1, 1, 1, 1, 1. All numbers in the 5-number summary equal one.
    Example: 1, 1, 1, 1, 1, 5, 5, 5. Q1 = 1, Median = 1
    Example: 0, 2, 2, 2, 2, 2, 3, 4. Q1 = 2, Median = 2
    This occurs when all the data between Q1 and the median have the same value. The rest of the data can be anything.

and Quiz 2. Each quiz has a total of 20 points. She notices that every single student scored half as many points on Quiz 2 as they did on Quiz 1.

26. What is the correlation between the scores on these two quizzes?
    a. -0.50
    b. 0.50
    c. 1.00 (Every single student scored half as many points on Quiz 2. For example, take 3 students’ quiz scores: (6, 3); (8, 4); (10, 5). These points, and those of any other classmates, form a perfect straight line. The line is Y = ½ X + 0. A perfect line with a positive slope has a correlation of exactly 1, whatever the size of the slope.)
    d. None of the above

27. The slope of the regression line for this data set is:
    a. Positive (Look at the example of 3 students’ scores in the answer to 26.)
    b. Negative
    c. Zero
    d. Can’t tell without seeing the actual data

28. A sample of 350 American families was surveyed and asked to report the amount of money (in dollars) spent annually on fruits and vegetables. The histogram of the data is below. Which of the following is the closest number to Q1 here?
    a. \$17.5 per year
    b. \$225 per year
    c. \$350 per year (About 25% of the values in the histogram lie below 350.)
    d. \$25 per year

    [Histogram: Amount of \$ spent annually on fruits and vegetables. Horizontal axis: Dollars. Vertical axis: Frequency.]

29. A point that is an outlier in the Y (vertical) direction always has a large residual.
    a. TRUE (See Regression Part 2 notes.)
    b. FALSE

30. You can always interpret the Y-intercept of a regression line.
    b. FALSE (See Regression Part 1 lecture notes, last 3 slides. To interpret means to discuss it in the context of the problem. It does not mean just reporting what the number is. There are times when it is just not interpretable, such as when you predict internet use using years of education. The Y-intercept is negative in that case, so it cannot be discussed in the context of education and internet use.)
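The remark on question 22 separates bias from precision, and a small simulation can make that difference concrete. The sketch below is illustrative only: the population (the integers 1 to 10000), the 500 repetitions, the seed, and the helper name `sample_mean_summary` are all assumptions and are not part of the exam.

```python
import random
from statistics import fmean, pstdev


def sample_mean_summary(population, sample_size, repetitions, rng):
    """Draw many simple random samples and summarize their sample means.

    Returns (average of the sample means, standard deviation of the sample means).
    """
    sample_means = [
        fmean(rng.sample(population, sample_size)) for _ in range(repetitions)
    ]
    return fmean(sample_means), pstdev(sample_means)


# Assumed toy population; any fixed list of numbers would do.
rng = random.Random(0)
population = list(range(1, 10001))
population_mean = fmean(population)  # 5000.5

center_100, spread_100 = sample_mean_summary(population, 100, 500, rng)
center_1000, spread_1000 = sample_mean_summary(population, 1000, 500, rng)

print("population mean:", population_mean)
print("n = 100:  center", round(center_100, 1), "spread", round(spread_100, 1))
print("n = 1000: center", round(center_1000, 1), "spread", round(spread_1000, 1))
```

If the remark holds, both centers should sit close to the population mean (neither sample size is biased), while the spread for samples of 1000 should be clearly smaller than for samples of 100 (the larger sample is more precise).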
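The three example data sets from question 25 can be checked with a few lines of Python. This is a minimal sketch that assumes one common textbook convention, where Q1 is the median of the lower half of the sorted data; software packages may use slightly different quartile rules. The helper name `q1_and_median` is hypothetical.

```python
from statistics import median


def q1_and_median(values):
    """Return (Q1, median), taking Q1 as the median of the lower half.

    For an odd number of values the middle value is left out of the lower half.
    """
    data = sorted(values)
    lower_half = data[: len(data) // 2]
    return median(lower_half), median(data)


examples = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 5, 5, 5],
    [0, 2, 2, 2, 2, 2, 3, 4],
]

for data in examples:
    q1, med = q1_and_median(data)
    print(data, "Q1 =", q1, "Median =", med, "equal:", q1 == med)
```

In each example every value from the Q1 position up to the median position is identical, which is exactly the condition described in the answer.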
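For questions 26 and 27, the correlation and slope can be computed directly from the three example score pairs (6, 3), (8, 4), (10, 5). The sketch below uses the standard formulas for Pearson's r and the least-squares line; the function name `pearson_r_and_line` is an assumption made for this illustration.

```python
from math import sqrt


def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


quiz1 = [6, 8, 10]  # example Quiz 1 scores from the answer to 26
quiz2 = [3, 4, 5]   # each Quiz 2 score is half the Quiz 1 score

r, slope, intercept = pearson_r_and_line(quiz1, quiz2)
print(r, slope, intercept)  # 1.0 0.5 0.0
```

The slope is 0.5 but the correlation is 1.0: correlation measures how tightly the points follow a line, not how steep that line is.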

