Stats 350 Winter 2010 Exam 2 Solutions

1. Cyberbullying – involves the use of communication technologies to support deliberate, repeated, and hostile behavior by an individual or group that is intended to harm others. A recent study of 200 high school males and 400 high school females resulted in 20% of the males and 26% of the females reporting they have been victims of cyberbullying. Let p1 = the population proportion of all high school males that have been victims of cyberbullying and p2 = the population proportion of all high school females that have been victims of cyberbullying. Assume these study results come from independent random samples. You are asked to use these results to test the theory that high school females are cyberbullied more than high school males, at the 5% level. [11 points]

a. State the appropriate hypotheses in terms of p1 and p2.
   H0: p1 = p2     Ha: p1 < p2

b. How many of the high school females in the study reported they have been victims of cyberbullying? Show your work.
   26% of 400 = 0.26(400) = 104
   Final answer: 104

c. Assuming there is no difference in cyberbullying rates for our two populations, estimate the overall (common) cyberbullying rate for high school students. Show your work.
   p̂ = [(200)(0.20) + (400)(0.26)] / 600 = 144/600 = 0.24
   Final answer: 0.24

d. Suppose the resulting test statistic is z = -1.6 and the corresponding p-value is 0.055. Consider the following statements and determine which are true regarding this study.
   i. The results are statistically significant at the 5% level. True False
   ii. At the 5% level, the cyberbullying rate for high school females does not appear to be higher than that for high school males. True False
   iii. The probability that the null hypothesis is true is estimated to be 5.5%. True False
   iv.
      If this study were repeated many times, there is a 5.5% chance that the cyberbullying rate for high school females is higher than that for high school males. True False
   v. The cyberbullying rate for the sampled high school males was 1.6 (null) standard errors below the cyberbullying rate for the sampled high school females. True False
   vi. One condition for this test to be valid is that the model for the response for each population is normal. True False

2. Air Conditioners – An apartment complex ordered a large number of air conditioning units for some of their buildings. After installation they received several complaints that the air conditioners didn't seem to be working properly. They decided to run a test to see if the units significantly changed the temperature in an apartment's living room, on average. They randomly chose 40 of the apartments and recorded the temperature in the center of the room, turned on the AC unit, and came back a few hours later and recorded the temperature again. They would like to estimate the population average change in temperature using a 99% confidence interval. [9 points]

a. Provide the proper statistical notation for the parameter that will be estimated and the corresponding statistic that will be computed to estimate that parameter.
   Population Parameter: µd     Sample Statistic: d̄

b. Using the provided data summaries below, compute the 99% confidence interval for the population average change in temperature. Show your work.

   VARIABLE                    MEAN     STD. DEVIATION    STD. ERROR OF MEAN
   Temp_before                 76.00    3.218             0.509
   Temp_after                  75.15    3.270             0.517
   Temp_before – Temp_after    0.850    2.338             0.369

   0.850 ± 2.75(0.369) = 0.85 ± 1.015
   Final Answer: (-0.165, 1.865)

c. Based on the confidence interval, does it appear that the air conditioners are working on average at the 1% level?
   Circle: Yes No
   Briefly explain: The value of 0 is in the confidence interval.

d.
   The apartment complex manager also created three QQ plots using the collected data, which are shown at the right. Determine which of the following statements are true. Circle all that are correct.
   i. The normality assumption doesn't appear to be valid; however, it's not of concern because the sample size is large enough to rely on the CLT.
   ii. The only QQ plot needed to check the normality assumption is the first one.
   iii. The QQ plots needed to check the normality assumption are the second and third.
   iv. Because the QQ plot of the before temperature shows an outlier, this confidence interval is not valid.
   v. The QQ plot of the differences is clearly not normal, so a new sample of data should be collected.

3. Grading Papers – Suppose Jane is curious about the factors that influence paper grading by professors. She has a hunch that papers that are typed are rated higher than papers that are handwritten, on average. To test her hunch she conducted a study. A total of 43 freshman students are currently taking an English class; 20 of the students wrote their latest paper by hand, while the other 23 students typed their latest paper. All submitted papers were graded by the professor who teaches the English class. Jane obtained the scores from the professor and whether the paper had been written by hand or typed. She entered the data into SPSS and the following output was obtained. A 5% significance level will be used and you can assume the conditions for an independent two samples t-test are met. [17 points]

a. What type of study did Jane conduct?
   Circle one: Randomized Experiment     Observational Study

b. Circle all that correctly describe the Score variable being measured in this study:
   Quantitative variable     Categorical variable     Explanatory variable     Response variable

c. The null hypothesis has been given below.
   Provide the alternative hypothesis to test whether papers that are typed have a higher average rating than papers that are handwritten. Then select one of the parameters and provide a clear definition for that parameter.
   H0: µ1 = µ2 versus Ha: µ1 < µ2
   The parameter µ1 = population mean rating (or score) for all such handwritten papers (and µ2 = population mean rating (or score) for all such typed papers).

d. Jane needs to decide whether to use a pooled t-test or an unpooled t-test, depending on whether an additional assumption about the two populations is reasonable for these data. Clearly state the additional assumption using symbols.
   Final answer: σ1 = σ2 or σ1² = σ2²

e. Based on the output, provide two reasons why she should report the pooled t-test result (instead of the general t-test result). Be specific, that is, include numerical values in each of your explanations.
   (1) Report pooled results because the two sample standard deviations of 5.24 and 5.92 are very similar.
   (2) Report pooled results because the p-value for Levene's test is 0.672, larger than any reasonable significance level (say 10%), so the hypothesis of equal population variances cannot be rejected.

f. Give an estimate of the common standard deviation of the scores in the two populations. Show your work.
   sp = √{ [(20 − 1)(5.24)² + (23 − 1)(5.92)²] / (20 + 23 − 2) } = √31.5296 = 5.6151
   Final answer: 5.6151

g. Report the test statistic and p-value for testing the hypotheses in (c), and circle the correct decision.
   Test statistic = -2.33
   p-value = 0.025/2 = 0.0125
   Decision based on this p-value is (circle one): Reject H0     Fail to Reject H0

h. Which of the following is the correct conclusion from a student perspective? Circle one.
   • They would want to type their papers, because the study supports that typed papers get higher grades than handwritten papers on average.
   • They would want to write their papers by hand, because the study supports that handwritten papers get higher grades than typed papers on average.
   • It doesn't matter whether they type or not, because this study doesn't support that typed papers get higher grades than handwritten papers on average.
   • It doesn't matter whether they type or not, because this study supports that typed papers and handwritten papers get equal grades on average.

4. IQ Scores – Suppose you are interested in estimating the difference between the average IQ scores for all female college students and that for all male college students. Let µ1 = the population average IQ for all female college students and µ2 = the population average IQ for all male college students. You took a random sample of 100 college students, of which 50 were female students and 50 were male students, and you collected the IQ scores for these sampled students. Using the general (unpooled) procedure, a 95% confidence interval for the difference in population average IQ scores, µ1 − µ2, was found to be (-1, 11). [7 points]

a. Consider the following statements and determine which are true regarding the above study and results.
   • As there are more than 30 students in both samples, based on the central limit theorem, you may assume that the IQ scores in both populations (of all female college students and of all male college students) have approximately normal distributions. True False
   • The average IQ score for the sampled female students was 5 points higher than the average IQ score for the sampled male students. True False
   • With 95% confidence, we would estimate the difference in population average IQ scores µ1 − µ2 to be somewhere between -1 and 11. True False
   • If a pooled procedure were used, you would get a narrower interval as the t* value would be smaller for the pooled version. True False

b.
   At a 1% significance level, what would be your conclusion for testing if there is a significant difference between the average IQ scores for female college students and that for male college students (H0: µ1 = µ2 versus Ha: µ1 ≠ µ2)?
   Circle one: Reject H0     Fail to reject H0     Can't tell
   Briefly explain why: Because the 95% CI has 0 in it AND the 99% CI would be wider and still have 0 in it.

5. Sample size needed – A market research firm is planning a final survey before deciding if a new product will actually go into production. What sample size should they use if they wish to report a 99% conservative confidence interval with a margin of error of 2%? Show all work. [3 points]
   n = (z* / (2m))² = (2.576 / (2(0.02)))² = (64.4)² = 4147.36
   Final answer: (at least) 4148

6. Bakery Business – A bakery in Chicago is trying to determine the number of staff to have on hand on a typical weekday and weekend morning. They are interested in whether they should have more staff on a weekend morning, on a weekday morning, or if both are equally busy in terms of number of customers. A total of 16 weeks over the next few months is randomly selected. They record the number of customers on the Wednesday morning and on the Saturday morning for each week. A partial listing of the data is provided below. The manager is not sure which test to perform, so he has provided the SPSS output for two different procedures. [12 points]

   Week         # Customers on Wed.    # Customers on Sat.
   Jan 3-9      28.00                  25.00
   Jan 17-23    25.00                  18.00
   Jan 24-30    26.00                  22.00
   …            …                      …
   Mar 21-27    32.00                  29.00

   Paired Samples Test (Pair 1: Weekday - Weekend)
   Paired Differences:
      Mean                                          3.750
      Std. Deviation                                3.47371
      Std. Error Mean                               .86843
      95% Confidence Interval of the Difference     Lower 1.89899, Upper 5.60101
   t                                                4.318
   df                                               15
   Sig. (2-tailed)                                  .001

   Independent Samples Test (Customers; 1 = Weekday, 2 = Weekend)
   Levene's Test for Equality of Variances: F = 2.531, Sig. = .122
   t-test for Equality of Means:
                                  t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
   Equal variances assumed        2.696   30       .011              3.750             1.39101                 .90919         6.59081
   Equal variances not assumed    2.696   25.872   .012              3.750             1.39101                 .89006         6.60994

a. Based on how the data were collected, which is the appropriate procedure for assessing if the number of bakery customers differs on weekend mornings versus weekday mornings, on average?
   Circle your answer: Paired Samples Test     Independent Samples Test

b. State the hypotheses using the appropriate notation to assess if the number of bakery customers differs on weekday mornings versus weekend mornings, on average.
   H0: µd = 0     Ha: µd ≠ 0

c. If there was no difference in the number of customers on weekday mornings as compared to weekend mornings on average, what would you expect for the value of your test statistic?
   Final answer: 0

d. Assuming the required assumptions to perform the test are met, use the appropriate output above to provide the value of the test statistic and corresponding p-value.
   Test Statistic: 4.318
   p-value: 0.001

e. At the 5% significance level, circle the decision and the phrase to complete the conclusion.
   Decision (circle one): Reject H0     Fail to Reject H0
   Thus, (circle one) THERE IS / THERE IS NOT sufficient evidence to say that the average number of bakery customers on the weekend mornings differs significantly from that for the weekday mornings.

7. Big John's Ropes – A rope company has been in production for almost 75 years, and they have kept extensive records on the quality of their various ropes.
A particular type of half-inch rope is known to have a true mean breaking strength of 863 pounds with a standard deviation of 9 pounds. However, the distribution for breaking strength is positively skewed (or skewed to the right). A summer intern is unaware of this information regarding the distribution of breaking strength. He will take a random sample of 64 such pieces of rope and determine the breaking strength of each in order to compute the sample mean. [4 points]

a. What is the probability that the summer intern's sample mean will be greater than 860 lbs? Show your work.
   P(X̄ ≥ 860) = P(Z ≥ (860 − 863) / (9/√64)) = P(Z ≥ -2.67) = 1 − 0.0038 = 0.9962
   Final answer: 0.9962

b. Recall that the original distribution for breaking strength is positively skewed. Give the name of the result that allowed you to still answer part (a).
   Name of the Result: The Central Limit Theorem (or CLT)

8. More Ropes (Part 2) – The supervisor has the summer intern conduct another study to estimate the breaking strength of a new type of quarter-inch rope in production. A random sample of 16 pieces of this new rope is selected and the resulting mean breaking strength is 542 pounds with a standard deviation of 10 pounds. [4 points]

a. The intern wrote the following interpretation about the standard error of the mean (or SEM) of 2.5 pounds; however, this statement is not quite correct. Insert the missing words to make this a correct interpretation.
   We would estimate the average distance of the possible sample mean values from the population mean (from µ) to be roughly 2.5 pounds.
   Note: The possible sample mean values result from considering all possible random samples of 16 rope pieces from the same new rope population.

b. These study results will be used to compute a 90% confidence interval for the population mean breaking strength. One of the assumptions for this interval to be valid is about normality. Clearly state what is assumed to be normally distributed.
   The population of responses (breaking strengths for all such new rope pieces) is assumed to be normally distributed.

9. Wrong hypotheses – A local teacher would like to test the theory that more than half of all adults in the community are satisfied with the quality of K–12 education. She sets up the hypotheses to be tested as follows:
   H0: p̂ = 0.5 versus Ha: p̂ > 0.5
   Explain in one sentence what is wrong with her hypotheses. [2 points]
   The hypotheses must be about a population parameter, not a statistic. The hypotheses should be about p and not about p̂.

10. Possible Studies about Changing Major – Ana, Bob, Carlos, and Dion all attend the same college. Each will be testing whether the proportion of all students at their college who have changed their major is 0.70, as claimed by the advising office (versus the rate is not 0.70). [6 points]

a. Ana will take a random sample of 200 students and use a significance level of 0.05. Bob will take a random sample of 200 students and use a significance level of 0.01. Suppose the advising office's claim is actually right.
   i. Is it possible for Ana or Bob to make a Type 1 error? Circle one: Yes No
      If Yes, who is more likely to do so? Ana
   ii. Is it possible for Ana or Bob to make a Type 2 error? Circle one: Yes No
      If Yes, who is more likely to do so? __________

b. Carlos will take a random sample of 200 students and use a significance level of 0.05. Dion will take a random sample of 500 students and use a significance level of 0.05. Suppose the advising office's claim is actually wrong.
   i. Is it possible for Carlos or Dion to make a Type 1 error? Circle one: Yes No
      If Yes, who is more likely to do so? __________
   ii. Is it possible for Carlos or Dion to make a Type 2 error? Circle one: Yes No
      If Yes, who is more likely to do so? Carlos

...
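The arithmetic in Problems 1(c), 2(b), 3(f), 5, 6(d), and 7(a) can be re-checked with a short script. What follows is only a sketch using the Python standard library, not part of the original key. The multipliers t* = 2.75 and z* = 2.576 are table values (2.75 is the one the key uses in Problem 2), the pairing of the sample sizes 20 and 23 with the standard deviations 5.24 and 5.92 in Problem 3(f) is inferred from the reported answer of 5.6151, and all variable names are illustrative.

```python
import math
from statistics import NormalDist

# Problem 1(c): common (pooled) proportion under H0: p1 = p2.
n1, p1_hat = 200, 0.20   # males
n2, p2_hat = 400, 0.26   # females
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# Problem 1(d): z statistic using the null (pooled) standard error.
se_null = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_prop = (p1_hat - p2_hat) / se_null

# Problem 2(b): 99% paired-t interval, d_bar +/- t* x SE.
# t* = 2.75 is the table value used in the key (an assumption here).
d_bar, se_d, t_star = 0.850, 0.369, 2.75
ci_low = d_bar - t_star * se_d
ci_high = d_bar + t_star * se_d

# Problem 3(f): pooled standard deviation.
# Assumed pairing: n = 20 with s = 5.24, n = 23 with s = 5.92.
s_pooled = math.sqrt((19 * 5.24**2 + 22 * 5.92**2) / (20 + 23 - 2))

# Problem 5: conservative sample size, n = (z* / (2m))^2, rounded up.
z_star, margin = 2.576, 0.02
n_needed = math.ceil((z_star / (2 * margin)) ** 2)

# Problem 6(d): paired t statistic = mean difference / its standard error.
t_paired = 3.750 / 0.86843

# Problem 7(a): P(sample mean >= 860) via the CLT with n = 64.
z_rope = (860 - 863) / (9 / math.sqrt(64))
prob_rope = 1 - NormalDist().cdf(z_rope)

print("pooled proportion:", round(p_pooled, 4))
print("z for proportions:", round(z_prop, 2))
print("99% CI:", (round(ci_low, 3), round(ci_high, 3)))
print("pooled SD:", round(s_pooled, 4))
print("sample size:", n_needed)
print("paired t:", round(t_paired, 3))
print("P(mean >= 860):", round(prob_rope, 4))
```

The rope probability is computed from the unrounded z of about -2.667, whereas the key rounds z to -2.67 before using the table, so the two can differ slightly in the later decimal places.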