Nail ST370 Spring 2008 Final Exam
Name_______________________________
1. Jason is buying a new computer. He is comparing various models using many different variables. These variables play an important role in making the decision about which computer he will buy. Identify each variable as categorical (C), quantitative and discrete (QD), or quantitative and continuous (QC) by putting the appropriate letter(s) in the blank.
____ A) Does the computer come with a CD/DVD writer?
____ B) What is the memory capacity of the computer?
____ C) Is the computer a laptop or a desktop model?
____ D) How much does the computer cost?
Use the following to answer questions 2 and 3: The histogram below represents the height (in inches) of the gold medal-winning high jumps for the Olympic Games up to Sydney 2000.
[Histogram: Frequency (0 to 50) versus Winning jumps (in inches), 70 to 95]
_____ 2. What is approximately the mean height?
A) 75 inches
B) 77.5 inches
C) 82 inches
D) 90 inches
_____ 3. What is approximately the percentage of these winning jumps that were at least 7'1" high (85 inches)?
A) 9%
B) 14%
C) 23%
D) 35%
Use the following to answer questions 4 and 5: During the early part of the 1994 baseball season, many sports fans and baseball players noticed that the number of home runs being hit seemed to be unusually large. Below are separate stemplots for the number of home runs by American League and National League teams based on the team-by-team statistics on home runs hit through Friday, June 3, 1994 (from the Columbus Dispatch sports section, Sunday, June 5, 1994).
American League
2 3 4 5 6 7 5 0 1 4 5
National League
2 3 4 5 6 7 9 1 2 6 7 8 8 3 5 5 5 3 3 7
3 9 4 7 8 8 8 8 7
_____ 4. What is the median for the number of home runs for the American League teams?
A) 45
B) 50
C) 50.5
D) 57.5
5. Determine whether each of the following statements is true or false. (Put a T or F in the blank.)
_____ A) The American League plot is reasonably symmetric.
_____ B) The National League plot is bimodal.
_____ C) The median number of home runs hit by National League teams for this time period was higher than the median for the American League teams.
_____ D) The lowest number of home runs hit by any team for this time period is 29.
_____ 6. The average salary of all female workers is $35,000. The average salary of all male workers is $41,000. What must be true about the average salary of all workers?
A) It must be $38,000.
B) It must be larger than the median salary.
C) It could be any number between $35,000 and $41,000.
D) It must be larger than $38,000.
_____ 7. For the density curve below, which of the following is true?
[Figure: density curve; vertical axis Density (0 to 3), horizontal axis X (0.0 to 1.0)]
A) The mean and median are equal.
B) The mean is greater than the median.
C) The mean is less than the median.
D) The mean could be either greater than or less than the median.
_____ 8. Which of the following are reasons why we randomize in an experiment? List all that apply.
A) To allow us to quantify experimental error.
B) To reduce experimental error.
C) To prevent a controlled variable from affecting the response.
D) To keep the effects of a controlled variable from affecting conclusions about the treatment effects.
E) To provide protection against a systematic effect of extraneous/lurking variables from affecting conclusions about treatment effects.
F) To make it possible to generalize to a larger population.
_____ 9. Which of the following are reasons why we use replication in an experiment? List all that apply.
A) To allow us to quantify experimental error.
B) To reduce experimental error.
C) To prevent a controlled variable from affecting the response.
D) To keep the effects of a controlled variable from affecting conclusions about the treatment effects.
E) To provide protection against a systematic effect of extraneous/lurking variables from affecting conclusions about treatment effects.
F) To make it possible to generalize to a larger population.
_____ 10. Which of the following are reasons why we sometimes control a variable in an experiment? List all that apply.
A) To allow us to quantify experimental error.
B) To reduce experimental error.
C) To prevent a controlled variable from affecting the response.
D) To keep the effects of a controlled variable from affecting conclusions about the treatment effects.
E) To provide protection against a systematic effect of extraneous/lurking variables from affecting conclusions about treatment effects.
F) To make it possible to generalize to a larger population.
Use the following to answer questions 11-18: An automotive engineer wanted to determine the effects of engine (110 horsepower or 150 horsepower), suspension system (standard or tight), and passenger loading (200 pounds or 500 pounds) on gas mileage. Two test cars were driven under each treatment. Bob drove one car for each treatment, and Joe drove one car for each treatment. All cars received oil changes on the same schedule. Also, all of the tires on all of the cars were of the same brand, had the same amount of aging prior to the experiment, and were filled to the same pressure.
Use the following answer choices to fill in the blanks for 11-16 below. You may use a choice more than once.
A) Gas mileage
B) Engine
C) 110 horsepower
D) 150 horsepower
E) Suspension system
F) Standard
G) Tight
H) Passenger loading
I) 200 pounds
J) 500 pounds
K) Cars
L) Driver
M) Bob
N) Joe
O) Oil change schedule
P) Brand of tire
Q) Tire age
R) Tire pressure
_____ 11. What is (are) the factor(s)? Mark all that apply.
_____ 12. What is (are) the explanatory variable(s)? Mark all that apply.
_____ 13. What is (are) the response variable(s)? Mark all that apply.
_____ 14. What is (are) the controlled variable(s)? Mark all that apply.
_____ 15. What is (are) the blocking variable(s)? Mark all that apply.
_____ 16. What is (are) the experimental unit(s)? Mark all that apply.
_____ 17. How many treatments?
_____ 18. How many replicates per treatment?
Use the following to answer questions 19-24: The Department of Animal Regulations released information on pet ownership for the population consisting of all households in a particular county. Let the random variable X = the number of licensed dogs per household. The distribution for the random variable X is given below.
Value of X     0      1      2      3      4      5
Probability    0.52   0.22   0.13   ____   0.03   0.01
_____ 19. The probability for X = 3 is missing. What is it?
A) 0.07
B) 0.09
C) 0.1
D) 0.0
_____ 20. What is the probability that a randomly selected household from this community owns at least one licensed dog?
A) 0.22
B) 0.26
C) 0.48
D) 0.52
_____ 21. What is the probability that a randomly selected household from this community owns exactly one licensed dog?
A) 0.22
B) 0.26
C) 0.48
D) 0.52
_____ 22. What is the probability that a randomly selected household from this community owns at most two licensed dogs?
A) 0.13
B) 0.74
C) 0.87
D) 0.26
_____ 23. What is the mean or expected number of licensed dogs per household in this county?
A) 0.92 dogs
B) 1 dog
C) 1.22 dogs
D) 3 dogs
_____ 24. What is the variance of the number of dogs per household in this county?
A) .9935 dogs squared
B) 0.9967 dogs squared
C) 1.87 dogs squared
D) 3.5 dogs squared
25.
For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable X. Put a Y for Yes or an N for No in the blank.
_____ A) X = the number of phone calls received in a one-hour period.
_____ B) A hand of 5 cards will be dealt from a standard deck of 52 cards that has been thoroughly shuffled. Let X = the number of hearts in the hand of 5 cards.
Use the following to answer questions 26 through 29: A set of ten cards consists of three red cards and seven black cards. The cards are shuffled thoroughly. One card is to be selected at random. The color will be observed and the card replaced in the set. The cards are then thoroughly reshuffled. This selection procedure is repeated four times. Let X = the number of red cards observed in these four trials.
_____ 26. What is the probability that all card is red?
A) .3
B) .7
C) .4116
D) .6517
_____ 27. What is the probability that at most 3 cards are red?
A) .0081
B) .75
C) .9
D) .9919
_____ 28. If this selection of four cards is repeated many, many times, what is the mean number of red cards in all these trials?
A) .84
B) 1.2
C) 2
D) 2.5
_____ 29. If this selection of four cards is repeated many, many times, what is the variance of the number of red cards in all these trials?
A) .84
B) 1.2
C) 2
D) 2.5
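The card-drawing setup in questions 26 through 29 (four draws with replacement, three red cards out of ten) can be checked numerically. The following is a minimal sketch, assuming X is binomial with n = 4 and p = 0.3; the helper name `binom_pmf` is introduced here for illustration and is not part of the exam.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.3  # four draws with replacement; 3 of the 10 cards are red (assumed model)

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
p_at_most_3 = sum(pmf[:4])   # P(X <= 3), equivalently 1 - P(X = 4)
mean = n * p                 # binomial mean n*p
variance = n * p * (1 - p)   # binomial variance n*p*(1-p)

print(round(p_at_most_3, 4), round(mean, 2), round(variance, 2))
```

The same `pmf` list gives any other event probability by summing the relevant entries.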
Use the following to answer questions 30-33: The probability density of a continuous random variable X is given in the figure below.
[Figure: probability density of X; horizontal axis X with ticks at 0, 1, and 2]
_____ 30. A) B) _____ 31. A) B)
Based on this density, what is the probability that X is between 0.5 and 1.5? What is the P(X = 1.5)? 0 C) D) C) D) 1
_____ 32. A) B) _____ 33. A) B)
What is the P(X > 1.5)? 0 What is the P(X 1.5)? 0 C) D) C) D)
Consider the following probability density function:
$f(x) = \begin{cases} k(x - 0.5)^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
_____ 34. What value must the constant k take so that f is a valid probability density function?
A) 6
B) 12
C) 24
D) 36
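A density must integrate to 1, so k is the reciprocal of the area under (x - 0.5)^2 on the interval from 0 to 1. The sketch below evaluates that area exactly with rational arithmetic; the function name `integral_of_squared_shift` is an illustrative helper, and the closed interval [0, 1] is an assumption about the support.

```python
from fractions import Fraction

def integral_of_squared_shift(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Exact integral of (x - c)^2 from a to b, using the antiderivative (x - c)^3 / 3."""
    return ((b - c) ** 3 - (a - c) ** 3) / 3

# Area under (x - 1/2)^2 on [0, 1], before scaling by k
area_without_k = integral_of_squared_shift(Fraction(0), Fraction(1), Fraction(1, 2))
k = 1 / area_without_k  # k must scale the total area to exactly 1

print(area_without_k, k)  # 1/12 12
```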
Use the following to answer questions 35-37: Consider a continuous random variable X which has the following probability density function (p.d.f.):
$f(x) = \begin{cases} 0 & x < 0 \\ 3x^2 & 0 \le x \le 1 \\ 0 & x > 1 \end{cases}$
_____ 35. What is the probability that X falls between .25 and .5?
A) 0
B) .0439
C) .1094
D) .5625
_____ 36. Which of the following calculations would be used in deriving an expression for the Cumulative Distribution Function (C.D.F.)?
A) B)
1
F (a ) = 3 x 2 dx
0 1
C) D)
1
F (a ) = x 2 3 x 2 dx
0 a
F (a ) = x3x 2 dx
0
F (a ) = 3 x 2 dx
0
_____ 37. A) B)
Which of the following calculations would be used to find EX, the expected value of X? 1 1 C) EX = x3 x 2 EX = 3 x 2 dx
x=0
0
EX = 3t 2 dt
0
x
D)
EX = x3 x 2 dx
0
1
_____ 38. Let X represent the temperature at any random location in a kiln used for manufacturing bricks. This temperature is normally distributed with a mean of 1000°F and a standard deviation of 50°F. If bricks are fired at a temperature above 1100°F, they will crack and must be discarded, but if they are fired at a temperature below 925°F, then they will not dry completely. Which of the following expressions DOES NOT give the probability that a randomly selected brick will be fired at a temperature between 925°F and 1100°F?
A) $\int_{-1.5}^{2} (2\pi)^{-1/2} \exp\left(-\frac{z^2}{2}\right) dz$
B) $F_X(1100) - F_X(925)$
C) $\int_{925}^{1100} (5000\pi)^{-1/2} \exp\left(-\frac{(x - 1000)^2}{5000}\right) dx$
D) $\int_{925}^{1100} x\,(5000\pi)^{-1/2} \exp\left(-\frac{(x - 1000)^2}{5000}\right) dx$
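The probability of a kiln temperature between 925°F and 1100°F can be computed either from the normal CDF directly or after standardizing. This is a minimal sketch using Python's standard-library `statistics.NormalDist`, under the normal model stated in the question; the variable names are illustrative.

```python
from statistics import NormalDist

kiln = NormalDist(mu=1000, sigma=50)  # temperature model stated in the question

# P(925 < X < 1100) as a difference of CDF values
p_direct = kiln.cdf(1100) - kiln.cdf(925)

# Same probability after standardizing: z = (x - 1000) / 50
z = NormalDist()
p_standardized = z.cdf(2.0) - z.cdf(-1.5)

print(round(p_direct, 4), round(p_standardized, 4))
```

Both routes agree because standardizing only rescales the variable of integration.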
Use the following to answer questions 39-40: Chocolate bars produced by a certain machine are labeled 8.0 oz. The distribution of the actual weights of these chocolate bars is claimed to be normal with a mean of 8.1 oz. and a standard deviation of 0.1 oz.
_____ 39. A quality control manager initially plans to take a simple random sample of size n from the production line. If he were to double his sample size (to 2n), by what factor would the standard deviation of the sampling distribution of $\bar{X}$ change?
A) 1/2
B) 2
C) $\sqrt{2}$
D) $1/\sqrt{2}$
_____ 40. The quality control manager plans to take a simple random sample of size n from the production line. How big should n be so that the sampling distribution of $\bar{X}$ has standard deviation 0.01 oz.?
A) 10
B) 100
C) 1000
D) Cannot be determined unless we know the population follows a normal distribution.
Use the following to answer questions 41-42: Birth weights of babies born to full-term pregnancies follow roughly a normal distribution. At Meadowbrook Hospital, the mean weight of babies born to full-term pregnancies is 7 lbs. with a standard deviation of 14 oz. (1 lb. = 16 oz.).
_____ 41. Dr. Watts (who works at Meadowbrook Hospital) has four deliveries (all for full-term pregnancies) coming up during the night. Assume that the birth weights of these four babies can be viewed as a simple random sample. What is the probability that the average weight of the four babies will be more than 7.5 lbs?
A) 0.0065
B) 0.1265
C) 0.2839
D) 0.4858
_____ 42. Rachel is one of Dr. Watts' patients. Rachel had an ultrasound three weeks ago and the doctors established that the baby already weighed 7 lbs. at that point. She is about to deliver her baby (full term). What is the probability that the baby will be more than 8 lbs.?
A) 0.0065
B) 0.1265
C) 0.2531
D) 0.4858
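For the average of four birth weights, the relevant distribution is the sampling distribution of the mean, whose standard deviation is the population standard deviation divided by the square root of n. The sketch below works in pounds and covers only the four-baby average; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_lb = 7.0
sigma_lb = 14 / 16   # 14 oz converted to pounds
n = 4

# Sampling distribution of the mean of n independent normal birth weights
xbar = NormalDist(mu=mu_lb, sigma=sigma_lb / sqrt(n))
p_mean_above = 1 - xbar.cdf(7.5)

print(round(p_mean_above, 4))
```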
Use the following to answer questions 43 and 44: Battery packs in radio-controlled racing cars need to be able to last pretty long. The distribution of the lifetimes of battery packs made by Lectric Co. is slightly left skewed. Assume that the standard deviation of the lifetime distribution is $\sigma$ = 2.5 hours. A simple random sample of 75 battery packs results in a mean $\bar{x}$ = 29.6 hours.
_____ 43. What is a 90% confidence interval for $\mu$, the true average lifetime of the battery packs made by Lectric Co.?
A) (29.13, 30.07)
B) (29.03, 30.17)
C) (28.86, 30.34)
D) The confidence interval cannot be calculated, because the population distribution is not normal.
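A z-interval with known population standard deviation has the form sample mean plus or minus z* times sigma over the square root of n. The following sketch applies that formula to the battery data, assuming the sample of 75 is large enough for the normal approximation to the sampling distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar = 2.5, 75, 29.6
confidence = 0.90

# Critical value leaving (1 - confidence) / 2 in each tail, about 1.645
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(round(lower, 2), round(upper, 2))
```

Changing `confidence` or `n` shows how the margin of error responds to the confidence level and the sample size.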
44. Determine whether each of the following statements is true or false. Put a T or an F in the blank.
_____ A) If a 95% confidence interval had been calculated, the margin of error would have been larger.
_____ B) If many more samples of 75 battery packs were taken, 90% of the resulting confidence intervals would have a sample mean between 29.13 and 30.07.
_____ C) If the sample size had been 150 and not 75, the margin of error would have been larger.
_____ 45. Suppose we wish to calculate a 90% confidence interval for the average amount spent on books by freshmen in their first year at a major university. The interval is to have a margin of error of $2. Assume that the amount spent on books by freshmen has a normal distribution with a standard deviation of $\sigma$ = $30. How many observations are required to achieve this margin of error? Use $z^*$ to three decimal places in your calculations.
A) 865
B) 866
C) 608
D) 609
_____ 46. The square footage of the several thousand apartments in a new development is advertised to be 1250 square feet, on average. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicions. Let $\mu$ represent the true average area (in square feet) of these apartments. What are the appropriate null and alternative hypotheses?
A) $H_0: \mu = 1250$ vs. $H_a: \mu < 1250$
B) $H_0: \mu = 1250$ vs. $H_a: \mu \ne 1250$
C) $H_0: \mu = 1250$ vs. $H_a: \mu > 1250$
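For the book-spending question, the required sample size comes from solving the margin-of-error formula for n and rounding up. This is a minimal sketch of that calculation, using the three-decimal critical value 1.645 as the question instructs; the variable names are illustrative.

```python
from math import ceil

z_star = 1.645   # 90% critical value, to three decimals as the question instructs
sigma = 30.0     # population standard deviation in dollars
margin = 2.0     # desired margin of error in dollars

n_exact = (z_star * sigma / margin) ** 2
n_required = ceil(n_exact)  # round up so the margin of error is not exceeded

print(n_exact, n_required)
```

Rounding up rather than to the nearest integer is the design choice here: a smaller n would give a margin slightly above the target.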
Use the following to answer questions 47 and 48: The nicotine content in cigarettes of a certain brand is normally distributed with standard deviation $\sigma$ = 0.1 milligrams. The brand advertises that the mean nicotine content of their cigarettes is $\mu$ = 1.5, but measurements on a random sample of 100 cigarettes of this brand gave a mean of $\bar{x}$ = 1.53. Is this evidence that the mean nicotine content is actually higher than advertised? To answer this, test the hypotheses $H_0: \mu = 1.5$ versus $H_a: \mu > 1.5$.
_____ 47. What is the value of the P-value?
A) 0
B) 0.0013
C) 0.3821
D) 0.9987
48. Determine whether each of the following statements is true or false.
_____ A) The probability that $H_0$ is true is 0.05.
_____ B) The data were statistically significant at $\alpha$ = 0.05.
_____ C) At the 5% significance level, $H_a$ should be rejected.
_____ D) Even if nicotine content in cigarettes of this brand were not quite normally distributed, the test would still be valid.
_____ 49. In a test of statistical hypotheses, what does the P-value tell us?
A) If the null hypothesis is true.
B) If the alternative hypothesis is true.
C) The largest level of significance at which the null hypothesis can be rejected.
D) The smallest level of significance at which the null hypothesis can be rejected.
_____ 50. A medical researcher is working on a new treatment for a certain type of cancer. The average survival time after diagnosis on the standard treatment is two years. In an early trial, she tries the new treatment on three subjects who have an average survival time after diagnosis of four years. Although the survival time has doubled, the results are not statistically significant even at the 0.10 significance level. What is the best explanation?
A) The sample size is too small to determine if the observed increase cannot be reasonably attributed to chance.
B) Although the survival time has doubled, the actual increase is only two years.
C) The calculation was in error; an increase of 2 years is very significant.
_____ 51. A sample of size n = 27 is used to conduct a significance test for $H_0: \mu = 75$ versus $H_a: \mu > 75$. The test statistic is t = 3.45. What are the degrees of freedom for this test statistic?
A) 26
B) 27
C) 74
D) 75
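The nicotine test in questions 47 and 48 is a one-sided z test with known standard deviation. The sketch below computes the test statistic and its upper-tail P-value from the numbers given in the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 1.5, 0.1, 100, 1.53

z = (xbar - mu0) / (sigma / sqrt(n))   # one-sample z statistic
p_value = 1 - NormalDist().cdf(z)      # upper-tail area for Ha: mu > mu0

print(round(z, 2), round(p_value, 4))
```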
_____ 52. A simple random sample of 20 third-grade children from a certain school district is selected, and each is given a test to measure his/her reading ability. We are interested in calculating a 95% confidence interval for the population mean score. In the sample, the mean score is 64 points, and the standard deviation s is 12 points. What is the margin of error associated with the confidence interval?
A) 4.64 points
B) 5.26 points
C) 5.60 points
D) 5.62 points
_____ 53. Scores on the SAT Mathematics test are believed to be normally distributed. The scores of a simple random sample of five students who recently took the exam are 550, 620, 710, 520, and 480. The standard deviation of these numbers is s = 90.72. What is a 95% confidence interval for $\mu$, the population mean score on the SAT Math test?
A) (456.7, 695.3)
B) (463.4, 688.6)
C) (480.8, 671.2)
D) (496.5, 655.5)
Use the following to answer questions 54-55: We wish to see if the dial indicating the oven temperature for a certain model oven is properly calibrated. Four ovens of this model are selected at random. The dial on each is set to 300°F, and after one hour, the actual temperature of each is measured. The temperatures measured are 305, 310, 300, and 305. Assume that the distribution of the actual temperatures for this model when the dial is set to 300 is normal. To test if the dial is properly calibrated, we will test the following hypotheses: $H_0: \mu = 300$ versus $H_a: \mu \ne 300$. The sample standard deviation is s = 4.082.
_____ 54. Based on the data, what is the value of the one-sample t statistic?
A) 1.23
B) 2.45
C) 4.90
D) 5.0
_____ 55. Are the data statistically significant at the 5% significance level?
A) Yes, because the P-value is less than 0.05.
B) Yes, because the sample mean $\bar{x}$ = 305, which is much higher than 300.
C) No, because a difference of 5 (between $\bar{x}$ and $\mu$) as compared to 300 is very small (insignificant).
D) No, because the P-value is greater than 0.05.
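The one-sample t statistic for the oven data is the difference between the sample mean and the hypothesized mean, divided by the estimated standard error. A minimal sketch from the four measured temperatures follows; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

temps = [305, 310, 300, 305]   # measured oven temperatures
mu0 = 300                      # hypothesized mean under H0 (the dial setting)

xbar = mean(temps)
s = stdev(temps)               # sample standard deviation (n - 1 denominator)
t = (xbar - mu0) / (s / sqrt(len(temps)))

print(xbar, round(s, 3), round(t, 2))
```

The Python standard library has no t distribution, so the P-value with 3 degrees of freedom would still come from a t table or a statistics package.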
56. Group 19 in my Fall 2007 ST370 class performed a computer experiment. The purpose of the experiment was to determine the effect of language (at levels PHP, Perl, Python, C, and Lisp) and processor (at levels Macbook 2x2.0 GHz, Godfather 1x1.9 GHz, and Tenniscores 4x2.4 GHz) on time it takes to calculate . The following ANOVA table shows results of their analysis. Fill in the missing numbers in the ANOVA table. Keep 3 significant digits, and use scientific notation if necessary.
ANOVA
Source of Variation   SS          df             MS            F             P-value
Processor             17.5506     2              8.77529968    d. ________   3.61228E-50
Language              153.9238    a. _________   38.48095722   5718.75897    e. ________
Interaction           21.18158    8              2.64769723    393.481435    7.43337E-49
Error                 0.403734    b. _________   c. ________
Total                 193.0597    74
57. The table below gives some of the treatment means $\bar{y}_{ij}$ for a 3 by 2 factorial experiment. Fill in the missing treatment means assuming that there is no interaction effect. (Hint: draw an interaction plot.) (2 points each)
             B Level 1   B Level 2   B Level 3
A Level 1    2.7         5.8         b.
A Level 2    a.          7.2         1.3
Use the following to answer question 56: Group 6 in my Fall 2007 ST370 class performed an experiment that involved putting Mentos candies in bottles of Diet Coke, which causes an eruption of liquid from the bottle. The purpose of the experiment was to determine the effects of number of Mentos (either two or four) and initial volume (591, 1000, or 2000 mL) on the percent volume lost, which is defined as (final volume-initial volume)/initial volume. The units of the response variable are mL. There are several ANOVA tables describing their results below. Some may be necessary, and others not. Use them to determine the best model for the data.
ANOVA
Source of Variation   SS         df   MS         F          P-value
Initial volume        0.000977   2    0.000489   0.413354   0.670496
Number of mentos      0.002738   1    0.002738   2.31576    0.153975
Interaction           3.33E-07   2    1.67E-07   0.000141   0.999859
Error                 0.014188   12   0.001182
Total                 0.017904   17

Source of Variation   SS         df   MS         F          P-value
Initial volume        0.000977   2    0.000489   0.482235   0.627288
Number of mentos      0.002738   1    0.002738   2.701656   0.122505
Error                 0.014188   14   0.001013
Total                 0.017904   17

Source of Variation   SS         df   MS         F          P-value
Number of mentos      0.002738   1    0.002738   2.888609   0.108563
Error                 0.015166   16   0.000948
Total                 0.017904   17

Source of Variation   SS         df   MS         F          P-value
Initial volume        0.000977   2    0.000489   0.433102   0.656352
Error                 0.016926   15   0.001128
Total                 0.017904   17

_____ 56. Which of the following is the best model to describe the data for this experiment? Note: It doesn't matter which factor is called A and which is called B.
A) $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$, $e_{ijk} \sim N(0, \sigma)$
B) $y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}$, $e_{ijk} \sim N(0, \sigma)$
C) $y_{ijk} = \mu + \alpha_i + e_{ijk}$, $e_{ijk} \sim N(0, \sigma)$
D) $y_{ijk} = \mu + e_{ijk}$, $e_{ijk} \sim N(0, \sigma)$
Use the following to answer question 57: Group 16 in my Fall 2007 ST370 class performed an experiment that involved putting pieces of bread in water. The purpose of the experiment was to determine the effect of bread type (whole wheat, white, and honey wheat) and time (30 seconds or two minutes) on volume of water absorbed in mL. Below, find the table of treatment means and the table of interaction effects.
Average of Absorbency (mL)
Time     Honey Wheat    White          Whole Wheat    Mean           Effect
0:30     51.16666667    63.66666667    56.16666667    57             -7.05556 (a_1)
2:00     67             79.66666667    66.66666667    71.11111111    7.055556 (a_2)
Mean     59.08333333    71.66666667    61.41666667    64.05555556
Effect   -4.972222222   7.611111111    -2.638888889
         (b_1)          (b_2)          (b_3)

Interaction effects: ab_ij
Time     Honey Wheat    White          Whole Wheat
0:30     -0.861111111   -0.944444444   1.805555556
2:00     0.861111111    0.944444444    -1.805555556

57. The group found the additive model to be the most appropriate model for their data. Use the tables above to fill in the table of model predictions below.
Model predictions:
Time     Honey Wheat    White          Whole Wheat
0:30     ________       ________       ________
2:00     ________       ________       ________
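Under an additive model, each predicted cell mean is the grand mean plus the row (time) effect plus the column (bread) effect, with no interaction term. The sketch below applies that rule to the effects listed in the table of treatment means; the dictionary names are illustrative.

```python
grand_mean = 64.05555556
time_effect = {"0:30": -7.05556, "2:00": 7.055556}   # a_i, as listed in the table
bread_effect = {
    "Honey Wheat": -4.972222222,
    "White": 7.611111111,
    "Whole Wheat": -2.638888889,
}                                                     # b_j, as listed in the table

# Additive model: prediction = grand mean + a_i + b_j (no interaction term)
predictions = {
    (time, bread): grand_mean + a + b
    for time, a in time_effect.items()
    for bread, b in bread_effect.items()
}

for cell, value in predictions.items():
    print(cell, round(value, 2))
```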
Use the following to answer questions 58-65: The index of biotic integrity (IBI) is a measure of the water quality in streams. IBI and land-use measures for a collection of streams in the Ozark Highland ecoregion of Arkansas were collected as part of a study. The graph below shows a scatterplot of the IBI versus the area of the watershed (in square kilometers) for streams in the original sample with area less than or equal to 70 km².
The tables below give SAS output for a linear regression analysis of this data.
The REG Procedure
Model: MODEL1
Dependent Variable: IBI IBI
Number of Observations Read   49
Number of Observations Used   49

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1    3189.26973       3189.26973    11.67     0.0013
Error             47   12850            273.39461
Corrected Total   48   16039

Root MSE         16.53465   R-Square   0.1988
Dependent Mean   65.93878   Adj R-Sq   0.1818
Coeff Var        25.07576

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept   1    52.92296             4.48352          11.80     <.0001
Area        Area        1    0.46016              0.13473          3.42      0.0013
________58. What is the correlation between IBI and watershed area?
________59. Based on the least squares regression line, what would you predict the value of IBI to be if the watershed area were 43 km²?
________60. What is the residual for the point (19, 29)?
________61. If we were to increment watershed area by 10 square kilometers, what would be the predicted increment in IBI?
________62. What percent of the variability in IBI is explained by the least squares regression on watershed area?
____________________63. Suppose your boss told you to use this least squares regression line to predict the value of IBI for a watershed area of 1000 km². This would be an example of what practice? Put your answer in the blank.
_____ 64. Which of the following is the null hypothesis that we are testing with the p-value in the last row of the SAS output above?
A) $H_0: \beta_0 = 0$
B) $H_0: \beta_1 = 0$
C) $H_0: \beta_0 = 0$ and $\beta_1 = 0$
D) $H_0: \beta_1 \ne 0$
_____ 65. What is the best model for the IBI and watershed area data?
A) $y_i = \beta_0 + \beta_1 x_i + e_i$, $e_i \sim N(0, \sigma)$
B) $y_i = \beta_0 + e_i$, $e_i \sim N(0, \sigma)$
C) $y_i = \mu + e_i$, $e_i \sim N(0, \sigma)$
D) $y_i = \beta_1 x_i + e_i$, $e_i \sim N(0, \sigma)$
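Several of the regression questions reduce to arithmetic on the fitted line from the SAS parameter estimates. The following is a minimal sketch using the intercept, slope, and R-Square values from that output; the function name `predict_ibi` is an illustrative helper, not part of the SAS output.

```python
from math import sqrt

intercept, slope = 52.92296, 0.46016   # parameter estimates from the SAS output
r_square = 0.1988                      # R-Square from the SAS output

def predict_ibi(area_km2: float) -> float:
    """Fitted IBI from the least squares line for a given watershed area."""
    return intercept + slope * area_km2

prediction_43 = predict_ibi(43)
residual_19_29 = 29 - predict_ibi(19)   # observed minus fitted
change_per_10 = slope * 10              # predicted change in IBI for +10 km^2
r = sqrt(r_square)                      # positive root because the slope is positive

print(round(prediction_43, 2), round(residual_19_29, 2),
      round(change_per_10, 2), round(r, 3))
```

The line was fitted to streams with area at most 70 km², so `predict_ibi` is only meaningful within roughly that range.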