PADP 8120 Fertig Spring 2011 UGA
Homework 3 Solutions

1. The Current Population Survey of about 60,000 households in the U.S. in 1992 indicated that 10.3% of whites, 31.0% of blacks, and 26.7% of Hispanics in the United States have annual income below the poverty level (Statistical Abstract of the United States, 1994).

a. Are these numbers statistics or parameters? Explain.

Statistics, because they are computed from a sample, not from the whole population.

b. Assume that we can conclude that the percentage of all black households in the United States having income below the poverty level is at least 30% but no greater than 32%. Is the statistical method we used to come to this conclusion descriptive or inferential? Explain.

Inferential: the method allows us to make statements about the population based on information from the sample.

2. When the Yankelovich polling organization asked, "Should laws be passed to eliminate all possibilities of special interests giving huge sums of money to candidates?" 80% of the sample answered yes. When they posed the question, "Should laws be passed to prohibit interest groups from contributing to campaigns, or do groups have a right to contribute to the candidate that they support?" only 40% said yes (Source: A Mathematician Reads the Newspaper, by J.A. Paulos, New York: Basic Books, 1995, p. 15). Explain what sampling problem this example illustrates.

Response bias: the respondents' answers are influenced by how the question is worded.

3. Check the appropriate type of measurement for each of the following variables when measured as described.

Variable                                                                 Type of measurement
Occupation (plumber, teacher, secretary, etc.)                           Unordered categorical
Annual income (thousands of dollars per year)                            Interval
Socioeconomic status (low, medium, high)                                 Ordered categorical
State murder rate (number of murders per 1000 population)                Interval
Community size (rural, small town, large town, small city, large city)   Ordered categorical
Party affiliation (Communist, Green, Independent, Republican, Democrat)  Unordered categorical
Age (number of years of age)                                             Interval
Gender (male, female)                                                    Binary

4. A company conducts a study of the number of miles traveled using public transportation by its employees during a typical day. A random sample of ten employees yields the following values (in miles): 0, 0, 4, 0, 0, 0, 10, 0, 6, 0.

a. Calculate and interpret the mean, median, mode, range, variance, and standard deviation of these measurements. Show your work.

Mean = (4 + 10 + 6)/10 = 20/10 = 2
Median = Mode = 0
Range = 0 to 10 (10 miles)
Variance = (8*4 + 64 + 16)/10 = 112/10 = 11.2 (12.4 if divide by 9)

Data   Deviation      Squared Deviation
0      0 - 2 = -2     4
0      0 - 2 = -2     4
4      4 - 2 = 2      4
0      0 - 2 = -2     4
0      0 - 2 = -2     4
0      0 - 2 = -2     4
10     10 - 2 = 8     64
0      0 - 2 = -2     4
6      6 - 2 = 4      16
0      0 - 2 = -2     4

Standard deviation = sqrt(11.2) = 3.35 (3.53 if divide by 9)

The mean is greater than the median because the distribution is skewed to the right: most values are 0, and a few large values pull the mean up. The mode just tells us that 0 is the most common value. The range does not tell us very much. The mean is 2 and the standard deviation is about 3.5, so if the distribution were approximately normal, there would be about a 2.5% chance that an employee picked at random commutes by public transportation more than 2 + 2*3.5 = 9 miles. Because these data are strongly skewed, that figure is only a rough guide.

b. The 11th person sampled lives in a different city and travels 90 miles a day on public transportation. Recompute the mean, median, and standard deviation including this observation, and note the effect of this outlier. Show your work.
Mean = (4 + 10 + 6 + 90)/11 = 110/11 = 10
Median = 0
Variance = (7*100 + 36 + 0 + 16 + 6400)/11 = 7152/11 = 650.2 (715.2 if divide by 10)

Data   Deviation        Squared Deviation
0      0 - 10 = -10     100
0      0 - 10 = -10     100
4      4 - 10 = -6      36
0      0 - 10 = -10     100
0      0 - 10 = -10     100
0      0 - 10 = -10     100
10     10 - 10 = 0      0
0      0 - 10 = -10     100
6      6 - 10 = -4      16
0      0 - 10 = -10     100
90     90 - 10 = 80     6400

Standard deviation = sqrt(650.2) = 25.5 (26.7 if divide by 10)

The outlier increased the mean and the standard deviation but did not affect the median.

5. A study examining the age of graduate students at public universities takes a random sample of 100 graduate students from a typical public university.

a. If the standard deviation of the ages of all graduate students at this university is σ = 15, find the probability that the mean age of the graduate students sampled is within 2 years of the mean age for all graduate students at this university. (HINT: think about/use z scores).

The question concerns the sample mean, so the z score uses the standard error of the mean rather than σ itself.
Standard error = σ/sqrt(n) = 15/sqrt(100) = 1.5
z = (ȳ - μ)/(σ/sqrt(n)) = 2/1.5 = 1.33
Table A on page 592 of A&F gives a tail probability of 0.0918 for z = 1.33. The probability that the sample mean age is within 2 years of the population mean age is 1 - 0.0918*2 = 1 - 0.1836 = 0.8164, or about 82%.

b. Would the probability be larger, or smaller, if σ = 10? Show your work and explain the intuition.

Standard error = σ/sqrt(n) = 10/sqrt(100) = 1.0
z = (ȳ - μ)/(σ/sqrt(n)) = 2/1.0 = 2.00
Table A on page 592 of A&F gives a tail probability of 0.0228 for z = 2.00. The probability that the sample mean age is within 2 years of the population mean age is 1 - 0.0228*2 = 1 - 0.0456 = 0.9544, or about 95%.
Thus, the probability is larger. This makes sense because as the distribution of the population becomes tighter, the sampling distribution becomes tighter, and tighter sampling distributions give us more accurate estimates of the population average.

6. Use Stata and day2.dta to find the following:

a. The median age of the heads in the sample.

44

b. The variable with a skewed distribution (specifically, a very long right tail). Include a graph of the variable's distribution.

faminc

[Figure: kernel density estimate of faminc. x-axis: TOTAL FAMILY INCOME-2006 (0 to 4,000,000); y-axis: Density; kernel = epanechnikov, bandwidth = 6.4e+03]

c. The percent of the households receiving food stamps.

12.9%

d. The mean and standard deviation of the head's education if the head is male and if the head is female.

If head is male: Mean = 13.11, Standard deviation = 2.64
If head is female: Mean = 12.62, Standard deviation = 2.54

7. An observation is .50 standard deviations below the mean on a normally distributed variable. What proportion of the data fall below this observation? Above this observation?

When z = -0.50, the proportion of the data below this observation is 0.3085 (from Table A); the proportion of the data above this observation is 1 - 0.3085 = 0.6915.

8. Recent General Social Surveys asked subjects, "How long have you lived in the city, town, or community where you live now?". The possible responses were (less than one year, 1, 2, 3, 4, ...). The responses of 1415 subjects had a mode of "less than one year," a median of 16 years, a mean of 20.3 years and a standard deviation of 18.2 years.

a. Do you think that the population distribution is normal? Why or why not?

No, the population is not normally distributed. Since the mean (20.3) and the median (16) are different, we know that the distribution is not symmetric (and the normal distribution is symmetric). The mode of "less than one year," at the very bottom of the scale, also points to a right-skewed distribution.

b. Based on your answer in (a), is it valid to construct a 99% confidence interval for the true mean for the population represented by this sample? If not, explain why not. If it is valid, do so and interpret.

Yes, it is valid: with a sample size of 1415, the sampling distribution of the mean will be approximately normal because of the central limit theorem.

99% confidence interval = estimate of mean ± (z*standard error)
estimate of mean = 20.3
z with corresponding probability of 99% is 2.58
standard error of mean = s/sqrt(n) = 18.2/sqrt(1415) = 0.484
99% confidence interval = 20.3 ± (2.58*0.484) = 20.3 ± 1.25

We can be 99% confident that the true population mean lies between 19.05 and 21.55 years.

...
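The hand calculations in problems 4, 5, 7, and 8 can be cross-checked in software. The following is a minimal sketch using only Python's standard library; it is not part of the original assignment, and the function and variable names (describe, prob_sample_mean_within, z_confidence_interval, miles) are illustrative assumptions. Normal probabilities come from statistics.NormalDist rather than Table A, so they can differ from table lookups in the third decimal place because z is not rounded to two decimals first.

```python
from math import sqrt
from statistics import NormalDist, mean, median, pstdev, pvariance, stdev

# Problem 4: miles traveled by the ten sampled employees.
miles = [0, 0, 4, 0, 0, 0, 10, 0, 6, 0]
miles_with_outlier = miles + [90]  # part b adds the 90-mile commuter


def describe(data):
    """Return the mean, median, variance, and SD (n and n - 1 divisors)."""
    return {
        "mean": mean(data),
        "median": median(data),
        "variance_n": pvariance(data),  # sum of squared deviations / n
        "sd_n": pstdev(data),           # square root of the variance above
        "sd_n_minus_1": stdev(data),    # uses the n - 1 divisor instead
    }


def prob_sample_mean_within(margin, sigma, n):
    """P(|ybar - mu| < margin), assuming ybar is normal with SE sigma/sqrt(n)."""
    z = margin / (sigma / sqrt(n))
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)


def z_confidence_interval(estimate, s, n, level):
    """Large-sample interval: estimate +/- z * s/sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s / sqrt(n)
    return estimate - half_width, estimate + half_width


print(describe(miles))                                # problem 4a
print(describe(miles_with_outlier))                   # problem 4b
print(prob_sample_mean_within(2, 15, 100))            # problem 5a
print(prob_sample_mean_within(2, 10, 100))            # problem 5b
print(NormalDist().cdf(-0.50))                        # problem 7
print(z_confidence_interval(20.3, 18.2, 1415, 0.99))  # problem 8b
```

Keeping the n and n - 1 divisors side by side mirrors the two sets of figures reported in problem 4, and passing n explicitly to prob_sample_mean_within makes the role of the standard error in problem 5 visible.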
