Last name:          First name:          Student #:

UNIVERSITY OF TORONTO
Faculty of Arts and Science
April 2014 EXAMINATIONS
STA 304H1 S / 1003H S
Duration: 3 hours

Examination Aids: Non-programmable calculator; aid sheet of eight one-sided pages or four two-sided pages, with theoretical formulas and definitions only, as posted on Portal.

Q. 1) [20] You are interested in the coffee habits of undergraduate U of T students and want to estimate
the percentage of students who drink coffee regularly (and some other characteristics). You stand on
College Street at 12 PM (noon) on a weekday in front of Second Cup and interview every 10th
person passing you who might be a student in appearance (e.g., you don’t stop children and “older”
persons). You ask them the following questions:

Q1: Are you an undergraduate U of T student? If no, thanks. If yes,
Q2: Do you drink coffee regularly? If no, thanks. If yes,
Q3: How many cups of coffee do you drink per day, on average?
Q4: What kind of coffee do you drink most often (black, with milk, latte, etc.)?

If the person you are interviewing is a U of T student (after Q1), you also record his/her gender (F or
M) (we assume you don’t need to ask a question about it). You stop interviewing once you collect a
desired number of responses from U of T students.

(a) What is your (i) target population, and (ii) sampling population?

(b) (i) How would you describe your sampling design? (ii) Can it be considered an SRS? Discuss.

(c) What kind of sampling bias may you have in your design (nonrepresentativeness due to your
sampling population)? Explain in short, in terms of this study (don’t use unrelated general
terminology).

(d) What kind of nonresponse may you expect in your sampling and interviewing? Explain for at least two different cases.

(e) Do you have any criticism of the questionnaire? Give your objections in short to at least one of the questions.

(f) Assume all is nice with the sampling design. (i) What would be your sample size (intended number
of interviewed U of T students), if you want to estimate the percentage of students that drink
coffee regularly, with an error bound of 5%? From your experience, you have a reasonable guess
that this percentage is at least 70%. (ii) How would you choose your sample size, if you have no
idea what this percentage might be? Would it significantly affect your sample size, compared with
(i)? (iii) Do you have any comment on the practicality (or impracticality) of the obtained sample sizes
(hopefully, you did it correctly)?

(g) Assume again all is nice with the sampling design. You want to estimate the average number of
cups of coffee a student drinks per day (for students that don’t drink coffee at all, zero cups is
counted). (i) What would you propose as the sample size if you want to estimate this average with
an error bound of 0.2 cups? (To properly answer the question, you have to make some reasonable
guess about the number of cups.) (ii) How would this sample size affect your estimation intended in
(f)? Would it be better, or worse? Explain.

Q. 2) [24] In a Labor Force Population Survey (U.S., 1976), a city area is divided into small clusters of
four nearby households, and all persons in the labour force are registered, cluster by cluster. The total
number of the registered persons is 210. An SRS of 20 persons is selected, and information on the variables x, y, z, and w, is obtained, as follows:
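
[Data table for the 20 sampled persons is not recoverable from the source. Surviving fragments, as extracted: "15 16", "227 500", and two sums, 217919 and 249585, whose summands are illegible.]

HoursPW = hours working per week, WklyWG = weekly wage, Sex: 0 = male, 1 = female.
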
(a) Estimate the average age of persons in the labour force in the area, and place a bound on the error of
estimation.

(b) Estimate the total number of females in the labour force, and place a bound on the error of
estimation. Do you think the total number of males in the labour force is significantly larger than that of
females, using the sample results? (You don’t need to use a formal statistical test, but reasonably
justify your conclusion.)

(c) If you want to conduct another survey to estimate only the average hours working per week per
person with an error bound of 6 hours, what would be your sample size? Use the information
provided by this sample.

(d) Estimate the average weekly wage per female worker and place a bound on the error of estimation.
What kind of estimator are you using? Is it unbiased? Explain.

(e) Estimate the total amount of the weekly wages for all females in the labour force and place a bound
on the error of estimation. ([hint partly illegible in the source] ... you don’t see the solution clearly.)

(f) Estimate the average wage per hour earned by the labour force in the city area and place a bound on
the error of estimation.

(g) You want to estimate the average weekly wage per person in the labour force and you can use
information given in the sample, which is either age, or hours working per week (you will use one of
them). Which of these two do you think is more useful? Justify and use it to estimate the average
weekly wage per person. It is known that the average age in the labour force is 32 years, and the average hours working per week is 36 hours. Use an estimator you think will give you the best
estimate.

Q. 3) [16] A certain city is divided into three service areas, the North, Southeast and Southwest, with
125,000, 75,000 and 50,000 households respectively. In the most recent survey, an SRS from each area
was selected, and each selected household was interviewed. Among other variables, the number of years
living in the household (length of stay, y), and whether the household uses a garage, were recorded. The
results are summarized in the following table:

Area    Number of Households    Sample Size    Average Length of Stay, ȳ    Standard Deviation, s    Using a Garage, p̂
[Table entries are not recoverable from the source. Surviving fragments, as extracted: "1—25 000 _81 =4% —300 10. 24 2.03 _2%".]

(a) (i) Estimate the average length of stay for the city, and (ii) the standard deviation of the estimator.

(b) (i) Estimate the total number of households that use garages in the city, and (ii) place a bound on the error of estimation.

(c) If the costs of sampling from each stratum were equal, would the optimal allocation produce
significantly better results than (i) an SRS, (ii) proportional allocation, for the same sample size? Explain, without using calculation.

(d) A new survey is planned. The costs of interviewing customers from the North, Southeast and Southwest are in proportions 2 : 3 : 4 respectively (differences mostly due to traveling). What is the
minimal cost of a survey that would estimate the average length of stay with a bound on the error of
estimation of 0.1 years, if the cost of interviewing one customer from the North is $30? Presampling
costs may be ignored. (You may take into account that the population is large, that is, you may ignore the
finite population corrections.)

Q. 4) [14] From the list of 210 registered persons in the Labor Force Population Survey (Question 2), a
systematic sample of size n = 21 is to be selected.

(a) (i) What is the precise name of this sample? Explain how you would select that sample. (ii) Looking
at the description of the population in Q. 2, can you consider this systematic sample just as another
simple random sample? Discuss.
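
The selection step in (a) can be sketched in code. This is a minimal illustration, assuming the 210 persons are labelled 1 to 210 in list order and that the interval k = N/n = 10 divides evenly; the function name `systematic_sample` and the use of Python's `random` module are illustrative choices, not part of the exam.

```python
import random

def systematic_sample(N, n, rng=None):
    """Return a 1-in-k systematic sample of labels 1..N (assumes n divides N)."""
    rng = rng or random.Random()
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                    # sampling interval, 210 / 21 = 10 here
    start = rng.randint(1, k)     # random start among the first k labels
    return [start + j * k for j in range(n)]

# Usage: one 1-in-10 systematic sample from the list of 210 persons.
sample = systematic_sample(210, 21, random.Random(2014))
print(sample)
```

Only the random start is chosen by chance, so there are just k = 10 possible samples, which is worth keeping in mind for part (ii).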

(b) You want to improve your design by selecting five systematic samples of size 7. (i) What is this
sampling called? What is an advantage of selecting several systematic samples? (ii) Explain in detail
how you would select this sample. Show the first two of your five systematic samples (by reporting the selected persons).
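
One common way to draw several systematic samples is to use the interval k' = 210/7 = 30 and pick five distinct random starts between 1 and 30. The sketch below follows that approach; it assumes persons are labelled 1 to 210 in list order, and the function name `repeated_systematic_samples` is hypothetical.

```python
import random

def repeated_systematic_samples(N, size, n_samples, rng=None):
    """Draw n_samples systematic samples, each of the given size, from labels 1..N."""
    rng = rng or random.Random()
    if N % size != 0:
        raise ValueError("this sketch assumes N is a multiple of the sample size")
    k = N // size                                   # interval, 210 / 7 = 30 here
    # Distinct random starts, so that no person is selected twice.
    starts = sorted(rng.sample(range(1, k + 1), n_samples))
    return [[s + j * k for j in range(size)] for s in starts]

# Usage: five systematic samples of size 7 from 210 persons.
for sample in repeated_systematic_samples(210, 7, 5, random.Random(2014)):
    print(sample)
```

Because the starts are drawn without replacement, the five samples do not overlap and together contain 35 different persons.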

(c) After selecting the sample in (b), the following results were obtained:

Sample                                1             2      3      4      5      total
Total hours per week in the sample    (illegible)   238    254    307    270    1309
Total weekly wage in the sample       1693          1894   1636   2188   1775   9186

(i) Estimate the average hours worked per week per person, and place a bound on the error of estimation.
(ii) Are these estimators unbiased or biased? Explain.

(d) (i) Estimate the average wage per hour. What kind of estimator are you using? Is it unbiased or
biased? Explain. (ii) Compare the estimate from (i) with the result in Q. 2 (f). Could you say these two
estimates are close, or not? Try to justify your conclusion.

Q. 5) [16] In the Labor Force Population Survey (Q. 2), as already mentioned, the city area is divided
into small clusters of four nearby households, and all persons in the labour force are registered, cluster
by cluster. The total number of clusters is 51. So, a cluster consists of all persons living in these four
households who are in the labour force (households without persons in the labour force are excluded).
An SRS of six clusters is selected, and all persons living in these clusters that are registered are included
in the sample. After collecting the required information, the following results are obtained (age of sampled persons is reported):

[Table of the six sampled clusters is not recoverable from the source.]

Σ m_i² = [illegible], Σ y_i² = 76667, Σ m_i y_i = 2467

(a) What kind of sampling design is used here? Explain in some detail how the sample was selected.

(b) Assume you don’t know the total number of persons in the labour force (population size). Estimate
the average age of the persons in the labour force, and place a bound on the error of estimation. What
kind of estimator are you using?

(c) (i) Use your knowledge of the population size (N = 210) and a different type of estimator than in (b)
to estimate the average age of the persons in the labour force, and to place a bound on the error of
estimation. What kind of estimator are you using? (ii) Which of these two estimators, from (i) here and
from (b), is more efficient, after looking at the results? Could you expect this result without calculation?
Explain.
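
The two kinds of estimator contrasted in (b) and (c) can be sketched as follows. This is an illustration only: the cluster sizes `m` and cluster age totals `y` below are hypothetical values made up for the example (the exam's table is not available here), the bound is taken as two estimated standard errors, and the function names are assumptions.

```python
from math import sqrt

def ratio_estimate(m, y, n_clusters_pop, mean_cluster_size=None):
    """Ratio estimator of the mean per person, sum(y)/sum(m), with a 2-SE bound."""
    n = len(m)
    ybar = sum(y) / sum(m)
    # If the population mean cluster size is unknown, use the sample mean size.
    m_bar = mean_cluster_size if mean_cluster_size is not None else sum(m) / n
    s2_r = sum((yi - ybar * mi) ** 2 for mi, yi in zip(m, y)) / (n - 1)
    var = (n_clusters_pop - n) / (n_clusters_pop * n * m_bar ** 2) * s2_r
    return ybar, 2 * sqrt(var)

def unbiased_estimate(y, n_clusters_pop, total_persons):
    """Unbiased estimator of the mean per person, using the known population size."""
    n = len(y)
    ybar_t = sum(y) / n                      # average cluster total
    estimate = n_clusters_pop * ybar_t / total_persons
    s2_t = sum((yi - ybar_t) ** 2 for yi in y) / (n - 1)
    m_bar = total_persons / n_clusters_pop   # known mean cluster size
    var = (n_clusters_pop - n) / (n_clusters_pop * n * m_bar ** 2) * s2_t
    return estimate, 2 * sqrt(var)

# HYPOTHETICAL sample of six clusters (not the exam's data).
m = [4, 3, 5, 4, 2, 6]                  # persons per sampled cluster
y = [130, 95, 170, 120, 70, 200]        # total age per sampled cluster

ratio_mean, ratio_bound = ratio_estimate(m, y, n_clusters_pop=51)
unb_mean, unb_bound = unbiased_estimate(y, n_clusters_pop=51, total_persons=210)
print(f"ratio:    {ratio_mean:.2f} +/- {ratio_bound:.2f}")
print(f"unbiased: {unb_mean:.2f} +/- {unb_bound:.2f}")
```

With these made-up numbers the cluster totals are roughly proportional to the cluster sizes, so the ratio estimator has the smaller bound; whether the same holds for the exam's data depends on the actual table.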

(d) You decided to use a PPS sample of six clusters (repetitions are allowed). The following is a part of
the list of the clusters in the population and their sizes (number of persons in the labour force):

[Table of cluster sizes is not recoverable from the source. Surviving fragments, as extracted: "9", "3 6 6 11 210".]

(i) Explain in detail how you would select the sample. Apply the method you think is the simplest one.
(ii) Use the following portion of the table of random numbers
31624763841740353363441676448664758753667655431601126143307260332
92325194742363227889479140258437680208017215239339348060890325570
and the procedure described in (i) until you have selected two clusters from the twelve clusters actually
listed above (don’t forget, you still have 51 of them).

Q. 6) [10] Theoretical question:
In the first part of Q. 5, an SRS of six clusters was selected from 51 clusters. If you use two digits from the table of random numbers, you could do it by assigning digits 01 to Cluster 1, 02 to Cluster 2, ..., and
digits 51 to Cluster 51. Then, you have to ignore two-digit groups 52, 53, ..., 99, and 00 (thus, 49% of
the table).

(a) Prove that the following improved procedure would also provide an SRS of n clusters (without
repetition), n < 51: Select one cluster at random from the list of 51 clusters, using any practical procedure, and remove it
from the list. Then select an SRS of n clusters from the remaining list of 50 clusters, by assigning two
groups of two digits to each remaining cluster, e.g., 01, 02 to Cluster 1, 03, 04 to Cluster 2, ..., 99, 00 to
Cluster 50. In that way we will ignore some digits only when we select the first cluster for removal.

(b) Generalize the above method by considering the case of selecting an SRS of size n from a population
of size N: First select k, k < N, elements at random (without repetition) out of N elements and remove
them from the list. Then select an SRS of size n out of the remaining N − k, n < N − k, elements. Prove that
the selected sample is an SRS of size n from the population of size N. Warning: you have to use the definition of an SRS, not just improvise.

Total marks = 100
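
The claim in Q. 6 (b) can be checked numerically for a small case before attempting the proof. The sketch below is not a proof: it assumes the k removed elements form an SRS of size k, enumerates every removal and every subsequent sample exactly with fractions, and confirms that each subset of size n ends up with probability 1/C(N, n). The function name `two_stage_subset_probabilities` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def two_stage_subset_probabilities(N, k, n):
    """Exact probability of each n-subset when k elements are removed first."""
    population = range(1, N + 1)
    p_removed = Fraction(1, comb(N, k))       # each removed k-set equally likely
    p_sample = Fraction(1, comb(N - k, n))    # SRS of n from the N - k remaining
    probs = {}
    for removed in combinations(population, k):
        remaining = [u for u in population if u not in removed]
        for sample in combinations(remaining, n):
            probs[sample] = probs.get(sample, Fraction(0)) + p_removed * p_sample
    return probs

# Usage: N = 6, k = 2, n = 3 (so n < N - k, as the question requires).
probs = two_stage_subset_probabilities(6, 2, 3)
print(len(probs), set(probs.values()))
```

The enumeration mirrors the counting argument: a given sample S is reachable only from removed sets disjoint from S, and there are C(N − n, k) of those, which gives C(N − n, k) / (C(N, k) · C(N − k, n)) = 1/C(N, n).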