sta304h-m14.pdf - Last name First name Student UNIVERSITY...

Last name:    First name:    Student #:

UNIVERSITY OF TORONTO
Faculty of Arts and Science
April 2014 EXAMINATIONS
STA 304H1 S/1003H S
Duration - 3 hours

Examination Aids: Non-programmable calculator; aid sheet of eight one-sided pages, or four two-sided pages, with theoretical formulas and definitions only, as posted on Portal.

Q. 1) [20] You are interested in the coffee habits of undergraduate U of T students and want to estimate the percentage of students who drink coffee regularly (and some other characteristics). You stand on College Street at 12 PM (noon) on a weekday in front of Second Cup and interview every 10th person passing you who might be a student in appearance (e.g., you don't stop children and "older" persons). You ask them the following questions:

Q1: Are you an undergraduate U of T student? If no, thanks. If yes,
Q2: Do you drink coffee regularly? If no, thanks. If yes,
Q3: How many cups of coffee do you drink per day, on average?
Q4: What kind of coffee do you drink most often (black, with milk, latte, etc.)?

If the person you are interviewing is a U of T student (after Q1), you also record his/her gender (F or M) (we assume you don't need to ask a question about it). You stop interviewing once you have collected the desired number of responses from U of T students.

(a) What is your (i) target population, and (ii) sampling population?

(b) (i) How would you describe your sampling design? (ii) Can it be considered an SRS? Discuss.

(c) What kind of sampling bias may you have in your design (nonrepresentativeness due to your sampling population)? Explain in short, in terms of this study (don't use unrelated general terminology).

(d) What kind of nonresponse may you expect in your sampling and interviewing? Explain for at least two different cases.

(e) Do you have any criticism of the questionnaire? Give your objections in short to at least one of the questions.
(f) Assume all is nice with the sampling design.
(i) What would be your sample size (intended number of interviewed U of T students) if you want to estimate the percentage of students that drink coffee regularly with an error bound of 5%? From your experience, you have a reasonable guess that this percentage is at least 70%.
(ii) How would you choose your sample size if you have no idea what this percentage might be? Would it significantly affect your sample size, compared with (i)?
(iii) Do you have any comment on the practicality (or impracticality) of the obtained sample sizes (hopefully, you did it correctly)?

(g) Assume again all is nice with the sampling design. You want to estimate the average number of cups of coffee a student drinks per day (for students that don't drink coffee at all, zero cups is counted).
(i) What would you propose as the sample size if you want to estimate this average with an error bound of 0.2 cups? (To properly answer the question, you have to make some reasonable guess about the number of cups.)
(ii) How would this sample size affect your estimation intended in (f)? Would it be better, or worse? Explain.

Q. 2) [24] In a Labor Force Population Survey (U.S., 1976), a city area is divided into small clusters of four nearby households, and all persons in the labour force are registered, cluster by cluster. The total number of registered persons is 210. An SRS of 20 persons is selected, and information on the variables x, y, z, and w is obtained, as follows:

[Data table for the 20 sampled persons: illegible in the source. Legible legend: HoursPW = hours working per week, WklyWG = weekly wage, Sex: 0 = male, 1 = female. Two summary sums are given with the values 217919 and 249585; their symbols are illegible.]

(a) Estimate the average age of persons in the labour force in the area, and place a bound on the error of estimation.

(b) Estimate the total number of females in the labour force, and place a bound on the error of estimation.
Do you think the total number of males in the labour force is significantly larger than that of females, using the sample results? (You don't need to use a formal statistical test, but you should reasonably justify your conclusion.)

(c) If you want to conduct another survey to estimate only the average hours working per week per person with an error bound of 6 hours, what would be your sample size? Use the information provided by this sample.

(d) Estimate the average weekly wage per female worker and place a bound on the error of estimation. What kind of estimator are you using? Is it unbiased? Explain.

(e) Estimate the total amount of the weekly wages for all females in the labour force and place a bound on the error of estimation. ([hint illegible in the source] ... you don't see the solution clearly)

(f) Estimate the average wage per hour earned by the labour force in the city area and place a bound on the error of estimation.

(g) You want to estimate the average weekly wage per person in the labour force and you can use information given in the sample, which is either age or hours working per week (you will use one of them). Which of these two do you think is more useful? Justify and use it to estimate the average weekly wage per person. It is known that the average age in the labour force is 32 years, and the average hours working per week is 36 hours. Use an estimator you think will give you the best estimate.

Q. 3) [16] A certain city is divided into three service areas, the North, Southeast and Southwest, with 125,000, 75,000 and 50,000 households respectively. In the most recent survey, an SRS from each area was selected, and each selected household was interviewed. Among other variables, the number of years living in the household (length of stay, y), and whether the household uses a garage, were recorded.
The results are summarized in the following table:

[Table with columns: Area | Number of Households | Sample Size | Average Length of Stay | Standard Deviation, s | Using a Garage (proportion). The entries are largely illegible in the source; fragments as extracted: 125 000, 81, 4%, 300, 10.24, 2.03, 2%.]

(a) (i) Estimate the average length of stay for the city, and (ii) the standard deviation of the estimator.

(b) (i) Estimate the total number of households that use garages in the city, and (ii) place a bound on the error of estimation.

(c) If the costs of sampling from each stratum were equal, would the optimal allocation produce significantly better results than (i) an SRS, (ii) proportional allocation, for the same sample size? Explain, without using calculation.

(d) A new survey is planned. The costs of interviewing customers from the North, Southeast and Southwest are in proportions 2 : 3 : 4 respectively (differences mostly due to travelling). What is the minimal cost of a survey that would estimate the average length of stay with a bound on the error of estimation of 0.1 years, if the cost of interviewing one customer from the North is $30? Presampling costs may be ignored. (You may take into account that the population is large, that is, you may ignore the finite population corrections.)

Q. 4) [14] From the list of 210 registered persons in the Labor Force Population Survey (Question 2), a systematic sample of size n = 21 is to be selected.

(a) (i) What is the precise name of this sample? Explain how you would select that sample.
(ii) Looking at the description of the population in Q. 2, can you consider this systematic sample as just another simple random sample? Discuss.

(b) You want to improve your design by selecting five systematic samples of size 7.
(i) What is this sampling called? What is an advantage of selecting several systematic samples?
(ii) Explain in detail how you would select this sample. Show the first two of your five systematic samples (by reporting the selected persons).
(c) After selecting the sample in (b), the following results were obtained:

Systematic sample                      1     2     3     4     5     total
Total hours per week in the sample     [only four of the five entries are legible, in order: 238, 254, 307, 270]     1309
Total weekly wage in the sample        1693  1894  1636  2188  1775  9186

(i) Estimate the average hours worked per week per person, and place a bound on the error of estimation.
(ii) Are these estimators unbiased or biased? Explain.

(d) (i) Estimate the average wage per hour. What kind of estimator are you using? Is it unbiased or biased? Explain.
(ii) Compare the estimate from (i) with the result in Q. 2 (f). Could you say these two estimates are close, or not? Try to justify your conclusion.

Q. 5) [16] In the Labor Force Population Survey (Q. 2), as already mentioned, the city area is divided into small clusters of four nearby households, and all persons in the labour force are registered, cluster by cluster. The total number of clusters is 51. So, a cluster consists of all persons living in these four households who are in the labour force (households without persons in the labour force are excluded). An SRS of six clusters is selected, and all registered persons living in these clusters are included in the sample. After collecting the required information, the following results are obtained (the age of sampled persons is reported):

[Table of cluster sizes and ages: illegible in the source. Legible summary sums: $\sum y_i^2 = 76667$, $\sum m_i y_i = 2467$; a further sum involving $m_i$ is illegible.]

(a) What kind of sampling design is used here? Explain in some detail how the sample was selected.

(b) Assume you don't know the total number of persons in the labour force (population size). Estimate the average age of the persons in the labour force, and place a bound on the error of estimation. What kind of estimator are you using?

(c) (i) Use your knowledge of the population size (N = 210) and a different type of estimator than in (b) to estimate the average age of the persons in the labour force, and to place a bound on the error of estimation. What kind of estimator are you using?
(ii) Which of these two estimators, from (i) here and from (b), is more efficient, after looking at the results? Could you expect this result without calculation? Explain.

(d) You decided to use a PPS sample of six clusters (repetitions are allowed). The following is a part of the list of the clusters in the population and their sizes (number of persons in the labour force):

[Table of listed clusters and their sizes: largely illegible in the source; fragments as extracted: 9, 3, 6, 6, 11, with a total of 210.]

(i) Explain in detail how you would select the sample. Apply the method you think is the simplest one.
(ii) Use the following portion of the table of random numbers

31624763841740353363441676448664758753667655431601126143307260332
92325194742363227889479140258437680208017215239339348060890325570

and the procedure described in (i) until you have selected two clusters from the twelve clusters actually listed above (don't forget, you still have 51 of them).

Q. 6) [10] Theoretical question: In the first part of Q. 5, an SRS of six clusters was selected from 51 clusters. If you use two digits from the table of random numbers, you could do it by assigning digits 01 to Cluster 1, 02 to Cluster 2, ..., and digits 51 to Cluster 51. Then, you have to ignore the two-digit groups 52, 53, ..., 99, and 00 (thus, 49% of the table).

(a) Prove that the following improved procedure would also provide an SRS of n clusters (without repetition), n < 51: Select one cluster at random from the list of 51 clusters, using any practical procedure, and remove it from the list. Then select an SRS of n clusters from the remaining list of 50 clusters, by assigning two groups of two digits to each remaining cluster, e.g., 01, 02 to Cluster 1, 03, 04 to Cluster 2, ..., 99, 00 to Cluster 50. In that way we will ignore some digits only when we select the first cluster for removal.

(b) Generalize the above method by considering the case of selecting an SRS of size n from a population of size N: First select k, k < N, elements at random (without repetition) out of the N elements and remove them from the list.
Then select an SRS of size n out of the remaining N - k, n < N - k, elements. Prove that the selected sample is an SRS of size n from the population of size N. Warning: you have to use the definition of an SRS, not just improvise.

Total marks = 100
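
The two-step procedure in Q. 6 (b) can be checked numerically on a small population. The sketch below is only a sanity check, not the proof the question asks for. It enumerates every way of removing k elements and then drawing n from the rest, and accumulates the exact probability of each possible sample. The function name `two_step_sample_probs` and the values N = 7, k = 2, n = 3 are illustrative assumptions, not part of the exam.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def two_step_sample_probs(N, k, n):
    """Exact probability of each n-subset under the two-step procedure:
    remove k elements at random, then take an SRS of n from the remaining N - k."""
    population = range(1, N + 1)
    p_removed = Fraction(1, comb(N, k))      # each k-subset is equally likely to be removed
    p_sample = Fraction(1, comb(N - k, n))   # each n-subset of the remainder is equally likely
    probs = {}
    for removed in combinations(population, k):
        remaining = [u for u in population if u not in removed]
        for sample in combinations(remaining, n):
            probs[sample] = probs.get(sample, Fraction(0)) + p_removed * p_sample
    return probs


# Hypothetical small case: N = 7, k = 2, n = 3.
probs = two_step_sample_probs(N=7, k=2, n=3)

# Under an SRS, every one of the C(N, n) possible samples has probability 1 / C(N, n).
assert len(probs) == comb(7, 3)
assert set(probs.values()) == {Fraction(1, comb(7, 3))}
```

Exact fractions are used instead of simulation so that the equality of probabilities is checked without sampling noise; the enumeration is only feasible for small N.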