STAT 231 FALL 2007 A1-sol

Stat 231 - Assignment 1

Family (last) Name:
Given (first) Name:
ID:
Section #:
Grade =

Marks Available: 48
Due Date: Thursday, Sept. 27th, 2007, 10am in the Tutorial Centre MC6094/MC6095.

General Rules:

1. Although it is understood that some measure of collaboration will occur in the assignments, the work you hand in must be your own and differ from those of your compatriots.
2. All help should be cited.
3. Excessive collaboration (i.e. copying) is not acceptable. Please read the university policies on what other things constitute cheating.
4. Late assignments will not be accepted. The tutorial center will be manned from 8am until 10am on the date the assignment is handed in.
5. You must print this assignment and put your solutions ON the assignment pages. The only exception(s) to this policy will be the course note homework. Course note homework can be attached to the END of the assignment.
6. Please use the back if you need more space and CLEARLY indicate where the solution has gone.
7. All graphical displays should be properly labelled and titled.
8. All R graphics output (i.e. plots) should be copied and pasted into your assignment.

Reading the data table into R (Question 2)

1. Download the data file into an appropriate folder.
2. Open R.
3. In R, select in order the menu items FILE, CHANGE DIR...
4. In the corresponding pop-up box, browse and select the folder in 1.
5. Type the bolded code:
       data=read.table("fev.dat",header=T)
6. Columns in the dataset may be accessed using:
       data$variable_name
7. If you need to, you can subset the data into a smokers dataset and a nonsmokers dataset. Do this using the commands:
   (a) smoke = data[data$smoke==1,]
   (b) nosmoke = data[data$smoke==0,]
8. If you need to subset the dataset by column, the code would be:
       newdata=data[,c(column 1, column 2, ...)]

The Story

The effects of smoking are usually an interesting problem to investigate statistically. One such study was performed in the mid to late 1970s.
At the time, a sample of 654 youths (aged 3 to 19) from Boston was taken. The objective of the study was to measure the effects of smoking on FEV (Forced Expiratory Volume). FEV is the amount of air you can exhale in one second of a forceful breath.

Source: Forced Expiratory Volume, Journal of Statistics Education, Volume 13, Number 2 (July 2005).

Course Note Questions [NOT MARKED]

Please hand in questions 2 and 4 from section 1.6.1. Attach your answers to the end of the assignment.

1. [6 Marks] Fill in the blanks. Using terminology from the course, and information from the description above, fill in the blanks in the following statements [more than one word may be necessary]:

(a) For ethical reasons the study has a(n) observational plan. This means that the scientist will not change/manipulate any explanatory variates in the study.

(b) Variates include sex, height (in inches), age (in years), FEV (in litres) and smoke. Smoke indicates whether they smoke or not. The response variate is FEV. Sex is a(n) categorical variable and has a(n) binary data type. Since sex was initially coded as "male" or "female" and the statistician needed to use numbered flags for this variable, she coded sex into female (0) and male (1). The statistician similarly assigned the smoker a value of 1, and a non-smoker a value of 0. The height (in inches) has a(n) continuous and ordinal data type. A(n) affine/monotone transformation was used to convert the data from centimeters to inches. Applying the results of this study to other people in the frame might be subject to sample error.

(c) All youths aged 3 to 19 could be considered to be our Target population. The unit is therefore a youth aged 3 to 19 in the 1970s.

2. [16 Marks] Read the data into R, as described above.

(a) Copy and paste two box plots, one for the FEV of smokers and one for the FEV of nonsmokers, on the same axis.
Example Code:
    boxplot(variable~grouping_variable,main="Title",xlab="x label description",ylab="y label description")

Solutions:
    boxplot(data$fev~data$smoke,main="Smokers Vs. Nonsmokers FEV",xlab="Smokers=1, Nonsmokers=0",ylab="FEV")

[Figure: side-by-side box plots titled "Smokers Vs. Nonsmokers FEV"; x-axis "Smokers=1, Nonsmokers=0", y-axis "FEV".]

(b) [NOT MARKED] Compare and contrast the two box plots, briefly [one or two sentences], with respect to:
* Shape: The smokers' data appears skewed right. The non-smokers' data appears not skewed, but has outliers.
* Location: The center of the smokers' data is a little over 3 L, while that of the non-smokers is roughly 2.5 L (consistent with the means in (c)).
* Variability: The spread of the smokers' data is tighter than that of the non-smokers. There are more outliers in the non-smoker data.

(c) Numerically confirm your graphical analysis by finding, using the commands sd() and mean(), the mean and variability of the smoker and non-smoker FEV. Use the table below.

FEV      | Smoker | NonSmoker
Variance | 0.56   | 0.72
Mean     | 3.28   | 2.57

(d) A secondary objective of the study was to investigate the relationships between age, FEV and smoking. A set of scatterplots may be obtained by subsetting the data set and plotting the subset:

Example Code:
    plot(newdata)
Note: A title is unnecessary in this case. Copy and paste the plot below.

Solutions: The drawing on the left was intended, but due to a typo, the one on the right should have been obtained.

[Figure: two scatterplot matrices, left and right; the panel labels recoverable from the extracted text are age, fev and smoke.]

(e) [NOT MARKED] Describe the relationship(s) that you plotted in (d).
One on the left: As you get older or taller, your FEV increases.
One on the right: The panels of smoking against the other variates are hard to interpret. Otherwise see the interpretation above.
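The subsetting and summary steps used in parts (a) to (d) can be sketched in R as follows. Since fev.dat is not reproduced here, the sketch builds a small stand-in data frame; its values are invented purely for illustration and are not the study data, and the name summary_table is an assumption introduced for this example.

```r
# Hypothetical stand-in for fev.dat: these eight rows are made up for
# illustration and are NOT taken from the real data set.
data <- data.frame(
  age   = c(9, 12, 15, 17, 8, 11, 14, 16),
  fev   = c(1.9, 2.8, 3.4, 3.9, 1.7, 2.6, 3.1, 3.6),
  smoke = c(0, 0, 1, 1, 0, 0, 1, 0)
)

# Row subsets: the comparison operator is ==, not =
smoke   <- data[data$smoke == 1, ]
nosmoke <- data[data$smoke == 0, ]

# Variance and mean of FEV within each group, laid out like the table in (c)
summary_table <- rbind(
  Variance = c(Smoker = var(smoke$fev),  NonSmoker = var(nosmoke$fev)),
  Mean     = c(Smoker = mean(smoke$fev), NonSmoker = mean(nosmoke$fev))
)
print(round(summary_table, 2))

# Column subset of the kind passed to plot() for a scatterplot matrix
newdata <- data[, c("age", "fev", "smoke")]
print(head(newdata, 3))
```

With the real file, the data.frame() call would be replaced by the read.table() command from the instructions, and plot(newdata) would draw the scatterplot matrix.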
(f) Determine the correlation between FEV and Age, Age and Height, and finally FEV and Height. Use the same dataset you created in (d) and the command cor().

Correlation | Age  | FEV  | Height
Age         | 1    | 0.76 | 0.79
FEV         | 0.76 | 1    | 0.87
Height      | 0.79 | 0.87 | 1

(g) Why is the correlation between Age and Age equal to 1? Show, mathematically, that r will always be 1 in this case.

Mathematically, $r = \frac{S_{xx}}{\sqrt{S_{xx} S_{xx}}} = 1$.

(h) True or False. Please circle the correct response. Any unclear circle will be assigned a zero grade.
* The relationship between FEV and Age is strong and negative. TRUE / FALSE (circled: FALSE)
* The relationship between Age and Height is weak and positive. TRUE / FALSE (circled: FALSE)
* The relationship between FEV and Height is strong and positive. TRUE / FALSE (circled: TRUE)
* The relationship between Height and [variate missing in the extracted text] is strong and positive. TRUE / FALSE (circled: TRUE)

(i) Yet one more goal of this study was to look at the relative risk of smoking by gender. Subset the data as you have done before, selecting only the smoking and sex columns. Use the table() command to provide the necessary counts. Fill in the table below.

Counts  | Smoke = 0 | Smoke = 1
Sex = 0 | 279       | 39
Sex = 1 | 310       | 26

(j) Use the table in (i) to calculate the relative risk.

$\text{relative risk} = \frac{39/(279+39)}{26/(310+26)} = \frac{84}{53} \approx 1.58$

(k) The risk of a female smoking is (fill in the blank) greater than that of a male.

3. [12 marks] Non-stop flights of Northern Hawk Owls are historically determined by $H \sim G(25, 4)$, in miles, while the flights of Barn Owls are given by $B \sim G(20, 7)$. Flights of both types of owls are independent. I'm curious as to whether or not the Northern Hawk Owls fly further during non-stop flights than their Barn Owl cousins. In all cases write a brief conclusion in layman's terms. Find...

(a) The probability that the Northern Hawk Owls travel less than 20 miles.

$\Pr(H < 20) = \Pr\left(Z < \frac{20 - 25}{4}\right) = \Pr\left(Z < -\frac{5}{4}\right) \approx 10.6\%$

∴ the N. Hawk Owls travel less than 20 miles 10.6% of the time.

(b) The probability that the Barn Owls travel between 21 and 29 miles non-stop.
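$\Pr(21 < B < 29) = \Pr\left(\frac{21 - 20}{7} < Z < \frac{29 - 20}{7}\right) = \Pr\left(\frac{1}{7} < Z < \frac{9}{7}\right) = \Pr\left(Z < \frac{9}{7}\right) - \Pr\left(Z < \frac{1}{7}\right) = 90.0\% - 55.7\% = 34.3\%$

∴ the B. Owls travel between 21 and 29 miles non-stop 34.3% of the time.

(c) The probability that the Barn Owls travel 3 miles or more, less than the N. Hawk Owls.

Let $A = H - B$.
$E(A) = 25 - 20 = 5$
$Var(A) = Var(H) + Var(B) = 16 + 49 = 65$
$\Pr(A \geq 3) = 1 - \Pr(A < 3) = 1 - \Pr\left(Z < \frac{3 - 5}{\sqrt{65}}\right) \approx 1 - \Pr(Z < -0.248) \approx 59.7\%$

∴ the B. Owls are worse by at least 3 miles 59.7% of the time.

4. [8 Marks] Determine the value(s) of a and b necessary to ensure that the function f is a pdf with expected value 1, where $f(x)$ is defined by:

$f(x) = \begin{cases} ax + bx^2 & \text{for } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$

Note, for all x, $f(x) \geq 0$.

Total probability:
$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 (ax + bx^2)\,dx = \left[\frac{a}{2}x^2 + \frac{b}{3}x^3\right]_0^2 = 2a + \frac{8b}{3} = 1 \Rightarrow 3 = 6a + 8b$

Expected value:
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 (ax^2 + bx^3)\,dx = \left[\frac{a}{3}x^3 + \frac{b}{4}x^4\right]_0^2 = \frac{8a}{3} + 4b = 1 \Rightarrow 2 = \frac{16a}{3} + 8b$

Subtracting the second equation from the first gives $1 = \frac{2a}{3}$, so $a = \frac{3}{2}$ and $b = -\frac{3}{4}$.

Therefore the answer is $f(x) = \frac{3}{2}x - \frac{3}{4}x^2$ for $0 < x < 2$.

5. [6 marks] Using only the tools of Stat 230, show that the sample variance, $S^2$, is an unbiased estimator of $\sigma^2$ (i.e. $E(S^2) = \sigma^2$). [Given: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$]

$E[S^2] = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n}\left((X_i - \mu) - (\bar{X} - \mu)\right)^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]$
$= \frac{1}{n-1}\left[\sum_{i=1}^{n} E\left[(X_i - \mu)^2\right] - nE\left[(\bar{X} - \mu)^2\right]\right] = \frac{1}{n-1}\left[n\sigma^2 - n\frac{\sigma^2}{n}\right] = \frac{1}{n-1}\left[n\sigma^2 - \sigma^2\right] = \frac{\sigma^2(n-1)}{n-1} = \sigma^2$

As the expected value of the sampling distribution for the sample variance equals the population variance [i.e., $E[S^2] = \sigma^2$], the sample variance is an unbiased estimator of the population variance.

OR

$E[S^2] = E\left[\frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right]$

$Var(X) = E(X^2) - E(X)^2 \Rightarrow \sigma^2 = E(X^2) - \mu^2 \Rightarrow E(X^2) = \sigma^2 + \mu^2$
$Var(\bar{X}) = E(\bar{X}^2) - E(\bar{X})^2 \Rightarrow \frac{\sigma^2}{n} = E(\bar{X}^2) - \mu^2 \Rightarrow E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$

Therefore
$E\left[\frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] = \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] = \sigma^2$

Course Note Questions [NOT MARKED]: Sketch Solutions

2a] $E(aX + b) = \int (ax + b) f(x)\,dx = \int \left(ax f(x) + b f(x)\right) dx = a\int x f(x)\,dx + b\int f(x)\,dx = aE(X) + b(1)$

2b] $\bar{y} = \frac{\sum y_i}{n} = \frac{\sum (ax_i + b)}{n} = \frac{a\sum x_i + nb}{n} = a\frac{\sum x_i}{n} + b = a\bar{x} + b$

2c] $Var(X) = \int (x - \mu)^2 f(x)\,dx$
$Var(aX + b) = \int (ax + b - a\mu - b)^2 f(x)\,dx$ (see 2a) $= \int (ax - a\mu)^2 f(x)\,dx = \int a^2 (x - \mu)^2 f(x)\,dx = a^2 \int (x - \mu)^2 f(x)\,dx = a^2 Var(X)$
Therefore $sd(aX + b) = |a|\,sd(X)$.

2d] $\sum (x_i - \bar{x})^2 = \left(\sum x_i^2\right) + \left(\sum \bar{x}^2\right) - 2\left(\sum x_i \bar{x}\right) = \left(\sum x_i^2\right) + n\bar{x}^2 - 2\bar{x}\sum x_i = \left(\sum x_i^2\right) + n\bar{x}^2 - 2n\bar{x}^2 = \left(\sum x_i^2\right) - n\bar{x}^2$ (where n = 10).

4a] 1-pnorm(6.5,11.74,3.5^2) = 66.5%

4b] 1-pnorm(6.5,5.31,0.58^2) = 0.02020012%

4c] 1-pnorm(6,11.74,3.5^2) = 68%
    1-pnorm(6,5.31,0.58^2) = 2%

4d] $1 - \Pr(Y < C) = 0.98$
$\Pr(Y < C) = 0.02$
$\Pr\left(Z < \frac{C - 11.74}{3.5}\right) = 0.02$
Based on R or a normal table: $\frac{C - 11.74}{3.5} = -2.05$
$C = 11.74 - 2.05(3.5) = 4.565$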
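The table look-ups in question 3 and the constants found in question 4 can be cross-checked numerically. The sketch below is a check under stated assumptions, not part of the marked solution: it reads G(25, 4) and G(20, 7) as (mean, standard deviation), which is how the solution above uses them, and the names p_a, p_b, p_c, total_prob and expected are introduced here for illustration.

```r
# Question 3: H ~ G(25, 4), B ~ G(20, 7), read as (mean, standard deviation)
p_a <- pnorm(20, mean = 25, sd = 4)                  # Pr(H < 20)
p_b <- pnorm(29, mean = 20, sd = 7) -
       pnorm(21, mean = 20, sd = 7)                  # Pr(21 < B < 29)

# A = H - B has mean 25 - 20 and, by independence, variance 4^2 + 7^2 = 65
p_c <- 1 - pnorm(3, mean = 25 - 20, sd = sqrt(4^2 + 7^2))  # Pr(A >= 3)

print(round(c(a = p_a, b = p_b, c = p_c), 3))

# Question 4: f(x) = a*x + b*x^2 on (0, 2) with a = 3/2 and b = -3/4
a <- 3 / 2
b <- -3 / 4
f <- function(x) a * x + b * x^2

# A pdf must integrate to 1; the question also requires E(X) = 1
total_prob <- integrate(f, lower = 0, upper = 2)$value
expected   <- integrate(function(x) x * f(x), lower = 0, upper = 2)$value

print(c(total = total_prob, mean = expected))
```

Values computed by pnorm() can differ in the last reported digit from answers read off a printed normal table, because the table rounds z to two decimal places.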