Stat 231 Assignment 1
Family (last) Name:
Given (first) Name:
Section #:
ID:
Grade =
Marks Available: 48
Due Date: Thursday Sept. 27th 2007, 10am in the Tutorial Centre MC6094/MC6095.

General Rules:
* Although it is understood that some measure of collaboration will occur in the assignments, the work you hand in must be your own and differ from that of your compatriots. All help should be cited. Excessive collaboration (i.e. copying) is not acceptable. Please read the university policies on what else constitutes cheating.
* Late assignments will not be accepted. The Tutorial Centre will be staffed from 8am until 10am on the date the assignment is handed in.
* You must print this assignment and put your solutions ON the assignment pages. The only exception to this policy is the course note homework, which can be attached to the END of the assignment.
* Please use the back if you need more space and CLEARLY indicate where the solution has gone.
* All graphical displays should be properly labelled and titled. All R graphics output (i.e. plots) should be copied and pasted into your assignment.

Reading the data table into R (Question 2)
1. Download the data file into an appropriate folder.
2. Open R.
3. In R, select in order the menu items FILE, CHANGE DIR... In the corresponding popup box, browse to and select the folder from step 1.
4. Type the following code: data = read.table("fev.dat", header = TRUE)
Columns in the dataset may be accessed using data$variable_name. If you need to, you can subset the data into a smokers dataset and a nonsmokers dataset using the commands:
(a) smoke = data[data$smoke == 1, ]
(b) nosmoke = data[data$smoke == 0, ]
If you need to subset the dataset by column, the code would be: newdata = data[, c(column 1, column 2, ...)]

The Story
The effects of smoking are usually an interesting problem to investigate statistically. One such study was performed in the mid to late 1970s, when a sample of 654 youths (aged 3 to 19) from Boston was taken. The objective of the study was to measure the effects of smoking on FEV (Forced Expiratory Volume). FEV is the amount of air you can exhale in one second of a forceful breath. Source: Forced Expiratory Volume, Journal of Statistics Education, Volume 13, Number 2 (July 2005).

Course Note Questions [NOT MARKED]
Please hand in questions 2 and 4 from section 1.6.1. Attach your answers to the end of the assignment.

1. [6 Marks] Fill in the blanks. Using terminology from the course and information from the description above, fill in the blanks in the following statements [more than one word may be necessary]:
(a) For ethical reasons the study has a(n) observational plan. This means that the scientist will not change/manipulate any explanatory variates in the study.
(b) Variates include sex, height (in inches), age (in years), FEV (in litres) and smoke. Smoke indicates whether the youth smokes or not. The response variate is FEV. Sex is a(n) categorical variable and has a(n) binary data type. Since sex was initially coded as "male" or "female" and the statistician needed to use numbered flags for this variable, she coded sex as female (0) and male (1). The statistician similarly assigned a smoker the value 1 and a nonsmoker the value 0. Height (in inches) has a(n) continuous and ordinal data type.
A(n) affine/monotone transformation was used to convert the data from centimeters to inches. Applying the results of this study to other people in the frame might be subject to sample error.
(c) All youths aged 3 to 19 could be considered to be our target population. The unit is therefore a youth aged 3 to 19 in the 1970s.

2. [16 Marks] Read the data into R, as described above.
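As a minimal sketch of the reading and subsetting steps described above, the R code below uses six invented rows typed inline with read.table(text = ...), so it runs without fev.dat. The column names (age, fev, height, sex, smoke) are assumed to match the real file, and the values are made up, so the results will not reproduce the tables in this assignment.

```r
# Six invented rows standing in for fev.dat (column names assumed to match the real file).
# With the real file, use: data = read.table("fev.dat", header = TRUE)
data = read.table(header = TRUE, text = "
age fev height sex smoke
 9  2.0  57    0   0
10  2.2  60    1   0
12  2.6  62    0   0
14  3.0  65    1   1
15  3.2  66    0   1
16  3.4  68    1   1
")

# Columns are accessed with $
data$fev

# Row subsets by smoking status (the comparison uses ==, not =)
smoke   = data[data$smoke == 1, ]
nosmoke = data[data$smoke == 0, ]

# Mean and variance of FEV in each group, as in part (c)
c(mean = mean(smoke$fev),   var = var(smoke$fev))
c(mean = mean(nosmoke$fev), var = var(nosmoke$fev))

# Column subset and its correlation matrix, as in the scatterplot and correlation parts
newdata = data[, c("age", "fev", "height")]
round(cor(newdata), 2)
```

The same three lines of subsetting work unchanged on the real data frame once fev.dat has been read in.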
(a) Copy and paste two box plots, one for the FEV of smokers and one for the FEV of nonsmokers, on the same axis.
Example Code: boxplot(variable ~ grouping_variable, main = "Title", xlab = "x label description", ylab = "y label description")
Solutions:
boxplot(data$fev ~ data$smoke, main = "Smokers Vs. Nonsmokers FEV", xlab = "Smokers = 1, Nonsmokers = 0", ylab = "FEV")
[Figure: box plots titled "Smokers Vs. Nonsmokers FEV"; x-axis: Smokers = 1, Nonsmokers = 0; y-axis: FEV]
(b) [NOT MARKED] Compare and contrast the two box plots briefly [one or two sentences] with respect to:
* Shape
The smokers' data appears skewed right. The nonsmokers' data appears non-skewed, but has outliers.
* Location
The center of the smokers' data is a little over 3 L, while that of the nonsmokers is roughly 2.5 L.
* Variability
The spread of the smokers' data is tighter than that of the nonsmokers. There are more outliers in the nonsmoker data.

(c) Numerically confirm your graphical analysis by finding, using the commands var() and mean(), the mean and variability of the smoker and nonsmoker FEV. Use the table below.

FEV         Smoker    Nonsmoker
Variance    0.56      0.72
Mean        3.28      2.57

(e) A secondary objective of the study was to investigate the relationships between age, FEV and smoking. A set of scatterplots may be obtained by subsetting the data set and plotting the subset:
Example Code: plot(newdata)
Note: A title is unnecessary in this case. Copy and paste the plot below.
Solutions:
The drawing on the left was intended, but due to a typo, the one on the right should have been obtained.
[Figure: two scatterplot matrices of the subsetted variables; panel labels include age, fev and smoke]
(f) [NOT MARKED] Describe the relationship(s) that you plotted in (e).
One on the left: as you get older or taller, your FEV increases. One on the right: the panels of smoking against the other variables are hard to interpret; otherwise, see the interpretation above.

(g) Determine the correlation between FEV and Age, Age and Height, and finally FEV and Height. Use the same dataset you created in (e) and the command cor().

Correlation   Age    FEV    Height
Age           1      0.76   0.79
FEV           0.76   1      0.87
Height        0.79   0.87   1

(h) Why is the correlation between Age and Age equal to 1? Show, mathematically, that r will always be 1 in this case.
With $y = x$ we have $S_{xy} = S_{xx}$ and $S_{yy} = S_{xx}$, so mathematically $r = \frac{S_{xx}}{\sqrt{S_{xx} S_{xx}}} = 1$.

(i) True or False. Please circle the correct response. Any unclear circle will be assigned a zero grade.
* The relationship between FEV and Age is strong and negative. Answer: FALSE
* The relationship between Age and Height is weak and positive. Answer: FALSE
* The relationship between FEV and Height is strong and positive. Answer: TRUE
* The relationship between Height and is strong and positive. Answer: TRUE

(j) Yet one more goal of this study was to look at the relative risk of smoking by gender. Subset the data as you have done before, selecting only the smoking and sex columns. Use the table() command to provide the necessary counts. Fill in the table below.
Counts     Smoke = 0   Smoke = 1
Sex = 0    279         39
Sex = 1    310         26
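The relative risk can be computed directly from the counts in the (j) table; the R sketch below types those counts in by hand. With the real data, table(data$sex, data$smoke) should give a table with the same layout (rows = sex, columns = smoke), assuming those column names.

```r
# Counts from the (j) table: rows are sex (0 = female, 1 = male),
# columns are smoking status (0 = nonsmoker, 1 = smoker).
counts = matrix(c(279, 39,
                  310, 26),
                nrow = 2, byrow = TRUE,
                dimnames = list(sex = c("0", "1"), smoke = c("0", "1")))

# Risk of smoking within each sex: smokers / total in that row
risk = counts[, "1"] / rowSums(counts)

# Relative risk of smoking for females compared with males
rr = unname(risk["0"] / risk["1"])
rr
all.equal(rr, 84 / 53)   # TRUE: the exact value is 84/53
```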
Use the table in (j) to calculate the relative risk.
$$\text{relative risk} = \frac{39/(279+39)}{26/(310+26)} = \frac{39/318}{26/336} = \frac{84}{53} \approx 1.58$$
(k) The risk of a female smoking is (fill in the blank) about 1.58 times that of a male.

3. [12 marks] Nonstop flights of Northern Hawk Owls are historically modelled by $H \sim G(25, 4)$, in miles,
while the flights of Barn Owls are given by $B \sim G(20, 7)$. Flights of both types of owls are independent. I'm curious as to whether or not the Northern Hawk Owls fly further during nonstop flights than their Barn Owl cousins. In all cases write a brief conclusion in layman's terms. Find...
(a) The probability that the Northern Hawk Owls travel less than 20 miles.
$$\Pr(H < 20) = \Pr\left(Z < \frac{20 - 25}{4}\right) = \Pr(Z < -1.25) \approx 10.6\%$$
Therefore the Northern Hawk Owls travel less than 20 miles about 10.6% of the time.
(b) The probability that the Barn Owls travel between 21 and 29 miles nonstop.
$$\Pr(21 < B < 29) = \Pr\left(\frac{21 - 20}{7} < Z < \frac{29 - 20}{7}\right) = \Pr\left(Z < \tfrac{9}{7}\right) - \Pr\left(Z < \tfrac{1}{7}\right) \approx 0.9007 - 0.5568 \approx 34.4\%$$
Therefore the Barn Owls travel between 21 and 29 miles nonstop about 34.4% of the time.
(c) The probability that the Barn Owls travel at least 3 miles less than the Northern Hawk Owls. Let $A = H - B$.
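$$E(A) = 25 - 20 = 5$$
$$\operatorname{Var}(A) = \operatorname{Var}(H) + \operatorname{Var}(B) = 16 + 49 = 65$$
Since $H$ and $B$ are independent Gaussian random variables, $A$ is also Gaussian, so $A \sim G(5, \sqrt{65})$.
$$\Pr(A \ge 3) = 1 - \Pr(A < 3) = 1 - \Pr\left(Z < \frac{3 - 5}{\sqrt{65}}\right) \approx 1 - \Pr(Z < -0.248) \approx 59.8\%$$
Therefore the Barn Owls fly at least 3 miles less than the Northern Hawk Owls about 59.8% of the time.

4. [8 Marks] Determine the value(s) of a and b necessary to ensure that the function f is a pdf with expected value 1, where f(x) is defined by:
$$f(x) = \begin{cases} ax + bx^2 & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Note: for all x, f(x) ≥ 0.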
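$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 (ax + bx^2)\,dx = \left[\frac{a}{2}x^2 + \frac{b}{3}x^3\right]_0^2 = 2a + \frac{8b}{3} \implies 3 = 6a + 8b$$
$$1 = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 (ax^2 + bx^3)\,dx = \left[\frac{a}{3}x^3 + \frac{b}{4}x^4\right]_0^2 = \frac{8a}{3} + 4b$$
From the first equation, $4b = \frac{3 - 6a}{2}$. Substituting into the second:
$$1 = \frac{8a}{3} + \frac{3 - 6a}{2} = \frac{3}{2} - \frac{a}{3} \implies a = \frac{3}{2}, \qquad b = \frac{3 - 6a}{8} = -\frac{3}{4}$$
Therefore the answer is $f(x) = \frac{3}{2}x - \frac{3}{4}x^2$ for $0 \le x \le 2$ (and 0 otherwise). This also satisfies $f(x) \ge 0$, since $f(x) = \frac{3}{4}x(2 - x) \ge 0$ on $[0, 2]$.

5. [6 marks] Using only the tools of Stat 230, show that the sample variance, $S^2$, is an unbiased estimator of $\sigma^2$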
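(i.e. $E(S^2) = \sigma^2$). [Given: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^n X_i^2 - n\bar{X}^2\right]$, where $X_1, \dots, X_n$ are independent with mean $\mu$ and variance $\sigma^2$.]

Since $\sum_{i=1}^n (X_i - \mu) = n(\bar{X} - \mu)$, expanding the square gives $\sum_{i=1}^n \big((X_i - \mu) - (\bar{X} - \mu)\big)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2$. Hence
$$E(S^2) = \frac{1}{n-1} E\left[\sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^n E(X_i - \mu)^2 - n E(\bar{X} - \mu)^2\right]$$
$$= \frac{1}{n-1}\left[\sum_{i=1}^n \sigma^2 - n \cdot \frac{\sigma^2}{n}\right] = \frac{1}{n-1}\left[n\sigma^2 - \sigma^2\right] = \frac{\sigma^2 (n-1)}{n-1} = \sigma^2$$
As the expected value of the sampling distribution of the sample variance equals the population variance (i.e. $E(S^2) = \sigma^2$), the sample variance is an unbiased estimator of the population variance.

OR

$$E(S^2) = \frac{1}{n-1} E\left[\sum_{i=1}^n X_i^2 - n\bar{X}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^n E(X_i^2) - n E(\bar{X}^2)\right]$$
Since $\operatorname{Var}(X) = E(X^2) - E(X)^2$, we have $\sigma^2 = E(X_i^2) - \mu^2$, so $E(X_i^2) = \sigma^2 + \mu^2$.
Similarly, $\operatorname{Var}(\bar{X}) = E(\bar{X}^2) - E(\bar{X})^2$ gives $\frac{\sigma^2}{n} = E(\bar{X}^2) - \mu^2$, so $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$.
Therefore
$$E(S^2) = \frac{1}{n-1}\left[\sum_{i=1}^n (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] = \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] = \sigma^2$$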
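The numerical answers to Questions 3 and 4, and the unbiasedness result in Question 5, can be checked in R. The sketch below reads G(mu, sigma) as a Gaussian with mean mu and standard deviation sigma (consistent with Var(H) = 16 and Var(B) = 49 in 3(c)). The Question 5 part is a simulation with an arbitrary seed, sample size and normal population, so it only illustrates the result rather than proving it. Small differences from the hand calculations come from rounding in table lookups.

```r
## Question 3: H ~ G(25, 4) and B ~ G(20, 7); pnorm() takes the standard deviation
p_a = pnorm(20, mean = 25, sd = 4)                                  # Pr(H < 20)
p_b = pnorm(29, mean = 20, sd = 7) - pnorm(21, mean = 20, sd = 7)   # Pr(21 < B < 29)
p_c = 1 - pnorm(3, mean = 25 - 20, sd = sqrt(4^2 + 7^2))            # Pr(H - B >= 3)
round(c(a = p_a, b = p_b, c = p_c), 3)

## Question 4: solve 2a + 8b/3 = 1 (total probability) and 8a/3 + 4b = 1 (mean)
coefs = solve(matrix(c(2,   8/3,
                       8/3, 4), nrow = 2, byrow = TRUE), c(1, 1))
a = coefs[1]
b = coefs[2]
f = function(x) a * x + b * x^2                          # candidate pdf on [0, 2]
total  = integrate(f, 0, 2)$value                        # should be 1
mean_x = integrate(function(x) x * f(x), 0, 2)$value     # should be 1
c(a = a, b = b, total = total, mean = mean_x)

## Question 5: average S^2 over many simulated normal samples (illustration only)
set.seed(231)
n = 5
sigma2 = 4
s2 = replicate(20000, var(rnorm(n, mean = 10, sd = sqrt(sigma2))))
mean(s2)                  # close to sigma2 = 4 (divisor n - 1)
mean(s2 * (n - 1) / n)    # divisor n instead: close to (n - 1) / n * sigma2 = 3.2
```

The last line shows why the divisor n - 1 is used: dividing by n instead would systematically underestimate the population variance.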
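Course Note Questions [NOT MARKED]: Sketch Solutions
2a] $E(aX + b) = \int (ax + b) f(x)\,dx = \int \left[ax f(x) + b f(x)\right] dx = a\int x f(x)\,dx + b\int f(x)\,dx = aE(X) + b(1)$
2b] $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i = \frac{1}{n}\sum_{i=1}^n (a x_i + b) = \frac{a\sum_{i=1}^n x_i + nb}{n} = a\bar{x} + b$
2c] $\operatorname{Var}(X) = \int (x - \mu)^2 f(x)\,dx$
$\operatorname{Var}(aX + b) = \int (ax + b - a\mu - b)^2 f(x)\,dx$ (see 2a: $E(aX + b) = a\mu + b$)
$= \int (ax - a\mu)^2 f(x)\,dx = \int a^2 (x - \mu)^2 f(x)\,dx = a^2 \int (x - \mu)^2 f(x)\,dx = a^2 \operatorname{Var}(X)$
Therefore $sd(aX + b) = |a|\,sd(X)$.
2d] $\sum (x_i - \bar{x})^2 = \sum x_i^2 + \sum \bar{x}^2 - 2\bar{x}\sum x_i = \sum x_i^2 + n\bar{x}^2 - 2n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2$ (where n = 10).
4a] 1 - pnorm(6.5, 11.74, 3.5) ≈ 93.3% (pnorm()'s third argument is the standard deviation, not the variance)
4b] 1 - pnorm(6.5, 5.31, 0.58) ≈ 2.0%
4c] 1 - pnorm(6, 11.74, 3.5) ≈ 95.0%
1 - pnorm(6, 5.31, 0.58) ≈ 11.7%
4d] $1 - \Pr(Y < C) = 0.98$, so $\Pr(Y < C) = 0.02$ and $\Pr\left(Z < \frac{C - 11.74}{3.5}\right) = 0.02$.
Based on R or a normal table: $\frac{C - 11.74}{3.5} = -2.05$, so $C = 11.74 - 2.05(3.5) = 4.565$ ...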
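The identities in the course note sketch solutions can be spot-checked numerically. The R sketch below uses an arbitrary made-up sample x and constants a = -3, b = 5 (all hypothetical) for 2b to 2d, and qnorm() for 4d, taking Y to have mean 11.74 and standard deviation 3.5 as in the hand calculation. qnorm() uses the exact quantile (about -2.054) rather than the table value -2.05, so the cutoff comes out near 4.55 instead of 4.565.

```r
## 2b to 2d, checked on a hypothetical sample with n = 10
x = c(2, 4, 4, 5, 7, 9, 10, 12, 13, 14)
a = -3
b = 5
y = a * x + b
all.equal(mean(y), a * mean(x) + b)        # 2b: ybar = a * xbar + b
all.equal(sd(y), abs(a) * sd(x))           # 2c: sd(aX + b) = |a| sd(X)
n = length(x)
all.equal(sum((x - mean(x))^2), sum(x^2) - n * mean(x)^2)   # 2d: shortcut formula

## 4d: cutoff C with Pr(Y < C) = 0.02, for Y Gaussian with mean 11.74 and sd 3.5
cutoff = qnorm(0.02, mean = 11.74, sd = 3.5)
cutoff
```

Each all.equal() call should print TRUE; a negative a is used deliberately so the check on 2c exercises the absolute value.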
 Winter '08