Stat 231 Assignment 4 Solutions
1. Methylcyclopentadienyl Manganese Tricarbonyl (MMT) has been added to unleaded
gasoline in Canada to help reduce engine knock. The automobile manufacturers
claim that the addition of MMT to gasoline increases the amount of carbon monoxide
(CO) emitted, measured in gm/km. Consider the target population of all cars in
Canada at the current time.
a) Using the language of the course, what do we mean by the statement “Adding MMT
to gasoline increases the amount of CO emitted”?
We mean that adding MMT to gasoline causes an increase in mean CO emission
level. That is, if we were to calculate the mean CO emission level of all cars in
Canada and, while holding all other explanatory variates fixed, increase the
concentration of MMT in the gasoline in all cars, we would see an increase in mean
CO emission level. Since all other explanatory variates remained unchanged, we
could say that the change in mean CO was caused by the change in MMT.
In an investigation to examine this issue, 100 cars were haphazardly selected at Drive
Clean centers throughout Ontario over the course of one summer. For each selected
car, the required emissions test determined the amount of CO emitted. The
researchers determined the concentration of MMT (mg/l) in the car’s fuel tank and
also recorded the distance the car had driven to the nearest thousand kilometers, as
measured by the odometer. You can find the data with the R code
The variate names are:
MMT: concentration of MMT in mg/l
CO: CO emissions (gm/km)
distance: thousands of kilometers driven
b) Is this Plan observational or experimental? Explain.
This is an observational Plan, since the investigators did not have control over or
manipulate the value of the focal variate (MMT) for each car in the sample.
c) Construct a scatterplot showing the relationship between CO levels and MMT
concentrations. Is the graph consistent with the statement that adding MMT to the
gasoline increases the amount of CO emitted?
Yes, the graph is consistent with this statement. There appears to be a positive
(linear) relationship between CO emission and MMT: CO emission increases as
MMT increases.
d) Fit the model $Y_j = \alpha + \beta (x_j - \bar{x}) + R_j$, $j = 1, \dots, 100$, to the data and add the fitted line
to the scatterplot. Estimate the rate of increase in average emitted CO for an increase
in MMT concentration of 1 mg/l.
From the fitted model, $\hat{\alpha} = 0.77230$, $\hat{\beta} = 0.08892$.
Recall that in a linear model, β represents the average (mean) change in the
response variate for a change of one unit in the explanatory variate.
From the fitted model, the estimate of the increase in the average CO emission for an
increase in MMT concentration of 1 mg/l is $\hat{\beta} = 0.08892$ gm/km.
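The least-squares arithmetic behind such a fit can be sketched in a few lines. The example below is an illustration only: it is written in Python rather than the R used in the course, the function name `fit_centered_line` is hypothetical, and the data are synthetic (the assignment data are not reproduced here), so the printed estimates will not match the values above.

```python
import random

def fit_centered_line(x, y):
    """Least-squares fit of y_j = alpha + beta * (x_j - xbar) + r_j."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    sxy = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, y))
    beta_hat = sxy / sxx   # estimated change in mean response per unit of x
    alpha_hat = ybar       # with a centered covariate, alpha-hat is the mean of y
    return alpha_hat, beta_hat, xbar

# Synthetic, made-up data (NOT the assignment data): 100 cars.
random.seed(231)
mmt = [random.uniform(0, 10) for _ in range(100)]           # mg/l
co = [0.4 + 0.09 * m + random.gauss(0, 0.1) for m in mmt]   # gm/km

alpha_hat, beta_hat, xbar = fit_centered_line(mmt, co)

# Fitted values and estimated residuals for each car.
fitted = [alpha_hat + beta_hat * (m - xbar) for m in mmt]
residuals = [c - f for c, f in zip(co, fitted)]

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

Because the explanatory variate is centered at its sample mean, the estimate of α is simply the average response, and the estimate of β is the slope read off the fitted line.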
e) Plot the estimated residuals $\hat{r}_j$ versus fitted values $\hat{\mu}_j$ – see assignment 3 for R code. Is
there evidence the model is inadequate?
Yes, there is evidence that the model is inadequate, as the standard deviation of the
estimated residuals increases as $\hat{\mu}_j$ (estimated mean CO) increases. According to the
model, the standard deviation of the residuals (σ) should be constant over all units.
(Note that this inadequacy can also be seen in the scatterplot of CO vs MMT in d),
where the estimated residual $\hat{r}_j$ is the distance of the jth observation from the fitted
line and $\hat{\mu}_j$ is the associated point on the line.)
f) What does it mean if MMT concentration and distance are confounded in their effect
on CO emission levels?
It means that the effects of MMT concentration and distance on mean CO emission
level cannot be separated. This suggests that distance (confounding variate) does not
remain fixed as MMT (focal variate) changes, and is related to both MMT and CO
(response variate). In this study, it would seem likely that the older the car (larger
distance value) the more prone to engine knock (larger MMT), and the poorer it
would run (larger CO emission value). In this situation, it is not possible to estimate
the separate effect that each variate (MMT, distance) has on CO emission levels. We then say
that the effects are confounded.
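A small simulation can make this concrete. The sketch below is a hypothetical illustration in Python with invented numbers, not an analysis of the assignment data: CO is generated to depend on distance only, yet because MMT tends to rise with distance, a fit of CO on MMT alone shows a clearly positive slope. Restricting attention to cars with nearly the same distance makes the apparent MMT effect almost vanish.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
n = 2000

# All coefficients below are made up for illustration.
distance = [random.uniform(0, 200) for _ in range(n)]             # thousands of km
mmt = [0.04 * d + random.gauss(0, 1) for d in distance]           # MMT rises with distance
co = [0.3 + 0.004 * d + random.gauss(0, 0.05) for d in distance]  # CO depends on distance ONLY

# Ignoring distance: MMT appears to increase CO.
naive_slope = slope(mmt, co)

# Holding distance (nearly) fixed: keep only cars in a narrow distance band.
band = [(m, c) for d, m, c in zip(distance, mmt, co) if 95 <= d <= 105]
band_slope = slope([m for m, _ in band], [c for _, c in band])

print(f"slope ignoring distance:      {naive_slope:.4f}")
print(f"slope within 95-105 thousand km: {band_slope:.4f}")
```

In an observational Plan we cannot hold distance fixed in this way for every car, which is why the two effects cannot be separated from the sample alone.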
g) Construct a plot to see if there is confounding in this instance.
Since distance changes (increases) as MMT changes (increases), we cannot
determine whether the change in CO is caused by the change in MMT or by the
change in distance (or by the change in some other unknown explanatory variate).
There does indeed appear to be confounding between MMT and distance.
The following 3-dimensional scatterplot of the relationship between the three variates
(MMT, distance, CO) suggested in f) is a nice illustration of confounding between
MMT and distance in their effects on CO. This plot was produced in R with the
‘scatterplot3d’ command after loading the scatterplot3d package.
h) What do you now conclude about the statement “Adding MMT to gasoline increases
the amount of CO emitted.”?
This statement is misleading, as it implies a causative relationship between MMT and
CO which we cannot substantiate, due to confounding between MMT and distance as
revealed in the previous plots. Alternative statements such as
“High levels of MMT in gasoline linked to increase in CO emission levels”, or
“Cars with high levels of MMT in gasoline have higher CO emission levels”
accurately describe the perceived relationship between MMT and CO but do not
incorrectly imply causation.
2. In a second investigation, the automobile manufacturers provided two cars from each
of 20 different models. One car in each pair was driven using gasoline with 16 g/l of
MMT. The second car in each pair used the same fuel but without MMT. The cars
were driven together under extreme conditions on a test track for 160,000 km and
then the CO emissions were measured.
a) Is this an experimental or observational Plan? Explain.
This is an experimental Plan, since the investigators manipulated the value of the
focal variate (presence/absence of MMT) on each unit.
b) This Plan uses blocking. Explain how the blocking was done and its advantages.
A block consists of a pair of cars of the same model driven at the same time under the
same conditions, in which MMT was added to the gasoline in one car of each pair
and not the other. Blocking this way helps ensure that most potential confounding
explanatory variates (e.g., car model, road/environmental conditions, etc.) remain
fixed for cars within each block, so that by changing only the focal variate (MMT),
differences in CO levels within each block can be considered to be caused by the
change in MMT. We say that blocking has ‘controlled for’ these explanatory
variates.
c) This Plan uses replication. Explain how the replication was done and its advantages.
The Data stage was repeated (replicated) over 20 blocks (pairs of cars of the same
model). The larger the number of replicates (blocks, in this case) in the sample, the
more representative is the sample of the study population, thereby decreasing sample
error. In this study, the investigators feel that results from 20 different car models can
be inferred to all car models in the population. In terms of the statistical model, the
larger the number of replicates, the better our estimate of the standard deviation of
the process, σ, resulting in more precise estimates of the model parameters.
d) How would you assign the fuel types to the cars within each pair? Why?
The fuel types (With/Without MMT) should be assigned randomly to the cars within
each pair. Randomization of values of the focal variate within each block helps to
ensure that the values of any potential confounding variates not controlled for by
blocking (e.g., variates associated with the car drivers) are balanced out between
cars within each block. Randomization is another method we have to control for
confounding.
Consider the model for the CO levels $Y_{ij} = \mu_i + \beta_j + R_{ij}$, $R_{ij} \sim G(0, \sigma)$, independent,
where $i = 1, 2$ indexes the fuel type and $j = 1, \dots, 20$ indexes the pairs.
e) Express the Problem in terms of model parameters.
The Problem addresses the difference in mean CO emission levels between cars with
MMT added to the gasoline and cars without MMT added. In terms of the model, this
difference can be expressed as $\mu_1 - \mu_2$.
f) Show that we can examine this difference with the derived model
$Y_j = Y_{1j} - Y_{2j}$, $j = 1, \dots, 20$. What is the advantage of working with this model?
$$
\begin{aligned}
Y_j &= Y_{1j} - Y_{2j} \\
    &= (\mu_1 + \beta_j + R_{1j}) - (\mu_2 + \beta_j + R_{2j}) \\
    &= (\mu_1 - \mu_2) + (\beta_j - \beta_j) + (R_{1j} - R_{2j}) \\
    &= (\mu_1 - \mu_2) + (R_{1j} - R_{2j})
\end{aligned}
$$
yielding a model of the form
$Y_j = \mu + R_j$, $j = 1, 2, \dots, 20$,
where $\mu = \mu_1 - \mu_2$ is the mean difference in CO emission level between cars within each block
and $R_j = R_{1j} - R_{2j}$ is the residual of the difference in CO for the jth pair.
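One step the derivation leaves implicit is the distribution of the new residual. Assuming that G(0, σ) denotes a Gaussian distribution with mean 0 and standard deviation σ, and using the stated independence of the original residuals, a short worked calculation gives:

```latex
\begin{aligned}
R_j &= R_{1j} - R_{2j}, \\
E(R_j) &= E(R_{1j}) - E(R_{2j}) = 0, \\
\operatorname{Var}(R_j) &= \operatorname{Var}(R_{1j}) + \operatorname{Var}(R_{2j}) = 2\sigma^2, \\
R_j &\sim G\!\left(0, \sqrt{2}\,\sigma\right), \quad j = 1, \dots, 20, \text{ independent}.
\end{aligned}
```

Under that reading of the notation, the derived model is a one-sample Gaussian model for the 20 differences, with standard deviation $\sqrt{2}\,\sigma$ in place of σ.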
The advantage of working with this model is that we do not have to account for the
block effects, $\beta_j$. Since our random variable $Y_j$ represents the difference in CO
emission level between cars of the same model (block), any changes in CO emission
levels due to the different car models do not affect our model.
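The cancellation of the block effects can also be checked numerically. The following Python sketch simulates the paired model with invented parameter values (they are assumptions for illustration, not estimates from the assignment data) and confirms that the within-pair differences are unchanged when the block effects are made ten times larger.

```python
import random
import statistics

random.seed(20)

# Hypothetical parameter values, chosen only for illustration.
mu_mmt, mu_clear, sigma = 0.85, 0.80, 0.02
n_pairs = 20

# Block effects beta_j (one per car model) and residuals for each car.
block_effects = [random.gauss(0, 0.3) for _ in range(n_pairs)]
r_mmt = [random.gauss(0, sigma) for _ in range(n_pairs)]
r_clear = [random.gauss(0, sigma) for _ in range(n_pairs)]

def paired_differences(betas):
    """Within-pair differences Y_1j - Y_2j for the given block effects."""
    return [(mu_mmt + b + e1) - (mu_clear + b + e2)
            for b, e1, e2 in zip(betas, r_mmt, r_clear)]

diffs = paired_differences(block_effects)
# Same residuals, but block effects ten times as large.
diffs_big_blocks = paired_differences([10 * b for b in block_effects])

# Raw CO levels for the MMT cars still carry the model-to-model variation.
y_mmt = [mu_mmt + b + e for b, e in zip(block_effects, r_mmt)]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
sd_mmt = statistics.stdev(y_mmt)

print(f"mean difference = {mean_diff:.4f}")
print(f"sd of differences = {sd_diff:.4f}, sd of raw MMT-car CO = {sd_mmt:.4f}")
```

The differences vary far less than the raw CO levels, because the variation between car models has been removed by subtraction within each pair.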
The data are available using the R code:
The variates are:
pair: an index for the model
MMT: the CO level for the MMT fuel car
clear: the CO level for the clear fuel car
cylinders: the number of cylinders in the engine for the model
g) Prepare a plot to examine if there is a difference in CO emissions for the two fuels. Is
there evidence that this difference depends on the number of cylinders in the engine?
From the boxplot, it appears that there is a difference in CO emission levels for the
two fuel types. On average, MMT-added cars appear to have approximately a 0.05
gm/km higher CO emission level than clear cars.
A scatterplot of the difference in CO emission levels vs number of cylinders suggests
that 8-cylinder cars have a larger difference in CO emission level between the two
fuel types than 4- or 6-cylinder cars.
h) After the conclusions of the investigation were published, one critic claimed that a
limitation was possible confounding because 20 different car models were used.
The critic’s claim that there may be confounding with the car model is groundless.
The investigators have controlled for potential confounding of the focal variate
(MMT) with car model by blocking and looking at differences in CO emission levels
within each block (i.e., within pairs of cars of the same model). Since the value of the
explanatory variate (model) remained fixed as the focal variate (MMT/clear) changed
across units in each block, car model cannot be confounded with MMT level.
- Spring '10