Assignment_4_solution

Stat 231 Assignment 4 Solutions

1. Methylcyclopentadienyl Manganese Tricarbonyl (MMT) has been added to unleaded gasoline in Canada to help reduce engine knock. The automobile manufacturers claim that the addition of MMT to gasoline increases the amount of carbon monoxide (CO) emitted, measured in gm/km. Consider the target population of all cars in Canada at the current time.

a) Using the language of the course, what do we mean by the statement “Adding MMT to gasoline increases the amount of CO emitted”?

We mean that adding MMT to gasoline causes an increase in mean CO emission level. That is, if we were to calculate the mean CO emission level of all cars in Canada and, while holding all other explanatory variates fixed, increase the concentration of MMT in the gasoline in all cars, we would see an increase in mean CO emission level. Since all other explanatory variates remained unchanged, we could say that the change in mean CO was caused by the change in MMT.

In an investigation to examine this issue, 100 cars were haphazardly selected at Drive Clean centers throughout Ontario over the course of one summer. For each selected car, the required emissions test determined the amount of CO emitted. The researchers determined the concentration of MMT (mg/l) in the car’s fuel tank and also recorded the distance the car had driven to the nearest thousand kilometers, as measured by the odometer. You can find the data with the R code

source('http://uwangel.uwaterloo.ca/AngelUploadsuwangel/Content/MRG-041122144725-_admin/_assoc/sourceAss4q1.txt')

The variate names are:

MMT: concentration of MMT in mg/l
CO: CO emissions (gm/km)
distance: thousands of kilometers driven

b) Is this Plan observational or experimental? Explain.

This is an observational Plan, since the investigators did not control or manipulate the value of the focal variate (MMT) for each car in the sample.
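
For readers who cannot reach the course file, the following is a minimal R sketch of the kind of commands the remaining parts of question 1 call for. It does not use the assignment data: the values are simulated stand-ins that merely reuse the variate names MMT, CO and distance, and every coefficient in the simulation is an arbitrary assumption chosen so that distance drives both MMT and CO. The names x_centred, fit, alpha_hat and beta_hat are likewise illustrative.

```r
# Simulated stand-in data (NOT the assignment data; all numbers are assumptions).
set.seed(231)
n <- 100
distance <- round(runif(n, 10, 250))                           # thousands of km
MMT <- 2 + 0.05 * distance + rnorm(n, 0, 0.5)                  # mg/l
CO <- 0.2 + 0.004 * distance + rnorm(n, 0, 0.001 * distance)   # gm/km, spread grows with distance

# Fit Y_j = alpha + beta * (x_j - xbar) + R_j by centring MMT before calling lm().
x_centred <- MMT - mean(MMT)
fit <- lm(CO ~ x_centred)
alpha_hat <- unname(coef(fit)[1])   # with a centred x this equals mean(CO)
beta_hat <- unname(coef(fit)[2])    # estimated change in mean CO per 1 mg/l of MMT

# Scatterplot of CO against MMT with the fitted line drawn on the original MMT scale.
plot(MMT, CO, xlab = "MMT (mg/l)", ylab = "CO (gm/km)")
abline(a = alpha_hat - beta_hat * mean(MMT), b = beta_hat)

# Estimated residuals against fitted values, to check for constant spread.
plot(fitted(fit), resid(fit), xlab = "Fitted mean CO", ylab = "Estimated residual")
abline(h = 0, lty = 2)

# MMT against distance, to look for a relationship between the two explanatory variates.
plot(distance, MMT, xlab = "Distance (thousands of km)", ylab = "MMT (mg/l)")
```

Centring MMT changes only the intercept: the slope is the same as in an uncentred fit, while the estimated intercept becomes the sample mean of CO.
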
c) Construct a scatterplot showing the relationship between CO levels and MMT concentrations. Is the graph consistent with the statement that adding MMT to the gasoline increases the amount of CO emitted?

Yes, the graph is consistent with this statement. There appears to be a positive (linear) relationship between CO emission and MMT: CO emission increases as MMT increases.

d) Fit the model $Y_j = \alpha + \beta (x_j - \bar{x}) + R_j$, $j = 1, \ldots, 100$, to the data and add the fitted line to the scatterplot. Estimate the rate of increase in average emitted CO for an increase in MMT concentration of 1 mg/l.

From the fitted model, $\hat{\alpha} = 0.77230$ and $\hat{\beta} = 0.08892$. Recall that in a linear model, $\beta$ represents the average (mean) change in the response variate for a change of one unit in the explanatory variate. From the fitted model, the estimate of the increase in the average CO emission for an increase in MMT concentration of 1 mg/l is $\hat{\beta} = 0.08892$ gm/km.

e) Plot the estimated residuals $\hat{r}_j$ versus fitted values $\hat{\mu}_j$ (see assignment 3 for R code). Is there evidence the model is inadequate?

Yes, there is evidence that the model is inadequate, as the standard deviation of the estimated residuals increases as $\hat{\mu}_j$ (estimated mean CO) increases. According to the model, the standard deviation of the residuals ($\sigma$) should be constant over all units. (Note that this inadequacy can also be seen in the scatterplot of CO vs MMT in d), where the estimated residual $\hat{r}_j$ is the vertical distance of the jth observation from the fitted line and $\hat{\mu}_j$ is the associated point on the line.)

f) What does it mean if MMT concentration and distance are confounded in their effect on CO emission levels?

It means that the effects of MMT concentration and distance on mean CO emission level cannot be separated. This suggests that distance (confounding variate) does not remain fixed as MMT (focal variate) changes, and is related to both MMT and CO (response variate).
In this study, it would seem likely that the older the car (larger distance value), the more prone it is to engine knock (larger MMT) and the poorer it would run (larger CO emission value). In this situation, it is not possible to estimate the effect that each variate (MMT, distance) has on CO emission levels. We say then that the effects are confounded.

g) Construct a plot to see if there is confounding in this instance.

Since distance changes (increases) as MMT changes (increases), we cannot determine whether the change in CO is caused by the change in MMT or by the change in distance (or by the change in some other unknown explanatory variate). There does indeed appear to be confounding between MMT and distance.

The following 3-dimensional scatterplot of the relationship between the three variates (MMT, distance, CO) suggested in f) is a nice illustration of confounding between MMT and distance in their effects on CO. This plot was produced in R with the ‘scatterplot3d’ command after loading the scatterplot3d package.

h) What do you now conclude about the statement “Adding MMT to gasoline increases the amount of CO emitted.”?

This statement is misleading, as it implies a causal relationship between MMT and CO which we cannot substantiate, due to confounding between MMT and distance as revealed in the previous plots. Alternative statements such as “High levels of MMT in gasoline linked to increase in CO emission levels”, or “Cars with high levels of MMT in gasoline have higher CO emission levels” accurately describe the perceived relationship between MMT and CO but do not incorrectly imply causation.

2. In a second investigation, the automobile manufacturers provided two cars from each of 20 different models. One car in each pair was driven using gasoline with 16 g/l of MMT. The second car in each pair used the same fuel but without MMT. The cars were driven together under extreme conditions on a test track for 160,000 km and then the CO emissions were measured.
a) Is this an experimental or observational Plan? Explain.

This is an experimental Plan, since the investigators manipulated the value of the focal variate (presence/absence of MMT) on each unit.

b) This Plan uses blocking. Explain how the blocking was done and its advantages.

A block consists of a pair of cars of the same model driven at the same time under the same conditions, in which MMT was added to the gasoline in one car of each pair and not the other. Blocking this way helps ensure that most potential confounding explanatory variates (e.g., car model, road/environmental conditions, etc.) remain fixed for cars within each block, so that by changing only the focal variate (MMT), differences in CO levels within each block can be considered to be caused by the change in MMT. We say that blocking has ‘controlled for’ these explanatory variates.

c) This Plan uses replication. Explain how the replication was done and its advantages.

The Data stage was repeated (replicated) over 20 blocks (pairs of cars of the same model). The larger the number of replicates (blocks, in this case) in the sample, the more representative is the sample of the study population, thereby decreasing sample error. In this study, the investigators feel that results from 20 different car models can be inferred to all car models in the population. In terms of the statistical model, the larger the number of replicates, the better our estimate of the standard deviation of the process, σ, resulting in more precise estimates of the model parameters.

d) How would you assign the fuel types to the cars within each pair? Why?

The fuel types (With/Without MMT) should be assigned randomly to the cars within each pair. Randomization of values of the focal variate within each block helps to ensure that the values of any potential confounding variates not controlled for by blocking (e.g., variates associated with the car drivers) are balanced out between cars within each block.
Randomization is another method we have to control for confounding.

Consider the model for the CO levels

$Y_{ij} = \mu_i + \beta_j + R_{ij}$, $R_{ij} \sim G(0, \sigma)$, independent,

where $i = 1, 2$ indexes the fuel type and $j = 1, \ldots, 20$ indexes the pairs.

e) Express the Problem in terms of model parameters.

The Problem addresses the difference in mean CO emission levels between cars with MMT added to the gasoline and cars without MMT added. In terms of the model, this difference can be expressed as $\mu_1 - \mu_2$.

f) Show that we can examine this difference with the derived model $Y_j = Y_{1j} - Y_{2j}$, $j = 1, \ldots, 20$. What is the advantage of working with this model?

$Y_j = Y_{1j} - Y_{2j}$
$= (\mu_1 + \beta_j + R_{1j}) - (\mu_2 + \beta_j + R_{2j})$
$= (\mu_1 - \mu_2) + (\beta_j - \beta_j) + (R_{1j} - R_{2j})$
$= (\mu_1 - \mu_2) + (R_{1j} - R_{2j})$

yielding a model of the form $Y_j = \mu + R_j$, $j = 1, 2, \ldots, 20$, where $\mu = \mu_1 - \mu_2$ is the mean difference in CO emission level between cars within each block and $R_j = R_{1j} - R_{2j}$ is the residual of the difference in CO for the jth pair.

The advantage of working with this model is that we do not have to account for the block effects, $\beta_j$. Since our random variable $Y_j$ represents the difference in CO emission level between cars of the same model (block), any changes in CO emission levels due to the different car models do not affect our model.

The data are available using the R code:

source('http://uwangel.uwaterloo.ca/AngelUploadsuwangel/Content/MRG-041122144725-_admin/_assoc/sourceAss4q2.txt')

The variates are:

pair: an index for the model
MMT: the CO level for the MMT fuel car
clear: the CO level for the clear fuel car
cylinders: the number of cylinders in the engine for the model

g) Prepare a plot to examine if there is a difference in CO emissions for the two fuels. Is there evidence that this difference depends on the number of cylinders in the engine?

From the boxplot, it appears that there is a difference in CO emission levels for the two fuel types.
On average, MMT-added cars appear to have approximately a 0.05 gm/km higher CO emission level than clear cars. A scatterplot of the difference in CO emission levels vs number of cylinders suggests that 8-cylinder cars have a larger difference in CO emission level between the two fuel types than 4- or 6-cylinder cars.

h) After the conclusions of the investigation were published, one critic claimed that a limitation was possible confounding because 20 different car models were used. Comment.

The critic’s claim that there may be confounding with the car model is groundless. The investigators have controlled for potential confounding of the focal variate (MMT) with car model by blocking and looking at differences in CO emission levels within each block (i.e., within pairs of cars of the same model). Since the value of the explanatory variate (model) remained fixed as the focal variate (MMT/clear) changed across units in each block, car model cannot be confounded with MMT level. ...
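
The paired-difference idea from parts f) and g) of question 2 can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated rather than taken from the course file, the variate names MMT, clear and cylinders are reused from the list above, and the block effects, noise levels and the 0.05 gm/km shift are arbitrary values chosen to echo the text. The names block_effect, diff_CO and mu_hat are hypothetical.

```r
# Simulated stand-in data (NOT the assignment data; all numbers are assumptions).
set.seed(42)
n_pairs <- 20
cylinders <- rep(c(4, 6, 8), length.out = n_pairs)
block_effect <- rnorm(n_pairs, 0, 0.3)                    # beta_j, shared by both cars in a pair
clear <- 0.80 + block_effect + rnorm(n_pairs, 0, 0.03)    # CO for the clear-fuel car
MMT <- 0.85 + block_effect + rnorm(n_pairs, 0, 0.03)      # CO for the MMT-fuel car

# Derived model Y_j = Y_1j - Y_2j: the block effect cancels within each pair.
diff_CO <- MMT - clear
mu_hat <- mean(diff_CO)   # estimate of mu = mu_1 - mu_2

# Side-by-side boxplots of CO for the two fuels.
boxplot(MMT, clear, names = c("MMT", "clear"), ylab = "CO (gm/km)")

# Within-pair difference against number of cylinders, with the mean difference marked.
plot(cylinders, diff_CO, xlab = "Number of cylinders",
     ylab = "Difference in CO, MMT minus clear (gm/km)")
abline(h = mu_hat, lty = 2)

# Spread of the differences compared with the spread of the raw MMT-car readings.
c(sd_diff = sd(diff_CO), sd_MMT = sd(MMT))
```

In this simulation the differences vary far less than the raw readings because the model-to-model variation sits entirely in the shared block effect, which is the advantage of the derived model described in f).
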