STAT 420 Fall 2009

Homework #2 (10 points)
(due Friday, September 11, by 3:00 p.m.)

1. The friendly folks at the Internal Revenue Service (IRS) are always looking for ways to improve the wording and format of its tax return forms. Three new forms have been developed recently. To determine which, if any, are superior to the current form, 120 individuals were asked to participate in an experiment. Each of the three new forms and the currently used form was filled out by 30 different people. The amount of time (in minutes) taken by each person to complete the task was recorded and stored in columns 1 through 4 (Form1 through Form4, respectively) of the file Hw02_1.csv.

The data set is available at

    http://www.stat.uiuc.edu/~stepanov/Hw02_1.csv

To create a data frame in R, use

    Hw02_1 = read.table("http://www.stat.uiuc.edu/~stepanov/Hw02_1.csv", sep=",", header=T)

Here sep="," indicates that the values in the data file are separated by commas, and header=T indicates that the first line of the data file contains the names of the variables (as opposed to header=F).

You can then access individual variables in the data frame Hw02_1 by using Hw02_1\$Form1 , Hw02_1\$Form2 , Hw02_1\$Form3 , and Hw02_1\$Form4 . For example, to combine the four 30-component data sets (one for each form) into one 120-component data set, use

    Time = c(Hw02_1$Form1, Hw02_1$Form2, Hw02_1$Form3, Hw02_1$Form4)

Then, to create the matching vector of form labels, use

    Form = c(rep(1,30), rep(2,30), rep(3,30), rep(4,30))

a) Test for differences in the average time required to fill out these four forms using the ANOVA F test.

(i) Specify the null and the alternative hypotheses.
(ii) What are the required conditions (assumptions) for this test?
(iii) Show the calculations leading to your conclusion in the form of an ANOVA table.
(iv) What conclusions can be drawn from these data? Use α = 0.05.
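The data-preparation step described in the problem can be tried out offline on a small stand-in. The sketch below is only an illustration: the data frame `toy` and its numbers are made up (they are not the course data), and `stack()` is offered as one possible base R alternative to building `Time` and `Form` by hand, not as the method the assignment requires.

```r
# Toy stand-in for Hw02_1: made-up numbers, NOT the course data.
toy <- data.frame(Form1 = c(10, 12), Form2 = c(11, 13),
                  Form3 = c(9, 14),  Form4 = c(15, 16))
n <- nrow(toy)  # observations per form (30 in the real data set)

# Same construction as in the problem statement, with n in place of 30.
Time <- c(toy$Form1, toy$Form2, toy$Form3, toy$Form4)
Form <- c(rep(1, n), rep(2, n), rep(3, n), rep(4, n))

# Alternative: stack() concatenates the columns and records the
# source column of each value in a factor called 'ind'.
long <- stack(toy)

# Both approaches should give the same response vector and grouping.
same_values <- all(long$values == Time)
same_groups <- all(as.integer(long$ind) == Form)
print(c(same_values = same_values, same_groups = same_groups))
```

With the real data frame, the same pattern would apply with `Hw02_1` in place of `toy`; wrapping `Form` in `factor()` when fitting the model tells R to treat the labels 1 to 4 as groups rather than as a numeric predictor.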

(i) $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_j$ = the average time required to fill out Form $j$.
$H_a$: at least two of the $\mu_j$'s are different.

OR

$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$.
$H_a$: not $H_0$.

(ii) The time required to fill out Form $j$ is normally distributed with mean $\mu_j$ and common variance $\sigma^2$, $j = 1, 2, 3, 4$. Our data are four independent random samples from these four populations.

$Y_{ij} = \mu_j + \varepsilon_{ij}$, $i = 1, 2, \ldots, 30$, $j = 1, 2, 3, 4$, where the $\varepsilon_{ij}$'s are i.i.d. $N(0, \sigma^2)$.

OR

$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$, $i = 1, 2, \ldots, 30$, $j = 1, 2, 3, 4$, where the $\varepsilon_{ij}$'s are i.i.d. $N(0, \sigma^2)$ and $\sum_{j=1}^{4} \tau_j = 0$.

(iii)

    > Hw02_1 <- read.table("http://www.stat.uiuc.edu/~stepanov/Hw02_1.csv", sep=",", header=T)
    > Time <- c(Hw02_1$Form1, Hw02_1$Form2, Hw02_1$Form3, Hw02_1$Form4)
    > Form <- c(rep(1,30), rep(2,30), rep(3,30), rep(4,30))
    > summary(aov(glm(Time ~ factor(Form))))
                  Df Sum Sq Mean Sq F value  Pr(>F)
    factor(Form)   3   8464    2821  2.9358 0.03632 *
    Residuals    116 111480
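The arithmetic behind the ANOVA table can be retraced from its sums of squares and degrees of freedom. The following is a minimal sketch, not part of the original solution: it takes the rounded values printed in the table as inputs (so results may differ from R's output in the last digits), and the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table above (rounded as printed by R).
ss_form  <- 8464     # Sum Sq for factor(Form)
ss_resid <- 111480   # Sum Sq for Residuals
df_form  <- 3        # 4 forms - 1
df_resid <- 116      # 120 observations - 4 forms

# Mean squares are sums of squares divided by their degrees of freedom.
ms_form  <- ss_form / df_form
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail probability under F(3, 116).
f_stat  <- ms_form / ms_resid
p_value <- pf(f_stat, df_form, df_resid, lower.tail = FALSE)

# Decision rule at the significance level given in the problem.
alpha <- 0.05
reject_h0 <- p_value < alpha

print(c(MS_Form = ms_form, MS_Resid = ms_resid, F = f_stat, p = p_value))
print(reject_h0)
```

The recomputed F statistic and p-value should land close to the 2.9358 and 0.03632 reported in the table. Because the reported p-value is below α = 0.05, this reading of the table points toward rejecting $H_0$ at the 5% level, that is, toward concluding that at least two of the four forms differ in average completion time.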
## This note was uploaded on 12/17/2010 for the course STAT 420 taught by Professor Stepanov during the Spring '08 term at University of Illinois, Urbana Champaign.

