# Analy 500 Final- Question 3.docx - 1 Beria Anisha Analytics...


Beria, Anisha
Analytics 500 - Final Exam Question 3
2020-12-07

ANOVA:

(1) Capture an ANOVA model on the iris dataset. Set the dependent variable to 'Species'.
(2) Capture a summary.
(3) Provide an interpretation of the results in your own words. Support your response with results captured from running ANOVA.

Loading Dataset:

    Iris = read.csv("/Users/anisha/Downloads/iris_Iriss (2).csv", header = TRUE)
    names(Iris)
    ## [1] "id"           "Species"      "Sepal.Length" "Sepal.Width"  "Petal.Length"
    ## [6] "Petal.Width"

    str(Iris)
    ## 'data.frame':    300 obs. of  6 variables:
    ##  $ id          : chr  "S001" "S002" "S003" "S004" ...
    ##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
    ##  $ Sepal.Length: num  4.75 5.07 5.24 5.48 4.9 ...
    ##  $ Sepal.Width : num  3.3 3.68 3.44 3.96 2.81 ...
    ##  $ Petal.Length: num  1.44 1.21 1.59 1.53 1.49 ...
    ##  $ Petal.Width : num  0.235 0.111 0.405 0.272 0.345 ...

    head(Iris)
    ##     id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    ## 1 S001  setosa     4.746510    3.301532     1.441511   0.2348507
    ## 2 S002  setosa     5.072022    3.678133     1.208144   0.1114255
    ## 3 S003  setosa     5.241044    3.442049     1.585426   0.4054026
    ## 4 S004  setosa     5.475311    3.960215     1.533434   0.2724266
    ## 5 S005  setosa     4.900481    2.806450     1.486378   0.3452578
    ## 6 S006  setosa     5.580621    3.857734     1.875316   0.3060997

    dim(Iris)
    ## [1] 300   6

    # The dataset contains 300 observations and 6 variables.
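For orientation, the kind of one-way ANOVA the question asks for might look roughly like the sketch below. It is an illustration under stated assumptions, not the analysis from this exam: it uses R's built-in `iris` data (150 rows) rather than the 300-row CSV loaded above. Because `aov()` needs a numeric response, it treats `Species` as the grouping factor. The choice of `Sepal.Length` as the response is an assumption made only for this example.

```r
# Illustrative only: built-in iris data, not the 300-row CSV used in this document.
data(iris)

# One-way ANOVA: does mean sepal length differ across the three species?
# Using Sepal.Length as the response is an assumption made for this sketch.
fit <- aov(Sepal.Length ~ Species, data = iris)

# ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_table <- summary(fit)[[1]]
print(anova_table)

# F statistic and p-value for the Species term (first row of the table)
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
```

A small p-value here would indicate that at least one species mean differs from the others. A post-hoc comparison such as `TukeyHSD(fit)` could then show which pairs differ.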
    # Data screening: check for accuracy, missing values, and outliers.
    library(mice)
    ##
    ## Attaching package: 'mice'
    ## The following objects are masked from 'package:base':
    ##
    ##     cbind, rbind

    summary(Iris)
    ##       id              Species           Sepal.Length    Sepal.Width
    ##  Length:300         Length:300         Min.   :4.417   Min.   :1.796
    ##  Class :character   Class :character   1st Qu.:5.209   1st Qu.:2.720
    ##  Mode  :character   Mode  :character   Median :5.844   Median :2.992
    ##                                        Mean   :5.857   Mean   :3.064
    ##                                        3rd Qu.:6.448   3rd Qu.:3.375
    ##                                        Max.   :8.478   Max.   :4.810
    ##   Petal.Length    Petal.Width
    ##  Min.   :1.135   Min.   :-0.03371
    ##  1st Qu.:1.566   1st Qu.: 0.30278
    ##  Median :4.228   Median : 1.28776
    ##  Mean   :3.738   Mean   : 1.19830
    ##  3rd Qu.:5.205   3rd Qu.: 1.87452
    ##  Max.   :6.955   Max.   : 2.62487

    # Treat Species as a categorical variable.
    Iris$Species = as.factor(Iris$Species)

    # A negative petal width is not a valid measurement, so recode it as missing.
    Iris[, 6][Iris[, 6] < 0] = NA

    # Percentage of missing values in a vector.
    percentmiss = function(x) { sum(is.na(x)) / length(x) * 100 }

    # Percent missing per column.
    apply(Iris, 2, percentmiss)
    ##           id      Species Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    ##    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.3333333

    # Percent missing per row.
    missing = apply(Iris, 1, percentmiss)
    table(missing)
    ## missing
    ##                0 16.6666666666667
    ##              299                1

    # Keep rows with at most 20% missing data, then separate the numeric
    # columns to impute from the id and Species columns.
    replace = subset(Iris, missing <= 20)
    missing1 = apply(replace, 1, percentmiss)
    replace_col = replace[, -c(1, 2)]
    dontcol = replace[, c(1, 2)]
    replacevalue = mice(replace_col)
    ##
    ##  iter imp variable
    ##   1   1  Petal.Width
    ##   1   2  Petal.Width
    ##   1   3  Petal.Width
    ##   1   4  Petal.Width
    ##   1   5  Petal.Width
    ##   2   1  Petal.Width
    ##   2   2  Petal.Width
    ##   2   3  Petal.Width
    ##   2   4  Petal.Width
    ##   2   5  Petal.Width
    ##   3   1  Petal.Width
    ##   3   2  Petal.Width
    ##   3   3  Petal.Width
    ##   3   4  Petal.Width
    ##   3   5  Petal.Width
    ##   4   1  Petal.Width
    ##   4   2  Petal.Width
    ##   4   3  Petal.Width
    ##   4   4  Petal.Width
    ##   4