# Final-Exam---Question-2.docx - Final Exam Question 2 Van...

Final Exam - Question 2

Van Thu Nguyen

2020-08-10

Perform data screening, making sure to check for accuracy, missing values, and outliers.

## Loading Data

    data = read.csv('/Users/vnguy/Documents/Harrisburg University/ANLY 500/iris_exams.csv')
    df = data.frame(data)
    str(df)

    ## 'data.frame':    300 obs. of  6 variables:
    ##  $ id          : chr  "S001" "S002" "S003" "S004" ...
    ##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
    ##  $ Sepal.Length: num  4.75 5.07 5.24 5.48 4.9 ...
    ##  $ Sepal.Width : num  3.3 3.68 3.44 3.96 2.81 ...
    ##  $ Petal.Length: num  1.44 1.21 1.59 1.53 1.49 ...
    ##  $ Petal.Width : num  0.235 0.111 0.405 0.272 0.345 ...

## Cleaning up data: check for accuracy, missing values, and outliers

    df$Species = as.factor(df$Species)
    df[, 6][df[, 6] < 0] = NA
    percentmiss = function(x){ sum(is.na(x)) / length(x) * 100 }
    apply(df, 2, percentmiss)

    ##           id      Species Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    ##    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.3333333

    missing = apply(df, 1, percentmiss)
    replace = subset(df, missing <= 20)
    missing1 = apply(replace, 1, percentmiss)
    replace_col = replace[, -c(1, 2)]
    dont_col = replace[, c(1, 2)]
    library(mice)

    ##
    ## Attaching package: 'mice'
    ## The following objects are masked from 'package:base':
    ##
    ##     cbind, rbind
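
The screening steps above (flag impossible values, then measure missingness by column and by row) can be sketched on a small made-up data frame. This is a minimal illustration in base R, not the exam's actual data: the `toy` values and the names `col_missing`, `row_missing`, and `keep` are assumptions introduced here, while `percentmiss` and the 20% row cutoff follow the code above.

```r
# Hypothetical stand-in for iris_exams.csv; all values are invented for illustration.
toy <- data.frame(
  id           = c("S001", "S002", "S003", "S004"),
  Species      = c("setosa", "setosa", "virginica", "virginica"),
  Sepal.Length = c(4.8, 5.1, NA, 6.3),
  Sepal.Width  = c(3.3, 3.7, NA, 2.9),
  Petal.Length = c(1.4, 1.2, NA, 5.6),
  Petal.Width  = c(0.2, -0.1, 2.1, 1.8),
  stringsAsFactors = FALSE
)

# Accuracy: a negative petal width is impossible, so recode it as missing.
toy$Petal.Width[!is.na(toy$Petal.Width) & toy$Petal.Width < 0] <- NA

# Same helper as in the report: percentage of NA values in a vector.
percentmiss <- function(x) sum(is.na(x)) / length(x) * 100

# Missingness per column (which variables have gaps) ...
col_missing <- sapply(toy, percentmiss)

# ... and per row (which participants have too many gaps).
row_missing <- apply(toy, 1, percentmiss)

# Keep only rows with at most 20% missing; these are candidates for imputation.
keep <- toy[row_missing <= 20, ]

print(col_missing)
print(row_missing)
print(keep$id)
```

In this toy example the third row is missing half of its six fields, so it is dropped rather than imputed, while the row whose only gap is the recoded negative width stays in.
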
    replace_value = mice(replace_col)

    ##
    ##  iter imp variable
    ##   1   1  Petal.Width
    ##   1   2  Petal.Width
    ##   1   3  Petal.Width
    ##   1   4  Petal.Width
    ##   1   5  Petal.Width
    ##   2   1  Petal.Width
    ##   2   2  Petal.Width
    ##   2   3  Petal.Width
    ##   2   4  Petal.Width
    ##   2   5  Petal.Width
    ##   3   1  Petal.Width
    ##   3   2  Petal.Width
    ##   3   3  Petal.Width
    ##   3   4