Final Exam - Question 1
Van Thu Nguyen
2020-08-08

Perform data screening, making sure to check for accuracy, missing values, and outliers.

Loading Data

data = read.csv('/Users/vnguy/Documents/Harrisburg University/ANLY 500/iris_exams.csv')
df = data.frame(data)
str(df)
## 'data.frame':    300 obs. of  6 variables:
##  $ id          : chr  "S001" "S002" "S003" "S004" ...
##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
##  $ Sepal.Length: num  4.75 5.07 5.24 5.48 4.9 ...
##  $ Sepal.Width : num  3.3 3.68 3.44 3.96 2.81 ...
##  $ Petal.Length: num  1.44 1.21 1.59 1.53 1.49 ...
##  $ Petal.Width : num  0.235 0.111 0.405 0.272 0.345 ...

1. Accuracy

Check the range of each variable with summary():

summary(data)
##       id              Species           Sepal.Length    Sepal.Width
##  Length:300         Length:300         Min.   :4.417   Min.   :1.796
##  Class :character   Class :character   1st Qu.:5.209   1st Qu.:2.720
##  Mode  :character   Mode  :character   Median :5.844   Median :2.992
##                                        Mean   :5.857   Mean   :3.064
##                                        3rd Qu.:6.448   3rd Qu.:3.375
##                                        Max.   :8.478   Max.   :4.810
##   Petal.Length    Petal.Width
##  Min.   :1.135   Min.   :-0.03371
##  1st Qu.:1.566   1st Qu.: 0.30278
##  Median :4.228   Median : 1.28776
##  Mean   :3.738   Mean   : 1.19830
##  3rd Qu.:5.205   3rd Qu.: 1.87452
##  Max.   :6.955   Max.   : 2.62487

Petal.Width has a negative minimum (-0.03371), which is not a valid measurement, so negative values are recoded as missing. Species is also converted from character to factor so that summary() reports the count per species.

# label Species as a factor
df$Species = as.factor(df$Species)
# recode Petal.Width values (column 6) that are less than 0 as missing
df[, 6][df[, 6] < 0] = NA
summary(df)
##       id                  Species     Sepal.Length    Sepal.Width
##  Length:300         setosa    :100   Min.   :4.417   Min.   :1.796
##  Class :character   versicolor:100   1st Qu.:5.209   1st Qu.:2.720
##  Mode  :character   virginica :100   Median :5.844   Median :2.992
##                                      Mean   :5.857   Mean   :3.064
##                                      3rd Qu.:6.448   3rd Qu.:3.375
##                                      Max.   :8.478   Max.   :4.810
##
##   Petal.Length    Petal.Width
##  Min.   :1.135   Min.   :0.01781
##  1st Qu.:1.566   1st Qu.:0.30480
##  Median :4.228   Median :1.29179
##  Mean   :3.738   Mean   :1.20242
##  3rd Qu.:5.205   3rd Qu.:1.87538
##  Max.   :6.955   Max.   :2.62487
##                  NA's   :1

2. Missing Values

Number of missing values by column:

apply(df, 2, function(x) sum(is.na(x)))
##           id      Species Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##            0            0            0            0            0            1

Number of missing values by row:

apply(df, 1, function(x) sum(is.na(x)))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
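The accuracy and missing-value steps can also be written with vectorised base R helpers. The sketch below is illustrative only: the data frame `toy` and its values are made up for the example (they are not taken from iris_exams.csv), and `colSums()`/`rowSums()` are used here as an alternative to the `apply()` calls in the report, not as the author's method.

```r
# Hypothetical toy data standing in for the exam data set (values are invented).
toy <- data.frame(
  id          = c("S001", "S002", "S003", "S004"),
  Petal.Width = c(0.235, -0.034, 0.405, NA),
  Sepal.Width = c(3.30, 3.68, NA, 3.96),
  stringsAsFactors = FALSE
)

# Accuracy: a width cannot be negative, so recode negative values as NA.
# which() drops NA comparisons, so existing NAs do not affect the index.
toy$Petal.Width[which(toy$Petal.Width < 0)] <- NA

# Missing values per column and per row.
na_by_col <- colSums(is.na(toy))
na_by_row <- rowSums(is.na(toy))

# Positions of rows that contain at least one missing value.
rows_with_na <- which(na_by_row > 0)

print(na_by_col)
print(rows_with_na)
```

Listing the affected rows with `which()` may be easier to read than scanning a vector of 300 per-row counts, and the same index can be reused to inspect or exclude those rows.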
