DSC441: Fundamentals of Data Science
Assignment 2
Jesse Johnson
10/12/2020

1. Age + %Fat

a. Neither age nor percent fat is noticeably skewed; both are approximately normal. Percent fat has two outliers, 8.8 and 10.5, and both come from younger people.

Boxplot of age: the median is about 50.
Boxplot of percent fat: the median is about 31.
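Boxplot of age and percent fat combined: when both attributes are drawn in one boxplot, age and percent fat each appear skewed to the left.

b. Z-score normalization is computed as v' = (v - mean) / std, where mean and std are the mean and standard deviation of the attribute being normalized.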
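The z-score formula in part b can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the assignment: the function name `z_score` and the sample values are hypothetical, and it assumes the population standard deviation (`statistics.pstdev`); swap in `statistics.stdev` if the sample standard deviation is intended.

```python
from statistics import mean, pstdev


def z_score(values):
    """Return (v - mean) / std for every v, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical values chosen for easy arithmetic, not the assignment's data.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_score(sample)
print(scores)
```

Values below the mean come out negative and values above it come out positive, and the normalized list itself has mean 0 and standard deviation 1.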
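c. Normalization

i. Min-max normalization puts all features on the same scale: the minimum value maps to 0 and the maximum value maps to 1. Its weakness is that it does not handle outliers well, because a single extreme value sets the range and squeezes the remaining values into a narrow band.

ii. Z-score normalization makes it easy to see whether a value is above or below the mean: values below the mean are negative and values above the mean are positive. The normalized data has a mean of 0 and a standard deviation of 1. It handles outliers better than min-max, but the result has no fixed bounds, so features might not all share the same range.

iii. Decimal scaling depends on the maximum absolute value. The decimal point is moved over by the smallest number of places that brings the largest absolute value below 1, so the normalized data always lies between -1 and 1 (between 0 and 1 when all values are non-negative).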
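The outlier sensitivity of min-max and the range produced by decimal scaling are easier to see with numbers. The sketch below is only an illustration under stated assumptions: the function names `min_max` and `decimal_scaling` and the input values are hypothetical, not taken from the assignment's data set.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical values; 1000.0 plays the role of an outlier.
values = [10.0, 20.0, 30.0, 40.0, 1000.0]
mm = min_max(values)
ds = decimal_scaling(values)
signed = decimal_scaling([-250.0, 40.0])

print(mm)      # the four small values are squeezed close to 0
print(ds)      # every value divided by 10**4
print(signed)  # negative inputs stay negative
```

With the outlier present, min-max maps the four ordinary values to less than 0.04 while the outlier sits at 1. Decimal scaling keeps the sign of each value, which is why its range is -1 to 1 rather than 0 to 1.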
