PRACHI PATEL
ASSIGNMENT:-2
DSC 441

Problem 1 (10 points): This problem is an example of data preprocessing needed in a data mining process. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:

Age   26 26 29 29 40 45 50 55 60
%fat  10.5 30.5 8.8 20.8 32.4 26.9 30.4 30.2 33.2

Age   55 45 60 55 61 62 63 75 66
%fat  36.6 44.5 30.8 35.4 18.104.22.1683.2 37.7

a. (2 points) Draw the box-plots for age and %fat. Interpret the distribution of the data.
Based on the descriptive statistics and boxplot for the Age variable, we can conclude that Age is skewed to the left.

Based on the descriptive statistics and boxplot of the %fat variable, we can identify two outliers in the data: the points 8.8 and 10.5.

b. (2 points) Normalize the two attributes based on z-score normalization.

Z-score normalization can be calculated using the following formula: v' = (v − mean) / std.

From the descriptive statistics, we know that the mean and standard deviation for age are 50.11 and 14.9, respectively. We also know that the mean and standard deviation for %fat are 31.06 and 9.54, respectively. Using SPSS, we can calculate the z-scores for each case quickly.

c. (2 points) Regardless of the original ranges of the variables, normalization techniques transform the data into new ranges that allow us to compare and use variables on the same scales. What are the value ranges of the following normalization methods? Explain your answer.

i. Min-max normalization
Range [new_min, new_max] = [0, 1].
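When using min-max normalization, the values are forced into a specific range: the smallest value maps to new_min and the largest to new_max, so with the bounds [0, 1] every value falls between 0 and 1. Outliers matter for this method, because the minimum and maximum set the scale and a single extreme value compresses the remaining values into a narrow part of the range.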