# DSC441 ASSIGNMENT2.docx - PRACHI PATEL ASSIGNMENT-2 DSC 441...


PRACHI PATEL, ASSIGNMENT 2, DSC 441

Problem 1 (10 points): This problem is an example of data preprocessing needed in a data mining process. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:

| Age  | 26   | 26   | 29  | 29   | 40   | 45   | 50   | 55   | 60   |
|------|------|------|-----|------|------|------|------|------|------|
| %fat | 10.5 | 30.5 | 8.8 | 20.8 | 32.4 | 26.9 | 30.4 | 30.2 | 33.2 |

| Age  | 55   | 45   | 60   | 55   | 61   | 62   | 63   | 75   | 66   |
|------|------|------|------|------|------|------|------|------|------|
| %fat | 36.6 | 44.5 | 30.8 | 35.4 | 33.2 | 36.1 | 37.9 | 43.2 | 37.7 |

a. (2 points) Draw the box plots for age and %fat. Interpret the distribution of the data.
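The box plots asked for in part a can be cross-checked numerically. The sketch below is a minimal illustration in Python, not the SPSS procedure used in the answers: it computes a five-number summary for each attribute and flags points outside the 1.5 × IQR fences. The function name `box_summary`, the use of `statistics.quantiles` with its default "exclusive" method, and the 1.5 × IQR rule are assumptions made here; other quartile conventions can give slightly different fences.

```python
import statistics

# Data copied from the problem statement (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def box_summary(values):
    """Five-number summary plus points outside the 1.5 * IQR fences.

    Hypothetical helper; quartiles use the 'exclusive' method, which is
    the default of statistics.quantiles (an assumed convention).
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "outliers": outliers,
    }


age_summary = box_summary(age)
fat_summary = box_summary(fat)
print("Age :", age_summary)
print("%fat:", fat_summary)
```

Under this quartile convention, the %fat values 8.8 and 10.5 fall below the lower fence and Age has no flagged points. The Age mean (50.11) lies below its median (55), which is consistent with a left skew.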
Based on the descriptive statistics and box plot for the Age variable, we can conclude that Age is skewed to the left. Based on the descriptive statistics and box plot of the %fat variable, we can identify two outliers in the data: the points 8.8 and 10.5.

b. (2 points) Normalize the two attributes based on z-score normalization.

Z-score normalization can be calculated using the following formula:

$$v' = \frac{v - \text{mean}}{\text{std}}$$

From the descriptive statistics, we know that the mean and standard deviation for age are 50.11 and 14.9, respectively, and that the mean and standard deviation for %fat are 31.06 and 9.54, respectively. Using SPSS, we can calculate the z-scores for each case quickly.

c. (2 points) Regardless of the original ranges of the variables, normalization techniques transform the data into new ranges that allow variables to be compared and used on the same scale. What are the value ranges of the following normalization methods? Explain your answer.

i. Min-max normalization

Range: [new_min, new_max] = [0, 1].
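When using min-max normalization, the values are forced into a specific range. Its weakness shows when outliers are present in the data: because the observed minimum and maximum define the mapping, a single extreme value compresses the remaining values into a narrow part of the range.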