Michael Henry
Intro to Data Science DSC 441
Assignment #2

1a.
The fat % variable has 2 outliers, while the Age variable has none.

1b.
Age z-score normalization: z = (X - 50.11) / 14.89
Age: [-1.62, -1.62, -1.42, -1.42, -0.68, -0.34, -0.01, 0.33, 0.66, 0.33, -0.34, 0.66, 0.33, 0.73, 0.79, 0.87, 1.67, 1.07]
Fat % z-score normalization: z = (X - 31.06) / 9.54
Fat %: [-2.16, -0.06, -2.33, -1.08, 0.14, -0.44, -0.07, -0.09, 0.22, 0.58, 1.41, -0.03, 0.45, 0.22, 0.53, 0.72, 1.27, 0.69]

1c.
I. Min-max normalization: [-1.0, 1.0]
II. Z-score normalization: (-infinity, infinity)
III. Normalization by decimal scaling: (-1, 1)
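The three normalizations named in 1b and 1c can be sketched in a few lines of Python. This is only an illustration: the mean (50.11) and standard deviation (14.89) are the Age figures quoted above, but the sample ages and the function names are hypothetical and are not taken from the assignment data.

```python
def z_score(x, mean, std):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (x - mean) / std


def min_max(x, lo, hi, new_lo=-1.0, new_hi=1.0):
    """Min-max normalization of x from [lo, hi] onto [new_lo, new_hi]."""
    return (x - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


def decimal_scaling(values):
    """Divide by the smallest power of 10 that makes every |value| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical ages, chosen only to illustrate the formulas.
sample_ages = [26, 50, 75]

print([round(z_score(a, 50.11, 14.89), 2) for a in sample_ages])
# -> [-1.62, -0.01, 1.67]
print([min_max(a, min(sample_ages), max(sample_ages)) for a in sample_ages])
print(decimal_scaling(sample_ages))
```

Z-scores have no fixed bounds, since they depend on how far a value lies from the mean. Min-max output always stays inside the chosen target range, and decimal scaling always yields values strictly between -1 and 1.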
1d. The two variables appear to be positively related: the older the person, the higher that person's fat % tends to be.

2a. (2.5 points) equal-depth partitioning with 4 bins (3 values per bin); each record is smoothed to its bin mean

Bin      ID   Record   Smoothed Record
Bin 1     1        8        12
Bin 1     2       13        12
Bin 1     3       14        12
Bin 2     4       15        23
Bin 2     5       17        23
Bin 2     6       37        23
Bin 3     7       55        64
Bin 3     8       60        64
Bin 3     9       77        64
Bin 4    10       95       174
Bin 4    11      208       174
Bin 4    12      218       174
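The equal-depth result in 2a can be checked with a short Python sketch. It uses the 12 records from the table. The function names are illustrative, and rounding each bin mean to the nearest whole number is an assumption inferred from the smoothed values shown.

```python
def equal_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of values."""
    ordered = sorted(values)
    if len(ordered) % n_bins != 0:
        raise ValueError("this sketch assumes the values divide evenly into bins")
    depth = len(ordered) // n_bins
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean, rounded to an integer."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]


records = [8, 13, 14, 15, 17, 37, 55, 60, 77, 95, 208, 218]

bins = equal_depth_bins(records, 4)
print(bins)
# -> [[8, 13, 14], [15, 17, 37], [55, 60, 77], [95, 208, 218]]
print(smooth_by_means(bins))
# -> [[12, 12, 12], [23, 23, 23], [64, 64, 64], [174, 174, 174]]
```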
2b. (2.5 points) equal-width partitioning with 4 bins

W = (max - min) / N = (218 - 8) / 4 = 210 / 4 = 52.5, rounded up to 53

1st bin: 8 + 53 = 61 (8-61)
2nd bin: 61 + 53 = 114 (61-114)
3rd bin: 114 + 53 = 167 (114-167)
4th bin: 167 + 53 = 220 (167-220)

1st bin = 8, 13, 14, 15, 17, 37, 55, 60
2nd bin = 77, 95
3rd bin = (empty)
4th bin = 208, 218
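The equal-width assignment in 2b can be reproduced with the sketch below. It rounds the width up to a whole number, as done above. Treating each bin as including its lower edge and excluding its upper edge is an assumption, since the ranges as written share their endpoints. The function name is illustrative.

```python
import math


def equal_width_bins(values, n_bins):
    """Assign values to n_bins bins of equal width, starting at the minimum."""
    lo, hi = min(values), max(values)
    width = math.ceil((hi - lo) / n_bins)  # 52.5 is rounded up to 53
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # Bin k covers [lo + k * width, lo + (k + 1) * width).
        index = min((v - lo) // width, n_bins - 1)
        bins[index].append(v)
    return width, bins


records = [8, 13, 14, 15, 17, 37, 55, 60, 77, 95, 208, 218]

width, bins = equal_width_bins(records, 4)
print(width)
# -> 53
print(bins)
# -> [[8, 13, 14, 15, 17, 37, 55, 60], [77, 95], [], [208, 218]]
```

The empty third bin shows a known weakness of equal-width partitioning on skewed data: most records fall into the first bin while the two large values sit alone in the last one.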