Assignment Name - Analytics Advanced

1. On what basis do we choose a data scaling method (Normalization/Standardization)?

   1. Normalization
   2. Standardization/Z score Method

Normalization: Normalizing attribute data rescales the components of a feature vector so that the whole vector has a length of 1. This usually means dividing each component of the feature vector by the Euclidean length of the vector, although the Manhattan length or another distance measure can be used instead. This pre-processing method is useful for sparse attribute features and for algorithms that learn from distances, such as KNN. It is sensitive to outliers.

Standardization/Z score Method: Standardizing attribute data assumes a Gaussian distribution of the input features and rescales each feature to a mean of 0 and a standard deviation of 1. This works better with linear regression, logistic regression and linear discriminant analysis. The StandardScaler class in scikit-learn (Python) does this. It works well with data that is approximately normally distributed, in line with the Gaussian assumption.

Z = (point - mean) / standard deviation

2. If the VIF is 2, then what is the value of the correlation coefficient (r^2)?
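
The two scaling methods from question 1 and the relation behind question 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the assignment's own solution: the function names (l2_normalize, standardize, r_squared_from_vif) are hypothetical, the z-score uses the population standard deviation (the same convention StandardScaler follows), and the VIF step assumes the standard definition VIF = 1 / (1 - R^2).

```python
import math
import statistics


def l2_normalize(vector):
    """Rescale a feature vector so its Euclidean length is 1."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        return list(vector)  # a zero vector cannot be rescaled; return it unchanged
    return [x / length for x in vector]


def standardize(values):
    """Z-score each value: (point - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mean) / std for x in values]


def r_squared_from_vif(vif):
    """Invert the assumed definition VIF = 1 / (1 - R^2) to recover R^2."""
    return 1.0 - 1.0 / vif


unit_vector = l2_normalize([3.0, 4.0])    # length of [3, 4] is 5
z_scores = standardize([2.0, 4.0, 6.0])   # result has mean 0 and std 1
r_squared = r_squared_from_vif(2.0)       # 1 - 1/2

print(unit_vector, z_scores, r_squared)
```

Under that definition, a VIF of 2 corresponds to R^2 = 1 - 1/2 = 0.5.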