Equal width binning method does not work well because


Problem 3 (10 points):
a) (2 points) Figure 1 illustrates the plots for some data with respect to two variables: balance and employment status. If you have to select one of these two variables to classify the data into two classes (circle class and plus class), which one would you select? Is there any approach/criterion that you can use to support your selection? Explain your answer.
b) (8 points) For the data in Figure 2 with three variables (X, Y, and Z) and two classes (I and II): which variable would you choose to classify the data? Show all the steps of your calculations and interpret your answer.

X   Y   Z   Ĉ
1   1   1   I
1   1   1   I
0   0   1   II
1   0   0   II

Figure 2: Data for Problem 3.b

DSC 441: Fall 2018-2019 Assignment 2, Page 6 of 7

Variable Y would be the best variable to classify the data because it has the highest accuracy rate. Variables X and Z each classify 3 of the 4 cases correctly (75% accuracy), while variable Y classifies all 4 cases correctly (100% accuracy). In the table below, C is the actual class and Ĉ is the class predicted from the variable alone.

X   C    Ĉ        Y   C    Ĉ        Z   C    Ĉ
1   I    I        1   I    I        1   I    I
1   I    I        1   I    I        1   I    I
0   II   II       0   II   II       1   II   I
1   II   I        0   II   II       0   II   II

Problem 4 (10 points): Download the Spotify Dataset along with the description from D2L.
a) (5 points) Describe the data in terms of number of attributes, number of cases, class distribution. Is there any correlation between features? Explain your answer.

Number of attributes: 13
Number of cases: 1420

Class distribution:

Mood               Count
dinner             467
dinner, party      3
dinner, workout    1
party              225
party, workout     52
sleep              362
workout            310

There are several variables that are correlated in this dataset. Two variables were treated as correlated when the absolute value of their correlation coefficient was greater than 0.5.

The following variables have positive correlation:
Danceability and loudness
Danceability and valence
Energy and loudness
Instrumentalness and acousticness

The following variables have negative correlations:
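The screening rule used in Problem 4(a), flagging a pair of variables when the absolute value of its correlation coefficient is greater than 0.5, can be sketched as follows. This is only an illustration: it assumes the Pearson coefficient (the answer does not say which coefficient was used), the helper names `pearson` and `correlated_pairs` are hypothetical, and the toy values are made up rather than taken from the Spotify dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every column pair with |r| > threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Made-up toy values for illustration only, NOT the Spotify data.
toy = {
    "energy": [0.2, 0.4, 0.6, 0.8, 0.9],
    "loudness": [-20.0, -14.0, -9.0, -6.0, -4.0],
    "feature_x": [0.9, 0.7, 0.4, 0.2, 0.1],
    "feature_y": [120.0, 90.0, 130.0, 100.0, 110.0],
}

pairs = correlated_pairs(toy)
for a, b, r in pairs:
    sign = "positive" if r > 0 else "negative"
    print(f"{a} and {b}: {sign} correlation (r = {r:.2f})")
```

The sign of each flagged coefficient is what separates the positive list from the negative list.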
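The accuracy figures given for Problem 3(b) can be checked with a short script. This is a minimal sketch that assumes the prediction rule implied by the answer table: for each value of a variable, predict the class that occurs most often among the cases with that value. The function name `single_variable_accuracy` is hypothetical.

```python
from collections import Counter, defaultdict

# Figure 2 rows as (X, Y, Z, class).
rows = [
    (1, 1, 1, "I"),
    (1, 1, 1, "I"),
    (0, 0, 1, "II"),
    (1, 0, 0, "II"),
]


def single_variable_accuracy(rows, col):
    """Accuracy when each value of column `col` predicts its majority class."""
    class_counts = defaultdict(Counter)
    for row in rows:
        class_counts[row[col]][row[-1]] += 1
    # Cases matching the majority class of their value are classified correctly.
    correct = sum(c.most_common(1)[0][1] for c in class_counts.values())
    return correct / len(rows)


accuracy = {
    name: single_variable_accuracy(rows, i) for i, name in enumerate("XYZ")
}
for name, acc in accuracy.items():
    print(f"{name}: {acc:.0%}")
```

With the four rows of Figure 2 this gives 75% for X, 100% for Y, and 75% for Z, matching the answer to Problem 3(b).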
