(a) (i) For each value of the variable, the variable’s mean is subtracted and the result is divided by the standard deviation, so that the resulting variable has mean 0 and standard deviation 1. [This means that each standardised value measures how many standard deviations above or below the mean the original value was.] Numerical variables are standardised for cluster analysis because clusters are formed according to how different the variable values are, and this should not depend on the units or scale in which each variable is measured. With standardised variables, each variable contributes with equal weight when measuring the distance between two data points or between two clusters. If there is a reason to put more weight on one variable, this can be done by multiplying its standardised values by an appropriate factor.

(ii) grps is first defined in line 5 as an empty matrix with 9 rows and 2 columns. A loop then runs over each value of n from 2 to 10: for each n, the k-means command is applied to Utils, dividing the data into n clusters. n is stored in row (n-1), column 1, and the between-cluster variance as a proportion of the total variance is stored in row (n-1), column 2. We then plot this proportion against the number of clusters. We want the between-cluster variance to be large, but it generally grows as n increases, so the plot is used to find the point beyond which extra clusters add little.

(iii) It seems appropriate to choose 6 clusters: beyond 6, the increase in the between-cluster proportion is minimal.

(b) (i) Linkage methods are ways of defining the distance between two clusters. They are calculated from the distances between pairs of points, one in each cluster. For example:
Single linkage: for two clusters C1 and C2, find the pair of points, one in C1 and one in C2, that are the shortest distance apart; that distance is the distance between the clusters.
Complete linkage: the same idea, but find the pair of points, one in each cluster, that are the furthest apart.

(ii) The shortest distances are AB and EF, both equal to 2, so form clusters AB and EF at height 2. The next shortest distance is CD = 3, and C and D are more distant than this from all the other points, so form cluster CD at height 3. Now calculate the distances between these 3 clusters, as determined by complete linkage:
AB to CD: the largest distance is AD = 12
AB to EF: the largest distance is AF = 8.5
CD to EF: the largest distance is DE = 8
The smallest of these is 8, so form cluster CDEF at height 8. The largest distance from AB to CDEF is AD = 12 (the larger of 12 and 8.5), so form cluster ABCDEF at height 12.

(c)
(i) Simple matching: 6/10 = 0.6
Jaccard: 2/5 = 0.4

(ii)
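The standardisation in (a)(i) and the loop in (a)(ii) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the original R code. The data are a hypothetical stand-in for Utils: 30 random rows on three very different scales. The sample standard deviation is used, as R's `scale()` does. The k-means step is a plain Lloyd's algorithm with a single random start, whereas R's `kmeans()` defaults to the Hartigan-Wong algorithm, so proportions on real data would not match R exactly. All helper names are hypothetical.

```python
import random
import statistics

def standardise(column):
    """Z-scores: subtract the mean, divide by the (sample) standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [statistics.mean(coord) for coord in zip(*points)]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with one random start; returns a cluster label per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = centroid(members)
    return labels

def between_proportion(points, labels):
    """Between-cluster sum of squares as a proportion of the total sum of squares."""
    overall = centroid(points)
    total = sum(sq_dist(p, overall) for p in points)
    between = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        between += len(members) * sq_dist(centroid(members), overall)
    return between / total

# Hypothetical stand-in for Utils: 30 rows, 3 columns on very different scales.
rng = random.Random(1)
raw = [[rng.gauss(10, 2), rng.gauss(500, 100), rng.gauss(0.5, 0.1)] for _ in range(30)]
cols = [standardise(list(c)) for c in zip(*raw)]
utils = [list(row) for row in zip(*cols)]

# Equivalent of grps: one (n, proportion) row for each n from 2 to 10.
grps = []
for n in range(2, 11):
    labels = kmeans(utils, n)
    grps.append((n, between_proportion(utils, labels)))

for n, prop in grps:
    print(n, round(prop, 3))
```

Plotting the second column of grps against the first gives the curve used in (a)(iii) to pick the number of clusters.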
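The merge procedure in (b)(ii) can be checked with a small agglomerative-clustering sketch. Passing `agg=max` gives complete linkage and `agg=min` gives single linkage, matching the definitions in (b)(i). The full distance matrix from the question is not reproduced here, so the example uses hypothetical one-dimensional positions rather than the exam's distances. The function name `hierarchical` is illustrative.

```python
from itertools import combinations

def hierarchical(labels, dist, agg=max):
    """Agglomerative clustering; agg=max gives complete linkage, agg=min single linkage.

    labels: point names; dist(a, b): distance between two named points.
    Returns the merges in order as (cluster name, height) pairs.
    """
    def linkage(c1, c2):
        # Distance between clusters, taken over all point pairs with one point in each.
        return agg(dist(a, b) for a in c1 for b in c2)

    clusters = [frozenset([x]) for x in labels]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters (the first pair found wins ties).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append(("".join(sorted(c1 | c2)), height))
    return merges

# Hypothetical 1-D positions (not the exam's distances), so d(x, y) = |x - y|.
pos = {"A": 0, "B": 2, "C": 10, "D": 13, "E": 25, "F": 27}

def d(a, b):
    return abs(pos[a] - pos[b])

print(hierarchical(list(pos), d))
# [('AB', 2), ('EF', 2), ('CD', 3), ('ABCD', 13), ('ABCDEF', 27)]
print(hierarchical(list(pos), d, agg=min))
# [('AB', 2), ('EF', 2), ('CD', 3), ('ABCD', 8), ('ABCDEF', 12)]
```

As in (b)(ii), each complete-linkage height is the largest pairwise distance between the two merged clusters. With these positions, single linkage merges the last two clusters at a much lower height.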
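The two coefficients in (c)(i) come from counting agreements between binary variables. Let a, b, c and d be the counts of (1,1), (1,0), (0,1) and (0,0) pairs. Simple matching is (a + d)/(a + b + c + d), and Jaccard is a/(a + b + c), which ignores joint absences. Below is a minimal sketch, assuming 0/1 coding. The two profiles are hypothetical and are not the question's data, and the function names are illustrative.

```python
from fractions import Fraction

def match_counts(x, y):
    """Counts (a, b, c, d) of (1,1), (1,0), (0,1) and (0,0) pairs."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d

def simple_matching(x, y):
    """Proportion of variables on which the two objects agree (both 1 or both 0)."""
    a, b, c, d = match_counts(x, y)
    return Fraction(a + d, a + b + c + d)

def jaccard(x, y):
    """Like simple matching, but joint absences (0, 0) are left out entirely."""
    a, b, c, _ = match_counts(x, y)
    return Fraction(a, a + b + c)

# Hypothetical binary profiles (not the exam's data).
x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0]
print(simple_matching(x, y), jaccard(x, y))  # 2/3 1/2
```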