# 21.3.2.1. Clustering Algorithms


There are several clustering algorithms. The popular ones fall into two types: non-hierarchical clustering and hierarchical clustering. Both are explained below.

## 21.3.2.1.1. Non-hierarchical Clustering

This method is also known as k-means clustering. The steps involved are:

a. Fix the number of clusters, k, in advance.
b. Take an initial guess of the k cluster centroids (means). A popular approach is to take the observations (cases) that lie farthest apart from one another.
c. Proceed through the list of cases, assigning each case to the cluster whose centroid is nearest. Recalculate the centroid of the cluster receiving the case and of the cluster losing it.
d. Repeat step (c) until no further reassignment of cases occurs.

The final assignment of cases to clusters depends on the initial guess of the k means. To check the stability of the clustering, it is desirable to run the algorithm again with a new initial guess.

How to select the number of clusters k is a big question. The researcher decides which value of k yields the “best” solution. We suggest repeating the procedure with different values of k and comparing the solutions. With fewer clusters the solution looks simpler. However, the adequacy of the solution may increase with the number of clusters.

## 21.3.2.1.2. Hierarchical Clustering

There are two methods of hierarchical clustering: divisive and agglomerative.

In the divisive method, all the cases or observations are first put in one cluster. At each stage, a cluster is divided into more clusters. The procedure ends with as many clusters as there are cases: we start with one cluster and end with n clusters, where n is the number of cases or observations. In this method, once a cluster is divided into two or more clusters, they cannot be merged again.

In the agglomerative method, we first consider each case as a cluster of its own. Then the closest (most similar) clusters are combined (agglomerated). The procedure ends with a single cluster containing all the cases. Thus, we start with n clusters and end with one cluster. Once clusters are merged, they cannot be split again.

In both methods, neither the first nor the last iteration is the best solution. We need to know the number of clusters we require, but there is no right or wrong answer as to how many clusters we should have. To find a good number of clusters, we should study the solution at each step and decide on a reasonable number of homogeneous clusters that represent the data. We also require a suitable distance measure and a criterion to decide which clusters are divided or merged at successive steps.

# 21.3.2.2. Agglomerative Clustering

In this section, we describe the agglomerative clustering procedure in detail. For agglomerative clustering, we need:

i. A suitable distance measure
ii. A criterion to combine the clusters
iii. The number of clusters

We have already discussed some distance measures in Section 17.2. It is advisable to standardize the data so that no variable is under- or over-represented.
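The three requirements listed above, together with the advice to standardize, can be illustrated with a small Python sketch. It is not the text's own procedure: it assumes Euclidean distance as the distance measure (Section 17.2 is not reproduced here), single linkage (the distance between the closest pair of members) as the criterion for combining clusters, and a number of clusters supplied by the user. The names `standardize`, `single_linkage` and `agglomerate`, and the sample data, are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each variable (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def single_linkage(a, b, data):
    """Assumed merge criterion: distance between the closest pair of members."""
    return min(math.dist(data[i], data[j]) for i in a for j in b)


def agglomerate(rows, n_clusters, linkage=single_linkage):
    """Start with n singleton clusters; merge the closest pair until n_clusters remain."""
    data = standardize(rows)
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, p, q = min(
            (linkage(clusters[p], clusters[q], data), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]  # merged clusters are never split again
        del clusters[q]
    return [sorted(c) for c in clusters]


# Hypothetical data: two variables measured on very different scales.
cases = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0], [8.0, 900.0], [8.3, 950.0]]
groups = agglomerate(cases, n_clusters=2)
print(groups)  # [[0, 1, 2], [3, 4]]
```

Without the standardization step, the second variable would dominate every distance simply because its values are larger. Passing a different `linkage` function changes the merge criterion without touching the rest of the sketch.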
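Returning to the non-hierarchical method of Section 21.3.2.1.1, steps (a) to (d) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the text's exact procedure: distances are Euclidean, the initial centroids come from a greedy "farthest-first" heuristic (one possible reading of step (b)), and centroids are recalculated after each full pass through the cases instead of after every single reassignment. The function names and sample data are hypothetical.

```python
import math


def farthest_first_seeds(data, k):
    """One reading of step (b): pick k cases that lie far apart (greedy heuristic)."""
    # Start from one end of the most distant pair of cases.
    first, _ = max(
        ((i, j) for i in range(len(data)) for j in range(i + 1, len(data))),
        key=lambda pair: math.dist(data[pair[0]], data[pair[1]]),
    )
    seeds = [first]
    while len(seeds) < k:
        # Add the case whose nearest seed is farthest away.
        nxt = max(
            (i for i in range(len(data)) if i not in seeds),
            key=lambda i: min(math.dist(data[i], data[s]) for s in seeds),
        )
        seeds.append(nxt)
    return [list(data[s]) for s in seeds]


def k_means(data, k, max_passes=100):
    """Steps (a)-(d): assign each case to the nearest centroid until nothing moves."""
    centroids = farthest_first_seeds(data, k)
    labels = [None] * len(data)
    for _ in range(max_passes):
        # Step (c): assign every case to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(x, centroids[c])) for x in data
        ]
        if new_labels == labels:  # step (d): stop once no case is reassigned
            break
        labels = new_labels
        # Recalculate each centroid as the mean of its current members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: two visibly separate groups of cases.
cases = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
labels, centroids = k_means(cases, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Running `k_means` with several values of `k`, or with a different seeding function, is a simple way to compare solutions and check their stability, as the text recommends.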
