# Lecture 7 - Cluster Analysis


Prof. Thomas B. Fomby, Department of Economics, Southern Methodist University, Dallas, TX 75275. April 2008, April 2010.

Cluster Analysis, sometimes called data segmentation or customer segmentation, is an unsupervised learning method. As you will recall, a method is an unsupervised learning method if it does not involve prediction or classification. The major purpose of Cluster Analysis is to group collections of objects (e.g., customers) into “clusters” so that the objects within each cluster are “similar.”

One reason a company might want to organize its customers into groups is to better understand the nature of its customers. Given a delineation of its customers into distinct groups, the company could advertise differently to each group, send each group a different catalogue, and the like.

In terms of building prediction and classification models, cluster analysis can help the analyst identify groups of input variables that in turn can lead to different models for each group. This assumes, of course, that the relationship between the output and the input variables is not the same across the groups. One can always test the “poolability” of the models, either by conventional hypothesis tests when considering econometric models, or by comparing accuracy measures across the validation and test data partitions when considering machine learning models.

As one comes to understand after working on several clustering projects, clustering is an “Art Form.” It must be practiced with care. The more experience you have in doing cluster analysis, the better you become as a practitioner.

Before beginning cluster analysis, it is often recommended that the data be normalized. Cluster analysis based on variables with very different scales of measurement can lead to clusters that are not very robust to adding or deleting variables or observations.
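To make the normalization step concrete, here is a minimal sketch. The notes do not say which normalization is intended, so this assumes z-score standardization (subtract the column mean, divide by the column standard deviation), which is one common choice. The function name `zscore_columns` and the customer figures are hypothetical and serve only as illustration.

```python
from math import dist
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) for col in columns]
    return [
        # A constant column (standard deviation 0) is mapped to 0.0.
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical customers: (annual income in dollars, store visits per month).
customers = [
    (30_000.0, 2.0),
    (32_000.0, 9.0),
    (90_000.0, 3.0),
    (95_000.0, 10.0),
]
scaled = zscore_columns(customers)

# On the raw scale, income dominates the Euclidean distance: customers 0 and 1
# look very close even though their visit counts differ a lot.
raw_ratio = dist(customers[0], customers[1]) / dist(customers[0], customers[2])

# After scaling, both variables contribute on comparable terms.
scaled_ratio = dist(scaled[0], scaled[1]) / dist(scaled[0], scaled[2])

print(f"raw ratio: {raw_ratio:.3f}, scaled ratio: {scaled_ratio:.3f}")
```

On the raw data, the income column alone decides which customers look “similar,” simply because it is measured in larger units. After standardization, a difference in visits counts as much as a comparable difference in income.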
In this discussion, we focus on clustering continuous input variables only. The clustering of mixed data, some continuous and some categorical, is beyond the scope of this discussion.

Now let us begin. There are two basic approaches to clustering:

a) Hierarchical clustering (agglomerative clustering is discussed here)

b) Non-hierarchical clustering (K-means)


## Hierarchical Clustering

In hierarchical clustering, the final clusters are built in a series of steps. If we start with N objects, each in its own separate cluster, then combine two of the clusters to obtain N − 1 clusters, and continue combining clusters into fewer and fewer clusters with more and more objects in each, we are engaging in agglomerative clustering. In contrast, if we start with all of the objects in a single cluster, remove one of the objects to form a second cluster, and continue building more and more clusters with fewer and fewer objects in each until every object is in its own cluster, we are engaging in divisive clustering.
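The agglomerative procedure described above can be sketched in a few lines. The text available here does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the smallest Euclidean distance between any pair of members), which is only one of several possible rules. The names `single_linkage` and `agglomerate` and the sample points are hypothetical.

```python
from math import dist

def single_linkage(a, b, points):
    """Smallest Euclidean distance between any member of a and any member of b."""
    return min(dist(points[i], points[j]) for i in a for j in b)

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    # Start with N objects, each in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = single_linkage(clusters[p], clusters[q], points)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge cluster q into cluster p
        del clusters[q]                          # one fewer cluster remains
    return clusters

# Hypothetical two-dimensional observations, assumed to be already normalized.
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]

for k in (5, 4, 3):
    print(k, agglomerate(points, k))
```

Each pass through the loop reduces the number of clusters by exactly one, mirroring the step from N to N − 1 clusters. This brute-force search is meant for illustration on small data only; the choice of linkage rule can change which clusters are merged.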