Are cluster assignments in partition B similar to partition A Separation check


Are clusters and cluster assignments sensitive to slight changes in inputs? Are cluster assignments in partition B similar to those in partition A?

Separation check: the ratio of between-cluster variation to within-cluster variation (higher is better).

Non-hierarchical clustering into k clusters (k-means):
1. Choose the number of clusters desired, k.
2. Start with a partition into k clusters, often based on a random selection of k centroids.
3. At each step, move each record to the cluster with the closest centroid.
4. Recompute the centroids and repeat step 3.
5. Stop when moving any more records would increase within-cluster dispersion.

Choose k based on how the results will be used, e.g., “How many market segments do we want?” Also experiment with slightly different values of k.

The initial partition into clusters can be random or based on domain knowledge. If the partition is random, repeat the process with different random partitions.

Preprocessing: get the data ready for analysis.
  • Deal with missing values.
  • Address any measurement error.
  • Rescale (e.g., normalize) the data. This reduces the dispersion of data points by re-computing distances on a common scale, and it preserves differences while dampening the effect of outliers.
  • Remove outliers. This reduces the dispersion of data points by removing atypical data, which do not represent the population anyway. Outlier detection is now a big field of study in data mining (with applications in fraud detection and in the discovery of blockbuster drugs in pharmaceuticals).

Standardization (also known as min-max scaling) adjusts the intervals of attributes to a common range. Calculate standardized values in the interval [0, 1]:

$q = \frac{x - \min}{\max - \min}$

where x is the original value of the attribute, min and max are the smallest and largest values of the attribute, and q is the resulting (scaled) value of the attribute in the range [0, 1].

Normalization (z-score) shifts and rescales values so that the attribute has mean 0 and variance 1. It does not change the shape of the distribution. (Many texts use the names "standardization" and "normalization" the other way around.)
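The min-max rescaling described above can be sketched in a few lines of Python. This is only an illustration: the function name `min_max_scale`, the sample values, and the handling of a constant attribute (where max equals min and the formula would divide by zero) are assumptions, not part of the original notes.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using q = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning 0.0 for every value is an assumption of this sketch.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


incomes = [20, 30, 50, 100]  # made-up attribute values
scaled = min_max_scale(incomes)
print(scaled)  # the smallest value maps to 0.0 and the largest to 1.0
```

Because min and max are taken from the data, a single extreme value stretches the range and squeezes all other scaled values together, which is one reason the notes pair rescaling with outlier removal.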
The z-score is computed as

$z = \frac{x - m}{s}$

where m is the mean value of the attribute and s is its standard deviation (or mean absolute deviation). One may also consider weighted distance metrics.

K-means cluster results: Sum-of-Squares Error (SSE)

SSE is based on the distance from each point to the nearest cluster center: how close does each point get to its center?

$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} d(m_i, x)^2$

This just means: in each cluster i, compute the distance from each point x to the cluster center m_i, square the distance (so sign is not an issue), and add them all together.

Choosing the best initial centroids: there is no single, best way to choose initial centroids. Options include:
  • Multiple runs.
  • Use a subsample first and then apply it to your main data set.
  • Select more centroids to start with, then choose the ones that are farthest apart (most distinct).
  • Pre- and post-processing of the data.

Post-processing: better centroids. “Post” means interpreting the results of the cluster analysis:
  • Remove small clusters: they may be outliers.
  • Split loose clusters: those with high SSE that look like they are really two different groups.
  • Merge clusters: those with relatively low SSE that are “close” together.

Limitations of k-means clustering: K-means gives unreliable results when… Clusters vary widely in size, Clusters
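As a rough illustration of the SSE measure and the "multiple runs" advice above, the following Python sketch runs k-means several times from different random starting centroids and keeps the run with the lowest SSE. All names (`sq_dist`, `kmeans`, `best_of_runs`), the toy data, the use of squared Euclidean distance for d, and the stopping rule (stop when no record changes cluster) are assumptions made for this example rather than anything prescribed by the notes.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points, labels, centers):
    """Sum over all points of the squared distance to the assigned center."""
    return sum(sq_dist(p, centers[c]) for p, c in zip(points, labels))


def kmeans(points, k, rng, max_iter=100):
    """One k-means run, starting from k randomly chosen records as centroids."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: move each record to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # no record moved, so dispersion cannot decrease further
        labels = new_labels
        # Update step: recompute centroids; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers, sse(points, labels, centers)


def best_of_runs(points, k, n_runs=10, seed=0):
    """Repeat k-means from different random starts and keep the lowest-SSE run."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(n_runs)), key=lambda r: r[2])


# Toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centers, error = best_of_runs(data, k=2)
print(labels, centers, error)
```

Comparing the SSE of runs is only meaningful for a fixed k, since adding clusters generally lowers SSE regardless of whether the extra clusters are useful.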
  • Fall '14
