Assignment - 3.docx - Assignment 3 1 Recitation Problems...


Assignment 3

1. Recitation Problems

1.1. Chapter 8

1.1.1. Given K equally sized clusters, the probability that a randomly chosen initial centroid will come from any given cluster is 1/K, but the probability that each cluster will have exactly one initial centroid is much lower. (It should be clear that having one initial centroid in each cluster is a good starting situation for K-means.) In general, if there are K clusters and each cluster has n points, then the probability, p, of selecting in a sample of size K one initial centroid from each cluster is given by Equation 8.20. (This assumes sampling with replacement.) From this formula we can calculate, for example, that the chance of having one initial centroid from each of four clusters is 4!/4^4 = 0.0938.

(a) Plot the probability of obtaining one point from each cluster in a sample of size K for values of K between 2 and 100.

The probabilities for K = 2 to 100 are computed and plotted in the attached workbook, Assignment - 3.xlsx [Tab: Problem 8.4(a)]. They follow p = K!/K^K, which is consistent with the 4!/4^4 example above. The probability of having one point from each cluster in a sample of size K drops off rapidly and is essentially 0 once K >= 8 (p is about 0.0024 at K = 8).
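The values for part (a) can also be reproduced without the spreadsheet. The following is a minimal Python sketch, assuming Equation 8.20 reduces to p = K!/K^K (the form implied by the 4!/4^4 example); the function name is illustrative and does not come from the assignment.

```python
import math


def prob_one_centroid_per_cluster(k: int) -> float:
    """Probability that K draws (with replacement) hit K equal clusters once each.

    Assumes Equation 8.20 reduces to p = K!/K^K.
    """
    # Python integers are arbitrary precision, so the factorial and the power
    # are computed exactly before the final division to a float.
    return math.factorial(k) / k**k


if __name__ == "__main__":
    # Print a few of the values from the K = 2..100 range.
    for k in (2, 4, 8, 10, 100):
        print(f"K={k:3d}  p={prob_one_centroid_per_cluster(k):.4e}")
```

Looping over range(2, 101) gives the full series for the plot, which can then be passed to whatever plotting tool is at hand.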

[Figure: probability of obtaining one point from each cluster (y-axis, "Probability") plotted against K (x-axis).]

(b) For K clusters, K = 10, 100, and 1000, find the probability that a sample of size 2K contains at least one point from each cluster. You can use either mathematical methods or statistical simulation to determine the answer.

Given K clusters and a sample of size 2K:

Probability that a point does not come from a particular cluster = 1 - 1/K
Probability that none of the 2K points comes from a particular cluster = (1 - 1/K)^(2K)
Probability that at least one of the 2K points comes from a particular cluster = 1 - (1 - 1/K)^(2K)
Probability that a sample of 2K points contains at least one point from each cluster = (1 - (1 - 1/K)^(2K))^K

The last step multiplies the per-cluster probabilities as if the K events were independent. They are not strictly independent, so the result is an approximation.

Using the above formula for K = 10 we get: Probability = (1 - (1 - 1/10)^20)^10 = 0.27355
Using the above formula for K = 100 we get: Probability = (1 - (1 - 1/100)^200)^100 = 5.659e-07
Using the above formula for K = 1000 we get: Probability = (1 - (1 - 1/1000)^2000)^1000 = 8.236e-64

1.1.2. Suppose that for a data set there are m points and K clusters, half the points and clusters are in "more dense" regions, half the points and clusters are in "less dense" regions, and the two regions are well-separated from each other. For the given data set, which of the following should occur in order to minimize the squared error when finding K clusters:

(a) Centroids should be equally distributed between more dense and less dense regions.
(b) More centroids should be allocated to the less dense region.
(c) More centroids should be allocated to the denser region.

Note: Do not get distracted by special cases or bring in factors other than density. However, if you feel the true answer is different from any given above, justify your response.
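For part (b) of Problem 1.1.1, the three values can be checked numerically. The sketch below, in Python, evaluates the formula used in the worked answer and, as an addition that is not part of the original answer, an exact inclusion-exclusion count for comparison. Both function names are illustrative, and equally sized clusters with sampling with replacement are assumed.

```python
import math


def approx_all_clusters_hit(k: int) -> float:
    """Formula from the worked answer: (1 - (1 - 1/K)^(2K))^K.

    Treats the K "cluster is hit" events as independent (an approximation).
    """
    return (1 - (1 - 1 / k) ** (2 * k)) ** k


def exact_all_clusters_hit(k: int) -> float:
    """Exact value by inclusion-exclusion (added for comparison only).

    Counts the ways 2K draws can land in K equally likely clusters with no
    cluster left empty, using exact integer arithmetic.
    """
    m = 2 * k
    favourable = sum(
        (-1) ** j * math.comb(k, j) * (k - j) ** m for j in range(k + 1)
    )
    return favourable / k**m


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(f"K={k:4d}  formula={approx_all_clusters_hit(k):.4e}")
    print(f"K=  10  exact={exact_all_clusters_hit(10):.4f}")
```

For K = 10 the exact count comes out lower than the independence-based formula, which suggests the worked answer is an overestimate for small K. Either way, the qualitative conclusion is the same: the probability collapses toward 0 as K grows.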