# Lect22-Clust2 - DATA MINING Susan Holmes Stats202 Lecture...

DATA MINING
Susan Holmes © Stats202, Lecture 22, Fall 2010

[Figure: title-slide graphic with labels A, B and a to l]

## Special Announcements

- Do not update your version of R before the end of the quarter.
- All requests should be sent to [email protected].
- A new homework is up, due next Thursday. It contains part of the data for the competition (the revised version contains hints; here is another one: `BreastCancer = na.omit(BreastCancer)`).
- The Kaggle competition is up at http://inclass.kaggle.com/stat202.
- Wednesday: bring your laptop, with the package `cluster` installed.

## Last time

- The big picture: clustering, partitional and hierarchical.
- The k-means algorithm.

## k-medoids algorithm

1. For a given cluster assignment $C$, find the observation in cluster $k$ minimizing the total distance to the other points in that cluster:
   $$i_k^* = \operatorname*{argmin}_{\{i \,:\, C(i) = k\}} \sum_{C(i') = k} D(x_i, x_{i'}).$$
   Then $m_k = x_{i_k^*}$, $k = 1, 2, \ldots, K$, are the current estimates of the cluster centers.
2. Given a current set of cluster centers $m_1, \ldots, m_K$, minimize the total error by assigning each observation to the closest (current) cluster center:
   $$C(i) = \operatorname*{argmin}_{1 \le k \le K} D(x_i, m_k).$$
3. Iterate steps 1 and 2 until the assignments do not change.
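The three steps above can be sketched in pure Python as follows. This sketch assumes the dissimilarities are given as a precomputed matrix `D` (a list of lists, `D[i][j]` = $D(x_i, x_j)$) and that the caller supplies the indices of the $K$ starting medoids. The slides do not say how to initialize. `k_medoids` is a hypothetical helper name, not a function from the course or from the R `cluster` package. That package's `pam()` implements the related PAM algorithm, which uses a build phase and then searches over medoid swaps instead of alternating these two steps.

```python
from typing import List, Sequence, Tuple


def k_medoids(D: Sequence[Sequence[float]], medoids: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int], float]:
    """Alternating k-medoids on a precomputed dissimilarity matrix D.

    medoids: indices of the K starting centers (chosen by the caller;
    an assumption, since the slides do not specify an initialization).
    Returns (assignment C, medoid indices, total distance to medoids).
    """
    n = len(D)
    medoids = list(medoids)  # copy so the caller's list is not modified
    K = len(medoids)
    # Step 2 on the starting centers: assign each point to its closest medoid.
    C = [min(range(K), key=lambda k: D[i][medoids[k]]) for i in range(n)]
    for _ in range(max_iter):
        # Step 1: in each cluster, the new center is the member with the
        # smallest total dissimilarity to the other members of that cluster.
        for k in range(K):
            members = [i for i in range(n) if C[i] == k]
            if members:  # keep the old medoid if a cluster became empty
                medoids[k] = min(members,
                                 key=lambda i: sum(D[i][j] for j in members))
        # Step 2: reassign every observation to its closest current center.
        new_C = [min(range(K), key=lambda k: D[i][medoids[k]])
                 for i in range(n)]
        # Step 3: stop once the assignments no longer change.
        if new_C == C:
            break
        C = new_C
    cost = sum(D[i][medoids[C[i]]] for i in range(n))
    return C, medoids, cost
```

For example, with the one-dimensional points `x = [0, 1, 2, 10, 11, 12]`, `D = [[abs(a - b) for b in x] for a in x]` and starting medoids `[0, 1]`, the call `k_medoids(D, [0, 1])` should converge to the clusters `{0, 1, 2}` and `{10, 11, 12}` with medoids at indices 1 and 4. Because only entries of `D` are used, any dissimilarity works, which is the main advantage of k-medoids over k-means.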
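## Gap Statistic

Suppose the data contain $K^*$ well-separated natural groups. For $K < K^*$, each increase in $K$ separates natural groups into their own clusters, so the within-cluster criterion $W_K$ drops substantially. For $K > K^*$, an extra cluster must split a natural group, which reduces $W_K$ much less. To the extent this scenario is realized, there will be a sharp decrease in the successive differences in criterion value, $W_K - W_{K+1}$, at $K = K^*$. That is,

$$\{W_K - W_{K+1} \mid K < K^*\} \gg \{W_K - W_{K+1} \mid K \ge K^*\}.$$

An estimate $\hat{K}^*$ for $K^*$ is then obtained by identifying a "kink" in the plot of $W_K$ as a function of $K$.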
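The slide leaves "identifying a kink" to visual inspection of the plot. One possible automatic rule mirrors the displayed inequality, but it is an assumption and not part of the lecture. For each candidate split $K$, the rule compares the smallest drop $W_k - W_{k+1}$ with $k < K$ to the largest drop with $k \ge K$, and keeps the $K$ where that ratio is largest. `kink_estimate` is a hypothetical helper name. Its input `W` could be, for instance, the total within-cluster dissimilarity from clustering runs with $K = 1, \ldots, K_{\max}$.

```python
from typing import Sequence


def kink_estimate(W: Sequence[float]) -> int:
    """Estimate K* from criterion values W[0] = W_1, ..., W[-1] = W_Kmax.

    Hypothetical rule (not from the slides): choose the K that best splits
    the drops W_k - W_{k+1} into large ones (k < K) and small ones (k >= K),
    scoring each K by min(drops before K) / max(drops from K on).
    """
    if len(W) < 3:
        raise ValueError("need W_1, ..., W_Kmax with Kmax >= 3")
    # drops[j] = W_{j+1} - W_{j+2} in the slide's 1-based notation
    drops = [W[j] - W[j + 1] for j in range(len(W) - 1)]
    best_K, best_score = 2, float("-inf")
    # Candidates K = 2, ..., Kmax - 1 keep both groups of drops non-empty.
    for K in range(2, len(W)):
        before = drops[:K - 1]  # W_k - W_{k+1} for k < K
        after = drops[K - 1:]   # W_k - W_{k+1} for k >= K
        denom = max(after)
        score = float("inf") if denom <= 0 else min(before) / denom
        if score > best_score:
            best_K, best_score = K, score
    return best_K
```

With the made-up values `W = [100, 40, 10, 9, 8.2, 7.5]`, the drops are roughly 60, 30, 1, 0.8, 0.7, so `kink_estimate(W)` returns 3. This is the point where the large drops end, which is where the plot of $W_K$ bends. A ratio rule like this is sensitive to noise in $W_K$, which is one motivation for more formal criteria such as the gap statistic.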