Lect22-Clust2

DATA MINING
Susan Holmes ©
Stats202 Lecture 22, Fall 2010

Special Announcements
- Do not update your version of R before the end of the quarter.
- All requests should be sent to stats202-aut1011-staff@lists.stanford.edu.
- A new homework is up: due next Thursday, it contains part of the data for the competition (the revised version contains hints; here is another one: BreastCancer = na.omit(BreastCancer) ).
- The Kaggle competition is up; site: http://inclass.kaggle.com/stat202.
- Wednesday: bring your laptop, with the package cluster installed.
Last time
- The big picture: clustering, partitional and hierarchical.
- k-means algorithm.

k-medoids algorithm
1. For a given cluster assignment $C$, find the observation in each cluster that minimizes the total distance to the other points in that cluster:
   $$i^*_k = \operatorname*{argmin}_{\{i \,:\, C(i) = k\}} \sum_{C(i') = k} D(x_i, x_{i'}).$$
   Then $m_k = x_{i^*_k}$, $k = 1, 2, \ldots, K$, are the current estimates of the cluster centers.
2. Given a current set of cluster centers $\{m_1, \ldots, m_K\}$, minimize the total error by assigning each observation to the closest (current) cluster center:
   $$C(i) = \operatorname*{argmin}_{1 \le k \le K} D(x_i, m_k).$$
3. Iterate steps 1 and 2 until the assignments do not change.
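The three steps can be sketched in a few lines of plain Python. This is an illustrative sketch, not the course's implementation: the function name `k_medoids`, the choice of the first k points as starting medoids, the Euclidean default for the dissimilarity D, and the rule of keeping the old medoid when a cluster becomes empty are all assumptions that the slide does not specify.

```python
import math


def k_medoids(points, k, dist=math.dist, max_iter=100):
    """Alternate the medoid update (step 1) and reassignment (step 2)."""
    # Assumption: start from the first k points as medoids.
    medoids = list(range(k))  # indices into `points`

    def nearest(i):
        # Step 2 for one observation: index of the closest current medoid.
        return min(range(k), key=lambda c: dist(points[i], points[medoids[c]]))

    assign = [nearest(i) for i in range(len(points))]
    for _ in range(max_iter):
        # Step 1: within each cluster, pick the member with the smallest
        # total distance to the other members of that cluster.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if members:  # assumption: an empty cluster keeps its old medoid
                medoids[c] = min(
                    members,
                    key=lambda i: sum(dist(points[i], points[j]) for j in members),
                )
        # Step 2: reassign every observation to its closest medoid.
        new_assign = [nearest(i) for i in range(len(points))]
        # Step 3: stop once the assignments no longer change.
        if new_assign == assign:
            break
        assign = new_assign
    return [points[m] for m in medoids], assign


# Made-up toy data: two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_medoids(pts, k=2)
print(centers, labels)
```

Because the centers are always observations, only the pairwise dissimilarities D are needed; any other dissimilarity function taking two points can be passed as `dist`.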
Gap Statistic
To the extent this scenario is realized, there will be a sharp decrease in successive differences in criterion value, $W_K - W_{K+1}$, at $K = K^*$. That is,
$$\{W_K - W_{K+1} \mid K < K^*\} \gg \{W_K - W_{K+1} \mid K \ge K^*\}.$$
An estimate $\hat{K}^*$ for $K^*$ is then obtained by identifying a 'kink'
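To make the successive differences $W_K - W_{K+1}$ concrete, here is a small Python sketch. The criterion values in `W` are made up for illustration, and the rule used in `kink` (pick the K where the drop into K most exceeds the drop out of K) is a simple heuristic assumed here; it is not the gap statistic procedure itself.

```python
def successive_drops(W):
    """W[j] holds the criterion value W_K for K = j + 1.

    Returns a dict mapping K to W_K - W_{K+1}.
    """
    return {K: W[K - 1] - W[K] for K in range(1, len(W))}


def kink(W):
    """Heuristic (an assumption): the K at which the drops shrink the most."""
    drops = successive_drops(W)
    # Compare the drop from K-1 to K with the drop from K to K+1.
    return max(range(2, len(W)), key=lambda K: drops[K - 1] - drops[K])


# Hypothetical criterion values for K = 1, ..., 6 (not from the lecture).
W = [100.0, 60.0, 25.0, 22.0, 20.0, 19.0]
print(successive_drops(W))  # large drops for K < 3, small drops for K >= 3
print(kink(W))
```

With these values the drops are 40 and 35 for K = 1, 2 and then 3, 2, 1 for K = 3, 4, 5, so the heuristic returns 3, matching the pattern in the displayed inequality.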

This note was uploaded on 07/29/2011 for the course STAT 202 at Stanford.
