Clustering Overview
Algorithm
Begin with all sequences in one cluster
While splitting some cluster improves the objective function:
{
    Split each cluster in two so that the objective function
    has the greatest possible improvement at that step
    Reassign individual sequences between clusters for as long
    as doing so improves the objective function
}
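The loop above can be sketched in Python. This is only an illustration under stated assumptions: the objective is passed in as a function where lower is better, the best two-way split is found by brute force (feasible for toy inputs only), and all names (bipartitions, split_step, reassign_step, divisive_cluster, toy_objective) are hypothetical. The toy objective is a stand-in, not the objective function from these slides.

```python
from itertools import combinations


def bipartitions(cluster):
    """Yield every split of a cluster into two non-empty parts (toy sizes only)."""
    if len(cluster) < 2:
        return
    first, rest = cluster[0], cluster[1:]
    for k in range(len(rest)):  # how many of the rest stay with `first`
        for keep in combinations(range(len(rest)), k):
            left = [first] + [rest[i] for i in keep]
            right = [rest[i] for i in range(len(rest)) if i not in keep]
            yield left, right


def split_step(clusters, objective):
    """Split each cluster in two where its best split lowers the objective."""
    result, improved = list(clusters), False
    for cluster in clusters:
        others = [c for c in result if c is not cluster]
        best, best_score = None, objective(result)
        for left, right in bipartitions(cluster):
            candidate = others + [left, right]
            score = objective(candidate)
            if score < best_score:
                best, best_score = candidate, score
        if best is not None:
            result, improved = best, True
    return result, improved


def reassign_step(clusters, objective):
    """Move single sequences between clusters while a move lowers the objective."""
    clusters = [list(c) for c in clusters]
    moved = True
    while moved:
        moved = False
        for src in clusters:
            for seq in list(src):
                for dst in clusters:
                    if dst is src or len(src) == 1:
                        continue  # skip the same cluster; never empty a cluster
                    before = objective(clusters)
                    src.remove(seq)
                    dst.append(seq)
                    if objective(clusters) < before:
                        moved = True
                        break
                    dst.pop()  # undo the trial move
                    src.append(seq)
    return clusters


def divisive_cluster(seqs, objective):
    """Start with one cluster; split and reassign until no split helps."""
    clusters = [list(seqs)]
    while True:
        clusters, improved = split_step(clusters, objective)
        if not improved:
            return clusters
        clusters = reassign_step(clusters, objective)


def toy_objective(clusters, penalty=2.5):
    """Stand-in objective (assumption): within-cluster mismatches plus a per-cluster cost."""
    mismatches = sum(
        sum(a != b for a, b in zip(s, t))
        for c in clusters
        for s, t in combinations(c, 2)
    )
    return mismatches + penalty * len(clusters)


seqs = ["AA", "AA", "CA", "TT", "TT", "GT"]
result = divisive_cluster(seqs, toy_objective)
```

With this toy input the loop stops after one split, because splitting either half again costs more in the per-cluster penalty than it saves in mismatches.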
The Objective Function
$$\sum_{i<j} I(X_i; X_j \mid C) + \beta \, I(X; C)$$
We want to successively minimize this expression at each iteration.
First term: measures the mutual information between pairs of positions within each cluster.
Second term: measures the mutual information between a sample of each position within a cluster and the overall distribution of values of these positions.
These terms “work against” each other to approach a steady state after several iterations.
β is a factor to adjust the relative importance of the terms.
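A minimal sketch of how such an objective might be evaluated on hard cluster assignments. Several things here are assumptions, not taken from the slides: the two terms are added, the first term is weighted by cluster size, the second term is read as the sum over positions of the mutual information between that position and the cluster label, all probabilities are plain empirical frequencies, and the names mutual_information and objective are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(pairs):
    """Empirical mutual information (in bits) between the two members of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (k / n) * log2(k * n / (left[a] * right[b]))
        for (a, b), k in joint.items()
    )


def objective(clusters, beta=1.0):
    """Within-cluster pairwise term plus beta times the position/cluster term."""
    total = sum(len(c) for c in clusters)
    n_pos = len(clusters[0][0])

    # First term: mutual information between pairs of positions inside each
    # cluster, weighted by the fraction of sequences in that cluster.
    within = 0.0
    for c in clusters:
        weight = len(c) / total
        for i, j in combinations(range(n_pos), 2):
            within += weight * mutual_information([(s[i], s[j]) for s in c])

    # Second term (assumed reading): sum over positions of I(X_i; C).
    labelled = [(s, k) for k, c in enumerate(clusters) for s in c]
    between = sum(
        mutual_information([(s[i], k) for s, k in labelled])
        for i in range(n_pos)
    )
    return within + beta * between


mixed = [["AA", "AA", "CA", "TT", "TT", "GT"]]
split = [["AA", "AA", "CA"], ["TT", "TT", "GT"]]
score_mixed = objective(mixed, beta=0.25)
score_split = objective(split, beta=0.25)
```

On this toy data, splitting drives the first term from 1 bit to 0 while the second term rises from 0 to 2 bits, so whether the split lowers the total depends on β. That is one way to see the two terms working against each other.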
Independent Sites
If several populations were placed in one
group, then knowing the value of one
position would provide information about
the value of another position (there would
be mutual information between positions).
This is because each subpopulation has
certain sets of variants that are more
common in it than in other populations.
[Figure: samples from populations A and B, showing the values observed at positions i and j in each sample.]
If samples from two populations were mixed
together, knowing that position i has value
A or C tells us that position j is probably A,
and knowing that i is T or G tells us that
position j is probably T.
If individuals were separated into their
own subpopulations, knowing the value
of one position would not provide any
more information about the value at
another position (so there would be no
mutual information between positions).
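The contrast between mixed and separated samples can be checked numerically. The sketch below uses made-up samples chosen to match the description above (population A has A or C at position i and A at position j; population B has T or G at position i and T at position j). The data and the helper names entropy and mutual_information are assumptions for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical (position i, position j) values for each sample.
pop_a = [("A", "A"), ("A", "A"), ("C", "A")]
pop_b = [("T", "T"), ("T", "T"), ("G", "T")]


def mi_between_positions(samples):
    """Mutual information between position i and position j over the samples."""
    return mutual_information([i for i, _ in samples], [j for _, j in samples])


mi_mixed = mi_between_positions(pop_a + pop_b)  # populations pooled together
mi_a = mi_between_positions(pop_a)              # population A alone
mi_b = mi_between_positions(pop_b)              # population B alone
```

In the pooled sample, position i fully determines position j, so the mutual information is about 1 bit; within each population alone, position j never varies, so it is 0.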
 Spring '10
 Dr.Ping
 Computer Science, DNA, Evolution, San Diego, Mutual Information, Engineering University of California
