# SHYU_edward_poster_2005 - Population Substructure using...

## Clustering Overview

### Algorithm

1. Begin with all sequences in one cluster.
2. While splitting some cluster improves the objective function:
   - Split each cluster in two so that the objective function improves as much as possible at that step.
   - Reassign individual sequences to other clusters while doing so improves the objective function.

### The Objective Function

$$
\sum_{i<j} I(X_i; X_j \mid C) \;-\; \beta \, I(X; C)
$$

We want to minimize this quantity successively at each iteration.

- The first term measures the mutual information between pairs of positions $i < j$ within each cluster.
- The second term measures the mutual information between a sample of each position within a cluster and the overall distribution of values at these positions.
- The two terms "work against" each other and approach a steady state after several iterations; $\beta$ is a factor that adjusts their relative importance.

## Independent Sites

If several populations were placed in one group, knowing the value at one position would provide information about the value at another position (there would be mutual information between positions). This is because each subpopulation has certain sets of variants that are more common in it than in other populations. For example, consider samples from two populations, A and B, at positions $i$ and $j$:

| Population | Position $i$ | Position $j$ |
|---|---|---|
| A | A | A |
| A | A | A |
| A | C | A |
| A | C | A |
| B | T | T |
| B | T | T |
| B | G | T |
| B | G | T |

If samples from the two populations were mixed together, knowing that position $i$ is A or C tells us that position $j$ is probably A. Likewise, knowing that $i$ is T or G tells us that $j$ is probably T. If individuals were separated into their own subpopulations, knowing the value at one position would provide no further information about the value at another position (there would be no mutual information between positions).
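The clustering procedure and objective described above can be sketched in Python as follows. This is a minimal illustration rather than the poster's implementation, and it rests on several assumptions of ours. Entropies are plug-in estimates in bits. $X$ in $I(X; C)$ is taken to be the whole sequence treated as one symbol. Each round splits only the single cluster whose best split lowers the objective most, which is one reading of "split each cluster in two." Splits are found by brute-force enumeration, so this only works for tiny clusters. A hypothetical `max_clusters` cap stops the loop. The function names (`objective`, `best_split`, `reassign`, `cluster`) are also hypothetical. The toy data is modeled on the Independent Sites example.

```python
import itertools
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits, computed as H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def objective(seqs, labels, beta):
    """sum_{i<j} I(X_i; X_j | C) - beta * I(X; C).

    Assumption: X in I(X; C) is the whole sequence, treated as one symbol.
    """
    n, length = len(seqs), len(seqs[0])
    within = 0.0
    for c in set(labels):
        members = [s for s, lab in zip(seqs, labels) if lab == c]
        weight = len(members) / n  # p(C = c)
        for i, j in itertools.combinations(range(length), 2):
            within += weight * mutual_information(
                [s[i] for s in members], [s[j] for s in members])
    return within - beta * mutual_information(seqs, labels)


def best_split(seqs, labels, cluster_id, beta):
    """Brute-force the best two-way split of one cluster (exponential: toy sizes only)."""
    idx = [k for k, lab in enumerate(labels) if lab == cluster_id]
    new_label = max(labels) + 1
    best = (objective(seqs, labels, beta), labels)
    # Keep idx[0] in place so each unordered split is tried exactly once.
    for r in range(1, len(idx)):
        for moved in itertools.combinations(idx[1:], r):
            trial = list(labels)
            for k in moved:
                trial[k] = new_label
            score = objective(seqs, trial, beta)
            if score < best[0]:
                best = (score, trial)
    return best


def reassign(seqs, labels, beta, tol=1e-12):
    """Move single sequences to another cluster while that lowers the objective."""
    current = objective(seqs, labels, beta)
    improved = True
    while improved:
        improved = False
        for k in range(len(seqs)):
            if labels.count(labels[k]) == 1:
                continue  # never empty a cluster
            for c in set(labels) - {labels[k]}:
                trial = labels[:k] + [c] + labels[k + 1:]
                score = objective(seqs, trial, beta)
                if score < current - tol:
                    labels, current, improved = trial, score, True
                    break
    return labels, current


def cluster(seqs, beta=1.0, max_clusters=2, tol=1e-12):
    """Divisive clustering: start with one cluster, split while the objective drops."""
    labels = [0] * len(seqs)
    current = objective(seqs, labels, beta)
    while len(set(labels)) < max_clusters:
        # Over all clusters, take the single split that lowers the objective most.
        score, trial = min((best_split(seqs, labels, c, beta) for c in set(labels)),
                           key=lambda t: t[0])
        if score >= current - tol:
            break  # no split improves the objective
        labels, current = reassign(seqs, trial, beta, tol)
    return labels, current


# Toy data modeled on the Independent Sites example (positions i and j).
pop_a = ["AA", "AA", "CA", "CA"]
pop_b = ["TT", "TT", "GT", "GT"]
mixed = pop_a + pop_b

print(mutual_information([s[0] for s in mixed], [s[1] for s in mixed]))  # 1.0
print(mutual_information([s[0] for s in pop_a], [s[1] for s in pop_a]))  # 0.0
labels, score = cluster(mixed, beta=1.0, max_clusters=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In the mixed group, positions $i$ and $j$ share 1 bit of mutual information. Within population A they share none. With $\beta = 1$, the two-population split reaches an objective of $-1$, and it is the only two-cluster assignment that does. On this toy data, further splits would keep lowering the objective. Splitting population A into its AA and CA sequences gives $-1.5$, for instance. This is why the sketch caps the cluster count; the poster text shown here does not say how the actual method decides when to stop.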