Lecture 15
Flip-Update MC
o Initialization: Choose a haplotype $H^0$ arbitrarily
o Iteration: For $t = 1, 2, \ldots$, obtain $H^{t+1}$ from $H^t$ as follows:
1) Choose a position $i \in \{1, \ldots, n\}$ randomly
2) Let $H_i$ be the haplotype formed by flipping the locati
3) Set $H^{t+1}$

Lecture 01
From an Individual to a Population
o It took a long time (10-15 years) to produce the draft sequence of the human
genome
o Soon (within 10-15 years), entire populations can have their DNA sequenced
Why do we care?
o Individual genomes vary by

Lecture 03
Perfect Phylogeny and Phylogeography
o The Y-chromosome (or equally mitochondrial DNA) lineage is a tree
o Each individual gets his Y chromosome from his father (or, with mtDNA,
from the mother)
o Each individual is a node (samples we ha

Lecture 02
Association Studies
o Take affected individuals from different backgrounds, as well as unaffected
individuals from backgrounds similar to those of the affected individuals
Problem 1: There are many unrelated common mutations (~1 ever

Lecture 16
Continuous Outcome Statistical Test
o Say we have phenotypes that are like continuous variables
E.g. blood-pressure measurements
o One test we could do to see if different genotypes (or different sub-categories of
the population) are distinct

Lecture 14
Dimensionality Reduction
o Goal: Take a high-dimensional matrix (M SNPs by N individuals SNP matrix, for
example) and reduce it to a lower-dimensional matrix (k SNPs by N individuals
SNP matrix, for example)
We want to have our variance remain

Lecture 09
Combinatorial Formulation
o Given a SNP matrix (n individuals, m sites), determine if there exist at least
$n_1 < n$ individuals and $m_1 < m$ sites such that the $n_1$ individuals are
identical when restricted to the $m_1$ sites
o This problem is NP-Hard fo
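
The decision problem above can be checked by brute force on tiny inputs. The following is a minimal sketch, not the lecture's algorithm: the function name `has_identical_submatrix` and the toy matrix are hypothetical, and the enumeration over column subsets is exponential in the number of sites.

```python
from itertools import combinations

def has_identical_submatrix(snp, n1, m1):
    """Return True if at least n1 rows agree on some set of m1 columns."""
    n = len(snp)
    m = len(snp[0]) if n else 0
    for cols in combinations(range(m), m1):
        # Count how many rows share each restriction to the chosen columns.
        counts = {}
        for row in snp:
            key = tuple(row[j] for j in cols)
            counts[key] = counts.get(key, 0) + 1
        if counts and max(counts.values()) >= n1:
            return True
    return False

# Toy SNP matrix: 4 individuals (rows) by 4 sites (columns).
snp = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
print(has_identical_submatrix(snp, 2, 3))  # rows 0 and 1 agree on sites 0, 1, 3
print(has_identical_submatrix(snp, 3, 3))
```

Grouping rows by their restriction avoids also enumerating row subsets, so only the choice of columns is exhaustive.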

Lecture 05
Coalescent Theory
o What We've Been Doing:
N = number of individuals in population
n = sample size (number of haplotypes sampled), where n < N
If we create simulations of large N over many generations, this is a
computationally intense ta

Lecture 12
Primer Design Simulated Annealing
o Start with some current solution, with cost proportional to uncovered region
o A neighboring solution is obtained by adding a primer, and deleting all its
dimerizing partners
o Goal is to go through the space

Lecture 10
Integer Linear Programming
o Recall that we have a linear objective $c^T x$ and linear constraints $Ax \le b$
o Geometrically, the set of linear constraints defines a polytope
o The optimum answer lies on a vertex of the polytope
o In ILP, the optima l
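
A small worked instance can make the difference between the LP and the integer problem concrete. This is a sketch under stated assumptions: the problem is taken to be a maximization with constraints of the form A x <= b, the function `brute_force_ilp` and the toy numbers are hypothetical, and enumeration only works for tiny bounded instances. For this instance the LP relaxation attains 21 at the vertex (3, 1.5), while the best integer point is different.

```python
from itertools import product

def brute_force_ilp(c, A, b, bounds):
    """Maximize c.x over integer x with A x <= b by enumerating a bounded box."""
    best_x, best_val = None, None
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    for x in product(*ranges):
        # Check every linear constraint row . x <= b_i.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if not feasible:
            continue
        val = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if best_val is None or val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy instance: maximize 5x + 4y subject to 6x + 4y <= 24, x + 2y <= 6, x, y >= 0.
c = [5, 4]
A = [[6, 4], [1, 2]]
b = [24, 6]
bounds = [(0, 4), (0, 3)]  # nonnegativity and a finite search box
print(brute_force_ilp(c, A, b, bounds))  # ((4, 0), 20)
```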

Lecture 13
Sampling from Complex Distribution
o Say we have some distribution X for which we have the PDF f ( x )
o How can we randomly sample from X in a manner weighted by the PDF?
0) Say an initial point $x^0$ is given
1) Pick a point $x'$ from the unifo
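
The steps above are cut off, so the following is only a sketch of one standard way to finish them, not necessarily the lecture's procedure. It assumes a Metropolis-style sampler: the candidate is drawn uniformly from a window around the current point, and it is accepted with probability min(1, f(x')/f(x)). The names `metropolis_sample`, `width`, and `unnormalized_normal` are hypothetical.

```python
import math
import random

def metropolis_sample(f, x0, num_steps, width=1.0, seed=0):
    """Draw a chain of samples whose long-run distribution is proportional to f."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(num_steps):
        # Assumed proposal: uniform in [x - width, x + width].
        x_new = x + rng.uniform(-width, width)
        # Accept with probability min(1, f(x_new) / f(x)), written without division.
        if rng.random() * f(x) < f(x_new):
            x = x_new
        samples.append(x)
    return samples

def unnormalized_normal(x):
    """Standard normal density up to a constant factor."""
    return math.exp(-0.5 * x * x)

samples = metropolis_sample(unnormalized_normal, 0.0, 20000, width=2.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 0 and 1 for this target
```

Only ratios of f are used, so the PDF does not need to be normalized.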

Lecture 07
Basic Principles of Selection
o More offspring are produced than can survive
o Different offspring have different levels of fitness
o Fit individuals are more likely to survive and pass on their genotypes
o If a mutation is deleterious, it is q

Lecture 11
Breakpoints
o Breakpoint: A pair of points (a, b) that are distant in the reference genome that
come together in the query
Paired-end mapping of a disjoint read-pair does not pinpoint the exact
breakpoint (just a general idea)
o Denote the fir

Lecture 04
LD Over Time and Distance
o The number of recombination events between two sites can be assumed to be
Poisson distributed
o Let r denote the recombination rate between two adjacent sites
o r = # crossovers per bp per generation
o The recombinat
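
As a worked illustration of the Poisson assumption, the probability of seeing no recombination event is exp(-lambda), where lambda is the expected number of events. The sketch below assumes lambda = r x distance x generations, which is an assumption of this example and is not stated in the notes; the function name and the numeric values are hypothetical.

```python
import math

def prob_no_recombination(r, distance_bp, generations=1):
    """P(zero events) for a Poisson count with mean r * distance_bp * generations."""
    lam = r * distance_bp * generations  # assumed expected number of crossovers
    return math.exp(-lam)

# Hypothetical values: r = 1e-8 per bp per generation, sites 1 Mb apart.
p_one_gen = prob_no_recombination(1e-8, 1_000_000)
p_100_gen = prob_no_recombination(1e-8, 1_000_000, generations=100)
print(p_one_gen, p_100_gen)
```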

Lecture 06
Recombination
o We ignore the case of multiple (>1) events in one generation
o $\Pr(\text{no recombination}) = 1 - kr$
o $\Pr(\text{no coalescence}) = 1 - \binom{k}{2} / (2N)$
Simulating ARG
o Let k = n
o Define $\rho = 4rN$
o Iterate until k = 1:
$\binom{k}{2} + \frac{k\rho}{2}$
Choose time from a
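
The iteration is cut off, so the following is a hedged sketch of one standard way such a simulation proceeds, not the lecture's exact procedure. It assumes time is measured in units of 2N generations, so that coalescence happens at rate k(k-1)/2 and recombination at rate k x rho / 2 with rho = 4rN, and that waiting times are exponential with the total rate. It tracks only the number of lineages k; a full ARG would also record which lineages merge or split and where the breakpoints fall. The function name and parameters are hypothetical.

```python
import random

def simulate_arg_events(n, rho, seed=0, max_events=100000):
    """Simulate the sequence of coalescence/recombination events until k = 1."""
    rng = random.Random(seed)
    k = n
    t = 0.0
    events = []
    while k > 1 and len(events) < max_events:
        coal_rate = k * (k - 1) / 2.0  # k choose 2
        rec_rate = k * rho / 2.0
        total = coal_rate + rec_rate
        t += rng.expovariate(total)  # assumed exponential waiting time
        if rng.random() < coal_rate / total:
            k -= 1  # two lineages merge
            events.append((t, "coalescence", k))
        else:
            k += 1  # one lineage splits into two
            events.append((t, "recombination", k))
    return events

events = simulate_arg_events(n=5, rho=0.5)
for time, kind, lineages in events:
    print(round(time, 4), kind, lineages)
```

Because the coalescence rate grows quadratically in k while the recombination rate grows only linearly, the lineage count is pulled back toward 1.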

Lecture 08
Scaled SFS (Scaled Frequency Spectrum) of Neutral Evolution
o Recall Scaled Frequency $i \xi_i$
$\xi_i$ is the number of sites with exactly i derived alleles
At the beginning (top-left diagram), the selection sweep is not as strong of
a signal because th
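
The definition above (the number of sites with exactly i derived alleles, multiplied by i for the scaled version) can be computed directly from a haplotype matrix. This is a minimal sketch: the function names, the 0/1 encoding (1 = derived allele), and the toy data are assumptions made for illustration.

```python
def site_frequency_spectrum(haplotypes):
    """Return xi, where xi[i] is the number of sites with exactly i derived alleles."""
    n = len(haplotypes)
    m = len(haplotypes[0]) if n else 0
    xi = [0] * (n + 1)
    for j in range(m):
        derived = sum(h[j] for h in haplotypes)  # derived-allele count at site j
        xi[derived] += 1
    return xi

def scaled_sfs(xi):
    """Return the scaled spectrum, whose i-th entry is i * xi[i]."""
    return [i * count for i, count in enumerate(xi)]

# Toy data: 4 haplotypes (rows) by 5 sites (columns).
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
xi = site_frequency_spectrum(haplotypes)
print(xi)              # [1, 3, 0, 1, 0]
print(scaled_sfs(xi))  # [0, 3, 0, 3, 0]
```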

Lecture 17
Detecting Multiple Loci
o The most naive strategy is to look at all pairs of loci (or all k-tuples) that influence
a complex disease
This is computationally intensive and also has a problem with multiple
testing
o Other Strategies:
Consider a
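
To see why testing all pairs is expensive, it helps to count the tests. The sketch below counts k-tuples of loci and shows how a per-test significance threshold shrinks under a Bonferroni correction. Bonferroni is used here only as one common correction and is an assumption of this example, not something the notes specify; the function names and the locus count are hypothetical.

```python
from math import comb

def num_tuples(num_loci, k=2):
    """Number of distinct k-tuples of loci to test."""
    return comb(num_loci, k)

def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / num_tests

# Hypothetical study with 500,000 SNPs.
pairs = num_tuples(500_000, 2)
print(pairs)                              # 124999750000
print(bonferroni_threshold(0.05, pairs))  # per-pair threshold
print(num_tuples(500_000, 3))             # triples grow much faster
```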