HW 1
1) Assign a set of possible letters to every vertex, traversing the tree
from leaves to root
1. Each node's set is the union of its children's sets if those sets are
disjoint (a leaf's set contains only its label)
a. E.g. if the node we are looking at has a left child
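The leaves-to-root pass described above can be sketched in Python as follows. This is a minimal sketch, not the assignment's reference solution: the excerpt only states the union rule for disjoint child sets, so the intersection rule for overlapping sets is taken from the standard Fitch algorithm and is an assumption here. The tree encoding (`tree`, `labels`, `root`) and the function name `fitch_sets` are hypothetical.

```python
def fitch_sets(tree, labels, root):
    """Bottom-up pass of a Fitch-style small parsimony sketch.

    tree   : dict mapping each internal node to its (left, right) children
             (hypothetical encoding, binary tree assumed)
    labels : dict mapping each leaf to its single letter
    Returns (sets, changes): the candidate letter set of every node and the
    number of union steps, which equals the unweighted parsimony score.
    """
    sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if node in labels:                 # a leaf's set contains only its label
            sets[node] = {labels[node]}
            return
        left, right = tree[node]
        visit(left)
        visit(right)
        common = sets[left] & sets[right]
        if common:
            # Assumed rule (standard Fitch): overlapping sets -> intersection.
            sets[node] = common
        else:
            # Rule stated in the text: disjoint sets -> union.
            sets[node] = sets[left] | sets[right]
            changes += 1

    visit(root)
    return sets, changes


example_tree = {"root": ("u", "v"), "u": ("a", "b"), "v": ("c", "d")}
example_labels = {"a": "A", "b": "C", "c": "A", "d": "G"}
example_sets, example_changes = fitch_sets(example_tree, example_labels, "root")
```

The recursion visits children before their parent, which gives the leaves-to-root order the text asks for.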
Study Aid 2
However, there are two different coins
The fair coin: Heads and Tails
with the same probability, 1/2.
The biased coin:
Heads with prob. ,
Tails with prob. .
The Fair Bet Casino (contd)
Thus, we define the probabilities:
P(H|Fair) = P(T|Fair) = 1/2
P(H
Study Aid 1
Find the most probable coin that the dealer was using at a particular time.
Forward Algorithm
Define f_{k,i} (the forward probability) as the probability of emitting the
prefix x_1 ... x_i and reaching state k.
The recurrence for the forward algorith
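For reference, the textbook form of the forward computation can be sketched in Python as below. This is a sketch under stated assumptions, not the study aid's own derivation: it uses the standard recurrence f_{k,i} = e_k(x_i) * sum over l of f_{l,i-1} * a_{l,k}, and every numeric parameter (start, switching, and biased-coin probabilities) is an illustrative value chosen here, not one taken from the text. The names `forward`, `start`, `trans`, and `emit` are hypothetical.

```python
def forward(x, states, start, trans, emit):
    """Return a list f where f[i][k] is the probability of emitting the
    prefix x[0..i] and being in state k after position i (0-based).
    Assumes x is non-empty."""
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({
            k: emit[k][symbol] * sum(prev[l] * trans[l][k] for l in states)
            for k in states
        })
    return f


# Illustrative two-coin model; all numbers below are assumptions.
STATES = ("F", "B")                                   # fair, biased
START = {"F": 0.5, "B": 0.5}
TRANS = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMIT = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.75, "T": 0.25}}

table = forward("HHT", STATES, START, TRANS, EMIT)
prob_of_sequence = sum(table[-1].values())            # P(x) under the model
```

Summing the last column gives the total probability of the observed sequence. For long sequences these products underflow, so a real implementation would typically work with logarithms or rescale each column.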
Lecture 4 Notes
Keyword Trees: Threading
Threading is complete when we reach a leaf in the keyword tree
When threading is complete, we've found a pattern in the text
Suffix Trees = Collapsed Keyword Trees
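Threading can be sketched in Python with a keyword tree stored as nested dictionaries. This is a minimal illustration under assumptions: the function names are hypothetical, and a `None` key is used here to mark the end of a pattern, so a pattern that is a prefix of another pattern is still reported even though it does not end at a leaf.

```python
def build_keyword_tree(patterns):
    """Build a keyword tree (trie); each node is a dict of child edges."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = pattern          # end-of-pattern marker (assumed encoding)
    return root


def thread(tree, text, start):
    """Thread text[start:] through the tree; return patterns that end on the path."""
    node = tree
    found = []
    for ch in text[start:]:
        if ch not in node:            # no matching edge: threading stops
            break
        node = node[ch]
        if None in node:              # reached the end of a pattern
            found.append(node[None])
    return found


def find_patterns(patterns, text):
    """Thread every suffix of text; return (position, pattern) pairs (0-based)."""
    tree = build_keyword_tree(patterns)
    return [(i, p) for i in range(len(text)) for p in thread(tree, text, i)]


hits = find_patterns(["ATG", "TGG", "GGT"], "ATGGTC")
```

Each starting position is threaded independently, so the total work is bounded by the text length times the length of the longest pattern.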
All suffixes of a given sequence
Similar to keyword t
Lecture 3 Notes
Collapse non-branching paths into an edge
(path compression)
Suffix Trees: Advantages
With careful bookkeeping, a text's suffix tree can be constructed in a single pass of
the text
Thus, suffix trees can be built faster than keyword trees of
Lecture 2 Notes
Pattern Matching
What if, instead of finding repeats in a genome, we want to find all positions of a
particular sequence in a given sequence?
This leads us to a different problem, the Pattern Matching Problem
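As a baseline, the problem can be solved by checking every position of the text directly. The sketch below is a brute-force illustration only, not the method the lecture goes on to develop; the function name `find_occurrences` is hypothetical, and positions are reported 0-based.

```python
def find_occurrences(pattern, text):
    """Return every 0-based position where pattern occurs in text.

    Brute force: compare the pattern against each window of the text,
    which takes O(len(pattern) * len(text)) time in the worst case.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


positions = find_occurrences("GGTC", "ATGGTCTAGGTCCTAGTGGTC")
```

Overlapping occurrences are reported too, because every window is tested separately.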
Pattern Matching Problem
Goal:
Lecture 1 Notes
Repeat Finding
Example of repeats:
ATGGTCTAGGTCCTAGTGGTC
Motivation to find them:
Phenotypes arise from copy-number variations
Genomic rearrangements are often associated with repeats
Trace evolutionary secrets
Many tumors are charact
HW 3
1. Weighted Small Parsimony Problem: Formulation
2. Input: Tree T with each leaf labeled by elements of a k-letter alphabet
and a k x k scoring matrix (δ_ij)
3. Output: Labeling of internal vertices of the tree T minimizing the
weighted parsimony score
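One common way to compute the minimum weighted parsimony score for this formulation is a Sankoff-style dynamic program, sketched below in Python. This is an illustration under assumptions, not the homework's required solution: the tree encoding, the function name `sankoff_score`, and the unit-cost example matrix are all hypothetical, and the tree is assumed to be binary.

```python
import math


def sankoff_score(tree, labels, root, alphabet, delta):
    """Minimum weighted parsimony score of a rooted binary tree.

    tree   : dict mapping each internal node to its (left, right) children
    labels : dict mapping each leaf to its letter
    delta  : delta[a][b] is the cost of a change from letter a to letter b
    """
    def scores(node):
        # scores(node)[a] = best score of the subtree if node is labeled a.
        if node in labels:
            return {a: (0 if a == labels[node] else math.inf) for a in alphabet}
        left, right = tree[node]
        s_left, s_right = scores(left), scores(right)
        return {
            a: min(s_left[b] + delta[a][b] for b in alphabet)
               + min(s_right[b] + delta[a][b] for b in alphabet)
            for a in alphabet
        }

    return min(scores(root).values())


ALPHABET = "ACGT"
# Example matrix (assumption): cost 0 for a match, 1 for any mismatch.
UNIT = {a: {b: (0 if a == b else 1) for b in ALPHABET} for a in ALPHABET}
example_tree = {"root": ("u", "v"), "u": ("a", "b"), "v": ("c", "d")}
example_labels = {"a": "A", "b": "C", "c": "A", "d": "G"}
best = sankoff_score(example_tree, example_labels, "root", ALPHABET, UNIT)
```

This returns only the score. Recovering the labeling of the internal vertices would require a second, root-to-leaves pass that records which letter achieved each minimum.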
HW 2
1) labelings of internal vertices
2) Large Parsimony Problem (cont.)
3) Possible search space is huge, especially as n increases
4) How many rooted binary trees are there with n leaves?
5) T(n) for n = 2, 3, 4, 5, 6, 7, 8, 9, 10,
a.i. 1, 3, 15, 105, 945, 10395, 13
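The values listed above match the double-factorial formula T(n) = (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3) for rooted binary trees with n labeled leaves. The Python sketch below evaluates that formula; the function name is hypothetical, and the assumption that leaves are labeled (distinguishable) is stated here rather than in the excerpt.

```python
def rooted_binary_trees(n):
    """Number of rooted binary trees with n labeled leaves (n >= 2 assumed).

    Evaluates (2n - 3)!!, the product of the odd numbers up to 2n - 3.
    """
    count = 1
    for k in range(3, 2 * n - 2, 2):   # k = 3, 5, ..., 2n - 3
        count *= k
    return count


counts = [rooted_binary_trees(n) for n in range(2, 11)]
```

The count grows faster than exponentially in n, which is why exhaustive search over tree topologies quickly becomes impractical.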
Study Aid 3
One method of performing sequence comparisons to a profile is to use an HMM
Emission probabilities, e_i(a), from the profile
Transition probabilities from our match-mismatch matrix δ_ij.
Or we can explicitly represent the insertion and deletion st