Ke, Yue 11/3/2008
1. In programme development, types of requirements include
functional, performance, reliability, availability, error handling, interface, and constraints. In developing software for just a single user, one would be able to provide
Study Aid 2
For 40% of sequenced genes, functionality cannot be ascertained using only
comparisons to sequences of other known genes
Microarrays allow biologists to infer gene function even when sequence similarity
alone is insufficient to infer function
Study Aid 1
How Gibbs Sampling Works
1) Randomly choose starting positions
s = (s1, ..., st) and form the set of l-mers associated
with these starting positions.
2) Randomly choose one of the t sequences.
3) Create a profile P from the other t -1 sequences.
4
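Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration only: the function name gibbs_setup, the DNA alphabet "ACGT", and the use of plain column frequencies (no pseudocounts) for the profile are assumptions, and the later steps, which are cut off in this excerpt, are deliberately left out.

```python
import random


def gibbs_setup(seqs, l, rng=random):
    """Sketch of steps 1-3 only; names and profile layout are assumptions."""
    t = len(seqs)
    # Step 1: randomly choose a starting position in each sequence
    # and collect the l-mers that begin at those positions.
    starts = [rng.randrange(len(s) - l + 1) for s in seqs]
    lmers = [s[p:p + l] for s, p in zip(seqs, starts)]
    # Step 2: randomly choose one of the t sequences.
    chosen = rng.randrange(t)
    # Step 3: build a profile (per-column base frequencies)
    # from the l-mers of the other t - 1 sequences.
    profile = {base: [0.0] * l for base in "ACGT"}
    for k, lmer in enumerate(lmers):
        if k == chosen:
            continue
        for j, base in enumerate(lmer):
            profile[base][j] += 1.0 / (t - 1)
    return starts, chosen, profile


starts, chosen, profile = gibbs_setup(["ACGTAC", "TTACGT", "GGACGA"], 3)
```

Each column of the returned profile sums to 1, because every one of the t - 1 remaining l-mers contributes exactly one base per column.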
Lecture Notes 4
Randomized Algorithms
Randomized algorithms incorporate random, rather than deterministic, decisions
Commonly used in situations where no exact and/or fast algorithm is known
Main advantage is that no input can reliably produce worst-case
Lecture Notes 3
Fitting Distance Matrix
Given n species, we can compute the n x n distance matrix Dij
Evolution of these genes is described by a tree that we don't know.
We need an algorithm to construct a tree that best fits the distance matrix Dij
Fittin
Lecture Notes 2
Around the time the giant panda riddle was solved, a DNA-based reconstruction
of the human evolutionary tree led to the Out of Africa Hypothesis that claims
our most ancient ancestor lived in Africa roughly 200,000 years ago
Largely based
Lecture Notes 1
Clique Graphs
A clique is a graph where every vertex is connected via an edge to every other
vertex
A clique graph is a graph where each connected component is a clique
The concept of clustering is closely related to clique graphs. Every p
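The clique graph definition above can be checked directly. The sketch below assumes a simple undirected graph given as a vertex list and an edge list; the function name is_clique_graph is hypothetical. It finds each connected component and verifies that every vertex in it is adjacent to all the others.

```python
def is_clique_graph(vertices, edges):
    """Return True if every connected component of the graph is a clique."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        # In a clique, each vertex neighbors all other component members.
        if any(len(adj[u]) != len(component) - 1 for u in component):
            return False
    return True


# A triangle plus a separate edge: both components are cliques.
print(is_clique_graph([1, 2, 3, 4, 5], [(1, 2), (2, 3), (1, 3), (4, 5)]))
# A path on three vertices is connected but not a clique.
print(is_clique_graph([1, 2, 3], [(1, 2), (2, 3)]))
```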
HW 3
1. Motif Finding Problem: Given a list of t sequences each of length n, find the
best pattern of length l that appears in each of the t sequences.
2. A New Motif Finding Approach
3. Motif Finding Problem: Given a list of t sequences each of length n,
HW 2
1. Select Analysis
2. Select seems risky compared to sort
3. To improve Select, we need to choose m
to give good splits
4. It can be proven that to achieve O(n) running time, we don't need
perfect splits, just reasonably good ones.
5. In fact, if bo
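One common way to get reasonably good splits is to pick the splitter m uniformly at random. The sketch below assumes that choice; the excerpt is cut off before it states how m is chosen, so this is an illustration, not necessarily the method the homework has in mind. The function name randomized_select and the 1-based rank k are also assumptions.

```python
import random


def randomized_select(items, k, rng=random):
    """Return the k-th smallest element (k is 1-based) using a random splitter."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    items = list(items)
    while True:
        m = rng.choice(items)  # random splitter (an assumption, see above)
        smaller = [x for x in items if x < m]
        equal = [x for x in items if x == m]
        if k <= len(smaller):
            # The answer lies among the elements smaller than m.
            items = smaller
        elif k <= len(smaller) + len(equal):
            # The answer is m itself.
            return m
        else:
            # Discard m and everything smaller, then adjust the rank.
            k -= len(smaller) + len(equal)
            items = [x for x in items if x > m]


print(randomized_select([7, 2, 9, 4, 1], 3))  # third smallest of 1, 2, 4, 7, 9
```

Unlike sorting, only one side of each split is kept, which is why reasonably good splits are enough for linear expected running time.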
HW 1
1. Randomized algorithms incorporate random, rather than deterministic, decisions
2. Commonly used in situations where no exact and/or fast algorithm is known
3. Main advantage is that no input can reliably produce worst-case results because
the algo
Study Aid 3
Hierarchical Clustering: Recomputing Distances
dmin(C, C*) = min d(x,y)
for all elements x in C and y in C*
Distance between two clusters is the smallest distance between any pair of
their elements
davg(C, C*) = (1 / (|C*| |C|)) ∑ d(x,y)
for all e
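The two cluster distances above can be computed as follows. This is a small sketch that assumes clusters are non-empty lists of elements and that d is any pairwise distance function supplied by the caller; the names d_min and d_avg mirror the notation above but are otherwise hypothetical.

```python
def d_min(C, C_star, d):
    """Smallest distance between any pair (x in C, y in C_star)."""
    return min(d(x, y) for x in C for y in C_star)


def d_avg(C, C_star, d):
    """Average distance over all pairs (x in C, y in C_star)."""
    total = sum(d(x, y) for x in C for y in C_star)
    return total / (len(C) * len(C_star))


# Example with points on a line and absolute difference as the distance.
def dist(x, y):
    return abs(x - y)


print(d_min([0, 1], [4, 6], dist))  # 3
print(d_avg([0, 1], [4, 6], dist))  # 4.5
```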