Problem Set #2
Issued: 9/18/2011
Due: In class 10/6/2010
Problem Set #1
Issued: 9/4/2011
Due: In class 9/22/2010
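If the distance matrix D is NOT additive, then we look for a tree T
that best approximates D, i.e., one that minimizes the
Squared Error: $\sum_{i,j} \left(d_{ij}(T) - D_{ij}\right)^2$, where $d_{ij}(T)$ is the length of the path between leaves i and j in T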
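A minimal Python sketch of this objective follows. It assumes a tree encoded as a hypothetical list of weighted `(u, v, w)` edges and a matrix `D` indexed in the same leaf order as `leaves`. The helper names `tree_distances` and `squared_error` are illustrative, not from the course code. As in the formula, the sum runs over all index pairs i, j.

```python
from itertools import product


def tree_distances(edges, leaves):
    """Leaf-to-leaf path lengths d_ij(T) for a weighted tree given as (u, v, w) edges."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dist_from(src):
        # In a tree the path between two nodes is unique, so a plain DFS
        # that accumulates edge weights yields the path length to every node.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nbr, w in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + w
                    stack.append(nbr)
        return dist

    d = []
    for a in leaves:
        da = dist_from(a)
        d.append([da[b] for b in leaves])
    return d


def squared_error(d_tree, D):
    """Sum over all i, j of (d_ij(T) - D_ij)^2."""
    n = len(D)
    return sum((d_tree[i][j] - D[i][j]) ** 2 for i, j in product(range(n), repeat=2))


# Hypothetical star tree: internal node "c" joined to leaves A, B, C by edges of length 1, 2, 3.
edges = [("c", "A", 1), ("c", "B", 2), ("c", "C", 3)]
leaves = ["A", "B", "C"]
d_T = tree_distances(edges, leaves)

D_fit = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]   # reproduced exactly by the tree
D_off = [[0, 4, 4], [4, 0, 5], [4, 5, 0]]   # d_AB changed from 3 to 4

print(squared_error(d_T, D_fit))  # 0.0
print(squared_error(d_T, D_off))  # 2.0
```

A matrix the tree reproduces exactly gives an error of 0. Changing one symmetric pair by 1 gives 2, because that pair is counted once as (i, j) and once as (j, i). Doubling every term this way does not change which tree minimizes the error.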
Thus far
distance-based evolutionary trees
Additivity guarantees that the tree reproduces all
pairwise distances exactly, but not all distance matrices are additive
Comp 555: BioAlgorithms - Fall 2013
Problem Set #1
Issued: 9/3/2013 Due: In class 9/26/2013
SOLUTIONS
The Burrows-Wheeler Transform and Bioinformatics
J. Matthew Holt
Last Class - Multiple Pattern Matching Problem
m - length of text
d - max length of pattern
x - number of patterns
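As a baseline for this problem, the following brute-force sketch checks every pattern at every start position of the text. This is an illustration, not necessarily the method from the lecture. With x patterns of length at most d and a text of length m, it performs O(x·m·d) character comparisons in the worst case. The function name `naive_multi_match` is hypothetical.

```python
def naive_multi_match(text, patterns):
    """Return sorted (position, pattern) pairs for every exact occurrence of each pattern."""
    hits = []
    m = len(text)
    for p in patterns:                  # x patterns
        k = len(p)                      # k <= d
        for i in range(m - k + 1):      # at most m start positions
            if text[i:i + k] == p:      # up to k character comparisons
                hits.append((i, p))
    return sorted(hits)


print(naive_multi_match("ATGATGCATG", ["ATG", "GCA", "TGA"]))
# [(0, 'ATG'), (1, 'TGA'), (3, 'ATG'), (5, 'GCA'), (7, 'ATG')]
```

The x factor comes from scanning the text once per pattern. Faster approaches avoid this by sharing work across patterns.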
Programming Problem. Modify BreakpointReversalSort.py as follows:
The given version of the code outputs only one of many possible solutions. The way to generate
Chapter 7 - Pattern Matching
J. Matthew Holt
Sequence Alignment
Sequencing data
Millions to billions of reads
Typically 100+ basepairs
Edit Distances
Longest Common Subsequence
Global Sequence Alignment
Scoring Matrices
Local Sequence Alignment
Alignment with Affine Gap Penalties
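The first topic, edit distance, can be computed with a dynamic-programming recurrence. The following minimal sketch assumes unit cost for every insertion, deletion, and substitution (the Levenshtein distance). The function name is illustrative. It keeps only two rows of the (n+1) × (m+1) table, so it runs in O(nm) time with O(m) memory.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    m = len(b)
    prev = list(range(m + 1))           # row 0: turn "" into b[:j] with j insertions
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * m             # column 0: delete all i characters of a[:i]
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match (0) or substitute (1)
            )
        prev = cur
    return prev[m]


print(edit_distance("TGCATAT", "ATCCGAT"))  # 4
```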
So far we've tried: a greedy algorithm that does not
work for all inputs (it is incorrect)
New tricks we've learned
Is there an exhaustive search algorithm?
Recall DNA is the essential
information determining the
function of living organisms
In order to understand the
biological machinery we'd
like to read the code of
An iterative algorithm that, at each step,
takes what seems to be the best option
Cons:
It may return incorrect results
It may require more steps than necessary
Pros:
We developed a SimpleReversalSort algorithm that sorts
by extending its sorted prefix by one element on every iteration, taking at most n-1 steps.
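Here is a minimal Python sketch of SimpleReversalSort. It assumes the input is a permutation of 1..n given as a list. Positions are 0-based, and each recorded reversal (i, j) is inclusive; these conventions and the function name are assumptions made for illustration. Iteration i brings the element i+1 into place by reversing the segment from position i to wherever that element currently sits, so the sorted prefix grows by one each time.

```python
def simple_reversal_sort(perm):
    """Greedily sort a permutation of 1..n; return (sorted list, reversals applied)."""
    p = list(perm)
    reversals = []
    for i in range(len(p) - 1):          # n - 1 iterations
        j = p.index(i + 1)               # where the element that belongs at i sits now
        if j != i:                       # skip the step if it is already in place
            p[i:j + 1] = reversed(p[i:j + 1])
            reversals.append((i, j))
    return p, reversals


print(simple_reversal_sort([6, 1, 2, 3, 4, 5]))
# ([1, 2, 3, 4, 5, 6], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
```

On 6 1 2 3 4 5, the sketch uses five reversals, although two suffice. First reverse the whole permutation to get 5 4 3 2 1 6, then reverse the first five elements. This illustrates how a greedy choice can require more steps than necessary.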
As a precursor to transcription (the reading of
DNA to construct RNAs that eventually lead
to protein synthesis), special proteins bind to the
DNA and separate it to enable
An algorithm is a sequence of instructions that
one must perform in order to solve a well-formulated problem.
Problem: Complexity
[Figure: diagram linking input, problem, and algorithm]
Comp 555: Bioalgorithms
Suitable for undergraduate and graduate students
CS majors who want to learn bioinformatics
Non-CS majors from the statistical or biological sciences
(from Lecture 2)
Restriction enzymes break DNA whenever they
encounter specific base sequences
They occur reasonably frequently within long
sequences (a 6-base sequence