(from Lecture 2)
Restriction enzymes cut DNA whenever they
encounter specific base sequences
Their target sites occur reasonably frequently within long
sequences (a 6-base target sequence appears, on
average, once every 4^6 = 4096 bases)
Can be used
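The 1-in-4096 figure assumes the four bases are equally likely and independent, so a given 6-base site matches at any position with probability (1/4)^6. A minimal sketch of scanning a sequence for a recognition site follows. It uses EcoRI's recognition sequence GAATTC as the example; the DNA string and the helper names `find_sites` and `expected_spacing` are hypothetical.

```python
# Minimal sketch (hypothetical helpers): find every occurrence of a restriction
# enzyme's recognition site and compare with the uniform-random expectation.

def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    dna, site = dna.upper(), site.upper()
    k = len(site)
    return [i for i in range(len(dna) - k + 1) if dna[i:i + k] == site]


def expected_spacing(site_length: int) -> int:
    """Average gap between sites, assuming independent, equally likely bases."""
    return 4 ** site_length


if __name__ == "__main__":
    ecori = "GAATTC"                      # EcoRI recognition sequence
    dna = "ttGAATTCagcGAATTCgg"           # made-up toy sequence
    print(find_sites(dna, ecori))         # [2, 11]
    print(expected_spacing(len(ecori)))   # 4096
```

Real genomes are not uniformly random, so observed spacings can differ considerably from this estimate.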
Homework Information: Some of the problems are probably too long to attempt the night before the due date, so plan accordingly. No late homework will a
If the distance matrix D is NOT additive, then we look for a tree T
that approximates D best:
Squared Error: $\sum_{i,j} \left(d_{ij}(T) - D_{ij}\right)^2$
Squared Error is a measure of the quality of the fit between the
distance matrix and the tree: we want to minimize it.
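To make the formula concrete, here is a minimal sketch that computes $d_{ij}(T)$ as path lengths between leaves of a weighted tree and then sums $(d_{ij}(T) - D_{ij})^2$ over all ordered pairs $i, j$, as in the formula. The adjacency-dict tree representation, the 4-leaf example and the names `leaf_distances`/`squared_error` are assumptions for illustration only; the sketch scores a given tree and does not search for the best one.

```python
from itertools import product


def leaf_distances(tree, leaves):
    """Return {(i, j): d_ij(T)}, the path length between every pair of leaves.

    tree: dict mapping each node to a list of (neighbor, edge_weight) pairs,
    listed in both directions (an undirected tree, so no cycles).
    """
    dist = {}
    for src in leaves:
        # depth-first walk from src, accumulating path length along the way
        stack = [(src, None, 0)]
        while stack:
            node, parent, d = stack.pop()
            dist[(src, node)] = d
            for nbr, w in tree[node]:
                if nbr != parent:  # no cycles in a tree, so skipping the parent suffices
                    stack.append((nbr, node, d + w))
    return {(i, j): dist[(i, j)] for i, j in product(leaves, leaves)}


def squared_error(tree, D, leaves):
    """Sum over all i, j of (d_ij(T) - D_ij)^2; D is a dict of dicts."""
    dT = leaf_distances(tree, leaves)
    return sum((dT[(i, j)] - D[i][j]) ** 2 for i, j in product(leaves, leaves))


if __name__ == "__main__":
    # hypothetical 4-leaf tree: leaves a, b hang off u; c, d hang off v
    tree = {
        "a": [("u", 1)], "b": [("u", 2)],
        "c": [("v", 3)], "d": [("v", 1)],
        "u": [("a", 1), ("b", 2), ("v", 2)],
        "v": [("c", 3), ("d", 1), ("u", 2)],
    }
    leaves = ["a", "b", "c", "d"]
    D = {
        "a": {"a": 0, "b": 3, "c": 6, "d": 4},
        "b": {"a": 3, "b": 0, "c": 7, "d": 5},
        "c": {"a": 6, "b": 7, "c": 0, "d": 4},
        "d": {"a": 4, "b": 5, "c": 4, "d": 0},
    }
    print(squared_error(tree, D, leaves))  # 0: D is additive and fits this tree exactly
    D["a"]["c"] = D["c"]["a"] = 7
    print(squared_error(tree, D, leaves))  # 2: (a,c) and (c,a) each off by 1
```

Because the sum runs over ordered pairs, each mismatched pair is counted twice; summing over $i < j$ instead would halve the value without changing which tree is best.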
Thus far
distance-based evolutionary trees
Additivity guarantees that the tree reproduces all
pairwise distances, but not all distance matrices are additive
Sequences → Distances → Sequences
character-based evolutiona
SOLUTIONS
The Burrows-Wheeler Transform and Bioinformatics
J. Matthew Holt
Last Class - Multiple Pattern Matching Problem
m - length of text
d - max length of pattern
x - number of patterns
Method
Storage Cost
Single Pattern
Search Time
Multiple
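One standard way to handle x patterns at once is to store them in a trie and, from each of the m text positions, walk down the trie for at most d steps, giving O(m·d) search time instead of O(m·d·x) for checking every pattern separately, at the cost of storing a trie with at most x·d edges. The sketch below assumes a nested-dict trie and that the character '$' never occurs in the alphabet (it marks pattern ends); `build_trie` and `trie_matching` are hypothetical names, not code from the course.

```python
def build_trie(patterns):
    """Nested-dict trie; the key '$' (assumed absent from the text) marks a pattern end."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node["$"] = p
    return root


def trie_matching(text, patterns):
    """Return (position, pattern) for every occurrence of every pattern in text."""
    root = build_trie(patterns)
    hits = []
    for start in range(len(text)):          # m starting positions
        node = root
        for i in range(start, len(text)):    # at most d steps: trie depth <= d
            node = node.get(text[i])
            if node is None:                 # no pattern continues with this character
                break
            if "$" in node:
                hits.append((start, node["$"]))
    return hits


if __name__ == "__main__":
    print(trie_matching("ATCATGTACAT", ["AT", "CAT", "ATG"]))
    # [(0, 'AT'), (2, 'CAT'), (3, 'AT'), (3, 'ATG'), (8, 'CAT'), (9, 'AT')]
```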
Programming Problem. Modify BreakpointReversalSort.py as follows:
The given version of the code outputs only one of many possible solutions. The way to generate
multiple solutions should be that if, at any stage of the program, there is more than one rever
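For reference, the quantity such a program works with is the breakpoint count of an unsigned permutation: frame it with 0 and n+1, then count adjacent pairs that are not consecutive integers. The sketch below is hypothetical (it is not BreakpointReversalSort.py) and shows one greedy step only: it lists every reversal that attains the minimum resulting breakpoint count, so more than one entry marks a stage where the program has a choice.

```python
def count_breakpoints(perm):
    """Breakpoints of an unsigned permutation of 1..n framed by 0 and n+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def best_reversals(perm):
    """Return (min breakpoints, all reversals (i, j) achieving it).

    (i, j) are 0-based inclusive indices; reversals of a single element are
    skipped because they do not change the permutation.
    """
    perm = list(perm)
    n = len(perm)
    best, choices = None, []
    for i in range(n):
        for j in range(i + 1, n):
            candidate = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
            b = count_breakpoints(candidate)
            if best is None or b < best:
                best, choices = b, [(i, j)]
            elif b == best:
                choices.append((i, j))   # a tie: another candidate solution branch
    return best, choices


if __name__ == "__main__":
    print(count_breakpoints([3, 4, 1, 2]))  # 3
    print(best_reversals([2, 1, 3]))        # (0, [(0, 1)])
```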
Chapter 7 - Pattern Matching
J. Matthew Holt
Sequence Alignment
Sequencing data
Millions to billions of reads
Typically 100+ base pairs
Reference genome - millions to billions of base pairs
Where does a read best match the reference
genome?
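The question can be made concrete with a brute-force sketch. It assumes "best match" means the offset with the fewest mismatches (Hamming distance, no insertions or deletions); `hamming` and `best_match` are hypothetical names. Each read costs O(read length × reference length), which is far too slow for millions to billions of reads against a genome-sized reference, and that is the motivation for indexing the reference.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(read, reference):
    """Slide the read along the reference; return (offset, mismatches)
    for the offset with the fewest mismatches (earliest offset on ties)."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


if __name__ == "__main__":
    print(best_match("GATC", "TTGATTCGATCA"))  # (7, 0): exact hit at offset 7
    print(best_match("GATG", "TTGATTCGATCA"))  # (2, 1): offsets 2 and 7 tie at 1 mismatch
```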
Edit Distances
Longest Common Subsequence
Global Sequence Alignment
Scoring Matrices
Local Sequence Alignment
Alignment with Affine Gap Penalties
Multiple Alignment Problem
The Global Alignment Pr
Dynamic Programming is a technique for
computing r
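As an illustration of dynamic programming applied to one of the topics listed (edit distance), here is a minimal sketch. It assumes unit cost for each insertion, deletion and substitution; `edit_distance` is a hypothetical name. Each table entry is computed once from three already-stored neighbors, so the whole table takes O(n·m) time instead of re-exploring overlapping subproblems.

```python
def edit_distance(v, w):
    """Minimum number of insertions, deletions and substitutions turning v into w.

    d[i][j] stores the edit distance between prefixes v[:i] and w[:j].
    """
    n, m = len(v), len(w)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i   # delete all i characters of v[:i]
    for j in range(m + 1):
        d[0][j] = j   # insert all j characters of w[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                           # deletion
                d[i][j - 1] + 1,                           # insertion
                d[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # match (0) or substitution (1)
            )
    return d[n][m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```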
So far we've tried: a greedy algorithm that does not
work for all inputs (it is incorrect)
New tricks we've learned
Is there an exhaustive search algorithm?
def exhaustiveChange(amount, denominations):
    bestN = 100
    count = [0
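Because the fragment above is cut off, here is a separate, self-contained sketch of exhaustive search for change-making, not a reconstruction of the lecture's function. It tries every vector of coin counts (0 to amount // d of each denomination d) and keeps the valid combination with the fewest coins; it uses None as the "no solution yet" marker instead of a fixed sentinel like bestN = 100. The name `exhaustive_change` is hypothetical.

```python
from itertools import product


def exhaustive_change(amount, denominations):
    """Return (fewest coins, counts per denomination), or (None, None) if impossible."""
    best_n, best_counts = None, None
    # every count vector with 0..amount // d coins of each denomination d
    ranges = [range(amount // d + 1) for d in denominations]
    for counts in product(*ranges):
        if sum(c * d for c, d in zip(counts, denominations)) == amount:
            n = sum(counts)
            if best_n is None or n < best_n:
                best_n, best_counts = n, list(counts)
    return best_n, best_counts


if __name__ == "__main__":
    print(exhaustive_change(6, [1, 3, 4]))  # (2, [0, 2, 0]): two 3-coins
```

The search is always correct, but the number of count vectors grows multiplicatively with each added denomination, so it is only practical for small inputs.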
Recall that DNA is the essential
information determining the
function of living organisms
In order to understand the
biological machinery, we'd
like to read the code of
the genome
Image credit: Philippe Psaila / Science Photo Library
How can we
An iterative algorithm where, at each step, we
take what seems to be the best option
Cons:
It may return incorrect results
It may require more steps than necessary
Pros:
It often takes very little time to
Coin change problem
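Both cons show up in the coin change problem. Below is a minimal greedy sketch (hypothetical name `greedy_change`) that always takes as many of the largest remaining coin as fit. With the assumed denominations {1, 3, 4} and amount 6 it returns 4 + 1 + 1 (three coins), while 3 + 3 (two coins) is optimal; with US-style denominations {25, 10, 5, 1} it happens to be optimal.

```python
def greedy_change(amount, denominations):
    """Take as many of the largest coin as fit, then the next largest, and so on."""
    counts = {}
    for d in sorted(denominations, reverse=True):
        counts[d], amount = divmod(amount, d)   # coins of d used, amount left over
    if amount != 0:
        raise ValueError("greedy choices left an amount that cannot be paid")
    return counts


if __name__ == "__main__":
    print(greedy_change(6, [1, 3, 4]))         # {4: 1, 3: 0, 1: 2} -> 3 coins
    print(greedy_change(40, [25, 10, 5, 1]))   # {25: 1, 10: 1, 5: 1, 1: 0}
```

Each denomination is considered once, which is why greedy is fast; the price is that nothing checks whether an earlier choice ruled out a better combination.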
As a precursor to transcription (the reading of
DNA to construct RNAs that eventually lead
to protein synthesis), special proteins bind to the
DNA and separate its two strands to enable its reading.
How do these proteins know where the cod
An algorithm is a sequence of instructions that
one must perform in order to solve a well-formulated problem.
Diagram: input → algorithm → output (Problem: complexity; Algorithm: correctness, complexity)
Comp 555: Bioalgorithms
Suitable for undergraduate and graduate students
CS majors who want to learn bioinformatics
Non-CS majors from the statistical or biological
sciences who are interested in the algorithms used in
bio