8/29/13
Comp 555
Fall 2013
1
(from Lecture 2)
Restriction enzymes break DNA whenever they
encounter specific base sequences
These target sequences occur reasonably frequently within long
sequences (a 6-base target appears, on average, once every
4^6 = 4096 bases)
Can be used
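The 1:4096 figure follows if each base is assumed to be independently and uniformly one of A, C, G, T, so that a specific 6-base site matches at any given position with probability (1/4)^6. A minimal sketch under that assumption is shown here; the function names are illustrative and not from the lecture, and GAATTC (the EcoRI recognition site) is used only as an example target.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Average number of bases between occurrences of one fixed site,
    assuming independent, uniformly distributed bases (an assumption)."""
    return alphabet_size ** site_length


def cut_positions(dna, site):
    """Return the start index of every (possibly overlapping) occurrence of site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    print(expected_spacing(6))
    print(cut_positions("AAGAATTCGGGAATTC", "GAATTC"))
```

Real genomes are not uniformly random, so the observed spacing of a given site can differ noticeably from this estimate.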
Comp 590-087/790-087: BioAlgorithms - Fall 2011
Problem Set #1
Issued: 9/4/2011
Due: In class 9/22/2011
Homework Information: Some of the problems are probably too long to attempt the night before the due
date, so plan accordingly. No late homework will a
4/6/16
Comp 555
Spring 2016
1
If the distance matrix D is NOT additive, then we look for a tree T
that approximates D the best:
Squared Error: $\sum_{i,j} \left( d_{ij}(T) - D_{ij} \right)^2$
Squared Error is a measure of the quality of the fit between
the distance matrix and the tree:
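A small sketch of how this squared error could be computed for a candidate tree. It assumes the tree is given as a list of weighted edges (u, v, w) and the matrix D as a nested dictionary keyed by leaf names; both representations, and the function names, are assumptions made for illustration. The sum here runs over unordered leaf pairs i < j; summing over ordered pairs would simply double the value.

```python
def tree_distances(edges, leaves):
    """Path lengths d_ij(T) between leaves of a weighted tree.

    edges: list of (u, v, weight) tuples describing the tree T.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def dist_from(src):
        # Depth-first traversal; in a tree each node is reached by one path.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    return {leaf: dist_from(leaf) for leaf in leaves}


def squared_error(D, edges, leaves):
    """Sum of (d_ij(T) - D_ij)^2 over unordered leaf pairs i < j."""
    d = tree_distances(edges, leaves)
    return sum(
        (d[a][b] - D[a][b]) ** 2
        for i, a in enumerate(leaves)
        for b in leaves[i + 1:]
    )


if __name__ == "__main__":
    leaves = ["A", "B", "C"]
    star = [("A", "X", 1), ("B", "X", 2), ("C", "X", 3)]  # X is an internal node
    D = {"A": {"B": 3, "C": 4}, "B": {"C": 6}}
    print(squared_error(D, star, leaves))
```

In the example the tree reproduces D for the pairs (A, B) and (A, C) exactly and is off by 1 for (B, C).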
4/6/16
Comp 555
Spring 2016
1
Thus far
distance-based evolutionary trees
Additivity guarantees that the tree reproduces all
pairwise distances, but not all distance matrices are additive
Sequences → Distances → Sequences
character-based evolutiona
Comp 555: BioAlgorithms - Fall 2013
Problem Set #1
Issued: 9/3/2013 Due: In class 9/26/2013
SOLUTIONS
Homework Information: Some of the problems are probably too long to attempt the night before the due
date, so plan accordingly. No late homework will acc
The Burrows-Wheeler
Transform and
Bioinformatics
J. Matthew Holt
Last Class - Multiple Pattern
Matching Problem
m - length of text
d - max length of pattern
x - number of patterns
Method
Storage Cost
Single Pattern
Search Time
Multiple
Programming Problem. Modify BreakpointReversalSort.py as follows:
The given version of the code outputs only one of many possible solutions. The way to generate
multiple solutions should be that if at any stage of the program, there is more than one rever
Chapter 7 - Pattern
Matching
J. Matthew Holt
Sequence Alignment
Sequencing data
Millions to billions of reads
Typically 100+ basepairs
Reference genome - millions to billions of basepairs
Where does a read best match the reference
gen
9/26/13
Comp 555
Fall 2013
1
Edit Distances
Longest Common Subsequence
Global Sequence Alignment
Scoring Matrices
Local Sequence Alignment
Alignment with Affine Gap Penalties
Multiple Alignment problem
The Global Alignment Pr
9/17/13
Comp 555
Fall 2013
1
Edit Distances
Longest Common Subsequence
Global Sequence Alignment
Scoring Matrices
Local Sequence Alignment
Alignment with Affine Gap Penalties
Dynamic Programming is a technique for
computing r
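As an illustration of the dynamic programming idea applied to the Longest Common Subsequence topic listed in the outline, here is a minimal sketch. The function name and the table layout are assumptions for illustration, not the lecture's code: cell s[i][j] holds the LCS length of the first i characters of v and the first j characters of w, and each cell is filled from its already-computed neighbours.

```python
def lcs_length(v, w):
    """Length of a longest common subsequence of strings v and w."""
    n, m = len(v), len(w)
    # s[i][j] = LCS length of v[:i] and w[:j]; row 0 and column 0 stay 0.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                # Matching characters extend the LCS of the two prefixes.
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either string.
                s[i][j] = max(s[i - 1][j], s[i][j - 1])
    return s[n][m]


if __name__ == "__main__":
    print(lcs_length("ATCTGAT", "TGCATA"))
```

The table has (n + 1) x (m + 1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths.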
9/16/13
Comp 555
Fall 2013
1
So far we've tried: A greedy algorithm that does not
work for all inputs (it is incorrect)
New tricks we've learned
Is there an exhaustive search algorithm?
def exhaustiveChange(amount, denominations):
    bestN = 100
    count = [0
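The listing above is cut off in the source and is left as-is. Separately, here is one way an exhaustive search for the change problem could be sketched. It is not a reconstruction of the lecture's exhaustiveChange: the name exhaustive_change and the use of itertools.product are assumptions made for illustration. It tries every combination of coin counts that could possibly fit and keeps the one using the fewest coins.

```python
from itertools import product


def exhaustive_change(amount, denominations):
    """Try every combination of coin counts; return the one with fewest coins.

    Returns a tuple of counts aligned with denominations, or None if the
    amount cannot be formed. Assumes positive integer denominations.
    """
    best = None
    # For each denomination d, at most amount // d coins can be used.
    ranges = [range(amount // d + 1) for d in denominations]
    for counts in product(*ranges):
        total = sum(c * d for c, d in zip(counts, denominations))
        if total == amount and (best is None or sum(counts) < sum(best)):
            best = counts
    return best


if __name__ == "__main__":
    print(exhaustive_change(40, [25, 20, 10, 5, 1]))
```

The number of combinations examined grows as the product of the per-denomination ranges, so this approach is correct but quickly becomes slow as the amount grows.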
8/22/13
Comp 555
Fall 2013
1
Recall DNA is the essential
information determining the
function of living organisms
In order to understand the
biological machinery we'd
like to read the code of
the genome
How can we
9/10/13
Comp 555
Fall 2013
1
An iterative algorithm where, at each step, you
take what seems to be the best option
Cons:
It may return incorrect results
It may require more steps than necessary
Pros:
Coin change problem
it often takes very little time to
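The trade-off listed above can be seen on the coin change problem with a short sketch. The function name greedy_change and the denominations in the example are assumptions chosen for illustration: at each step the largest remaining denomination is used as many times as it fits.

```python
def greedy_change(amount, denominations):
    """Greedy coin change: always take as many of the largest coin as fit.

    Returns (list of (denomination, count) pairs, leftover amount).
    Fast, but not guaranteed to use the fewest coins.
    """
    counts = []
    for d in sorted(denominations, reverse=True):
        counts.append((d, amount // d))
        amount %= d
    return counts, amount


if __name__ == "__main__":
    counts, leftover = greedy_change(40, [25, 20, 10, 5, 1])
    print(counts, leftover)
    print(sum(n for _, n in counts))
```

For 40 with denominations 25, 20, 10, 5, 1 the greedy choice uses three coins (25 + 10 + 5), although two 20s would do. This is the "more steps than necessary" drawback, obtained after only one pass over the denominations.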
9/2/13
Comp 555
Fall 2013
1
As a precursor to transcription (the reading of
DNA to construct RNAs that eventually lead
to protein synthesis), special proteins bind to the
DNA and separate it to enable its reading.
How do these proteins know where the cod
8/29/13
Comp 555
Fall 2013
1
An algorithm is a sequence of instructions that
one must perform in order to solve a well-formulated problem.
[Diagram: input → algorithm → output. Problem: complexity. Algorithm: correctness, complexity.]
Al
8/22/13
Comp 555
Fall 2013
1
Comp 555: Bioalgorithms
Suitable for undergraduate and graduate students
CS majors who want to learn bioinformatics
Non-CS majors from the statistical or biological
sciences who are interested in the algorithms used in
bio
Comp 590-087/790-087: BioAlgorithms - Fall 2011
Problem Set #2
Issued: 9/18/2011
Due: In class 10/6/2011
Homework Information: Some of the problems are probably too long to attempt the night before the due
date, so plan accordingly. No late homework will