UC Berkeley, CS 176: Algorithms for Computational Biology (Fall 2016)
Problem Set 3
Instructor: Nir Yosef
1. (Bowtie - 70 pts)
In this section, you will implement in Python a simplified version of the Bowtie heuristic for inexact matching of short
pattern

CS 176: Algorithms for Computational Biology
Discussion 3: Sep. 16 2016
1. (The KS Algorithm) Consider the string S = cabcccacccc. Let Si be the suffix starting at position i. We will construct
a suffix array in linear time. We will start our index at 1 s

CS 176: Algorithms for Computational Biology
Discussion 3: September 16th - 17th, 2015
1. (BWT) B = nkknak$reia, the Burrows-Wheeler transformation of a string S. What is S?
Solution: We can solve this with either forward or backward decoding. We'll use back
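
Backward decoding can be sketched in Python as follows. This is a minimal illustration and not necessarily the procedure used in the original solution; the function name inverse_bwt is hypothetical, and the sketch assumes that S ends with a unique sentinel $ that sorts before every other character.

```python
def inverse_bwt(b, sentinel="$"):
    """Recover S from its Burrows-Wheeler transform b by backward decoding.

    Assumes S ends with a unique sentinel that sorts before all other
    characters, so row 0 of the sorted rotation matrix starts with it.
    """
    n = len(b)
    # A stable sort of positions by character yields the first column F.
    # The i-th occurrence of a character in b maps to its i-th occurrence in F.
    order = sorted(range(n), key=lambda i: b[i])
    lf = [0] * n
    for f_row, b_row in enumerate(order):
        lf[b_row] = f_row

    # Row 0 starts with the sentinel; its last-column character precedes
    # the sentinel in S. Follow the LF mapping to read S right to left.
    out = [sentinel]
    row = 0
    for _ in range(n - 1):
        out.append(b[row])
        row = lf[row]
    return "".join(reversed(out))
```

Calling inverse_bwt("nkknak$reia") returns the decoded string for the B given in this problem.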

CS 176: Algorithms for Computational Biology 2016
1. (Problem 1 - from Dasgupta, Papadimitriou, and Vazirani ) Consider the graph below, presented in its linearized
form on the right.
Write a dynamic programming algorithm to determine the minimum length p

CS 176: Algorithms for Computational Biology 2016
1. Counting the possible number of global alignments
Let X, Y be strings of lengths n and m, respectively. How many different global alignments between them are possible?
Solution: Consider first the lengt
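
One way to check a count like this numerically is a short dynamic program. The sketch below is not the argument from the original solution; it assumes every alignment column is one of three kinds (a character of X over a character of Y, a character of X over a gap, or a gap over a character of Y), and that a column of two gaps is not allowed. The function name count_global_alignments is hypothetical.

```python
def count_global_alignments(n, m):
    """Count global alignments of strings of lengths n and m.

    Assumption: each column pairs two characters, or pairs one character
    with a gap; gap-gap columns are excluded.
    """
    # count[i][j] = number of alignments of a length-i prefix of X
    # with a length-j prefix of Y. Aligning against an empty prefix
    # can be done in exactly one way (all gaps).
    count = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            count[i][j] = (
                count[i - 1][j]        # last column: X[i] over a gap
                + count[i][j - 1]      # last column: gap over Y[j]
                + count[i - 1][j - 1]  # last column: X[i] over Y[j]
            )
    return count[n][m]
```

Under these assumptions the count grows very quickly with n and m, which is why alignments are scored with dynamic programming rather than enumerated.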

2016 Fall CS 176 Review Questions
1. Assume the usual lexicographical order discussed in class and let S = ellennelle$.
(a) Draw a suffix tree for S.
(b) Find the suffix array AS for S.
(c) Find the Burrows-Wheeler transform of S.
(d) Find the inverse BWT
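
Parts (b) and (c) can be checked with a short Python sketch. It uses a naive sort of all suffixes, which is quadratic or worse and is only meant for small inputs like this one; it is not the linear-time construction. The names suffix_array and bwt are hypothetical, positions are 0-indexed here, and the sketch assumes S ends with a unique $ that sorts before every other character.

```python
def suffix_array(s):
    """Return the 0-indexed start positions of the suffixes of s in
    lexicographic order (naive construction by sorting suffixes)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt(s):
    """Return the Burrows-Wheeler transform of s.

    Assumes s ends with a unique sentinel smaller than all other
    characters, so sorting suffixes matches sorting cyclic rotations.
    """
    sa = suffix_array(s)
    # The BWT character for a suffix starting at i is the character just
    # before it; for i == 0 the index -1 wraps around to the sentinel.
    return "".join(s[i - 1] for i in sa)
```

Calling suffix_array("ellennelle$") and bwt("ellennelle$") gives values to compare against a hand-derived answer; add 1 to each position if the course uses 1-indexed suffix arrays.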

CS 176: Algorithms for Computational Biology
Problem Set 2 Solutions: 2016
1. (FM-index and Exact Pattern Matching - 15 pts) Given a string S of size |S| = n, let sorted (S) denote the matrix
containing the sorted cyclic permutations of S as rows, and let

CS 176: Algorithms for Computational Biology
Discussion 3: September 16th - 17th, 2015
1. (BWT) B = nkknak$reia, the Burrows-Wheeler transformation of a string S. What is S?
2. (LCP Arrays) The longest common prefix array (LCPA) is an important supplementa

T = aatataa
P = ataa
S = ataa$aatataa
k       2  3  4  5  6  7  8  9  10
Letter  t  a  a  $  a  a  t  a  t

Case
Start. Initialize k=2, l=0, r=0. Perform direct char comparison starting at k=2.
Not starting in Z box. Do direct char comparison. Find 1 match. Update l, r.
Not starti
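
The trace above can be reproduced with a short Python sketch of the Z-algorithm applied to S = P$T. This is a standard formulation rather than the exact one from the course notes: it is 0-indexed (the trace is 1-indexed), it keeps the Z-box as a half-open interval [l, r), and the names z_array and find_occurrences are hypothetical.

```python
def z_array(s):
    """Return z, where z[k] is the length of the longest substring
    starting at k that matches a prefix of s (z[0] is left as 0)."""
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r) is the rightmost Z-box found so far
    for k in range(1, n):
        if k < r:
            # Inside a Z-box: reuse the value computed for the
            # corresponding prefix position, capped at the box boundary.
            z[k] = min(r - k, z[k - l])
        # Direct character comparison past what is already known.
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
        if k + z[k] > r:
            l, r = k, k + z[k]
    return z


def find_occurrences(pattern, text, sep="$"):
    """Return 0-indexed start positions of pattern in text.

    Assumes sep occurs in neither pattern nor text.
    """
    m = len(pattern)
    z = z_array(pattern + sep + text)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]
```

For example, z_array("ataa$aatataa") gives the Z-values of the string S above, and find_occurrences("ataa", "aatataa") reports where P occurs in T.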

2016 Fall CS 176 Review Questions
1. Assume the usual lexicographical order discussed in class and let S = ellennelle$.
(a) Draw a suffix tree for S.
(This image was generated using the VisualAlgo tool at http://visualgo.net/suffixtree.html. Note that
they

CS 176: Algorithms for Computational Biology
Discussion 3: Sep. 16 2016
1. (k-mer frequency counting)
A k-mer of a string S is a substring of S with length k. For example, the string abcd has two 3-mers, namely abc and bcd.
Let |S| = n.
Biological motivat
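
A straightforward way to count k-mer frequencies is to slide a window of width k over S and tally each substring. The sketch below is a simple hash-based baseline, not necessarily the approach the discussion develops; the function name kmer_counts is hypothetical.

```python
from collections import Counter


def kmer_counts(s, k):
    """Return a Counter mapping each k-mer of s to its number of
    occurrences. A string of length n has n - k + 1 k-mers."""
    if k <= 0 or k > len(s):
        return Counter()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))
```

For the example above, kmer_counts("abcd", 3) contains exactly the two 3-mers abc and bcd, each with count 1.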

CS 176: Algorithms for Computational Biology
Discussion 5: 2016
1. (Problem 1 - from Dasgupta, Papadimitriou, and Vazirani ) Consider the graph below, presented in its linearized
form on the right.
Write a dynamic programming algorithm to determine the mi

CS 176: Algorithms for Computational Biology
Problem Set 1 Solutions: 2016
1. (Suffix Arrays)
(a) (Circular string linearization) Given a string S, devise an efficient SA-based algorithm to find the lexicographically smallest rotation of S.
Solution: Build
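
One common SA-based approach to this problem is sketched below in Python; since the solution text is cut off here, this may differ from the intended answer. It assumes the idea of building a suffix array of the doubled string SS and taking the first suffix that starts within the first copy of S. The suffix array is built by naive sorting for brevity, so the sketch is not efficient; an efficient version would swap in a linear-time construction. The function name smallest_rotation is hypothetical.

```python
def smallest_rotation(s):
    """Return the lexicographically smallest rotation of s using a
    suffix array of the doubled string s + s."""
    n = len(s)
    if n == 0:
        return s
    doubled = s + s
    # Naive suffix array of the doubled string (illustration only).
    sa = sorted(range(2 * n), key=lambda i: doubled[i:])
    # Every suffix starting at i < n begins with the rotation of s that
    # starts at i, so the first such suffix in sorted order marks a
    # smallest rotation. Suffixes starting at i >= n are skipped.
    start = next(i for i in sa if i < n)
    return doubled[start:start + n]
```

For instance, smallest_rotation("cab") picks the rotation starting at the a.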

CS 176: Algorithms for Computational Biology
Discussion 1: 2015
1. (Z-algorithm intuition) Let S be an arbitrary string. We computed its z-values and found that z2 = q > 0. What is z3
(Formulate your answer with the characters S[1], S[2], . . .)? Can you

CS 176: Algorithms for Computational Biology
Discussion 2: September 9th, 2016
1. (Maximal Unique Matches) Consider two strings S and S′ over some finite alphabet Σ, and let ℓ denote a positive
integer. A MUM (which stands for maximal unique matches) is a

CS 176: Algorithms for Computational Biology
Problem Set 2: 2016
1. (FM-index and Exact Pattern Matching - 15 pts) Given a string S of size |S| = n, let sorted (S) denote the matrix
containing the sorted cyclic permutations of S as rows, and let L denote

CS 176: Algorithms for Computational Biology
Discussion 1: 2016
1. (Z-algorithm intuition) Let S be an arbitrary string. We computed its z-values and found that z2 = q > 0. What is z3
(Formulate your answer with the characters S[1], S[2], . . .)? Can you

CS 176: Algorithms for Computational Biology
Problem Set 1: 2016
Due: Sep 21st 23:59
1. (Suffix Arrays)
(a) (Circular string linearization) Given a string S, devise an efficient SA-based algorithm to find the lexicographically smallest rotation of S.
(b) (