Lecture 4 notes & questions

# An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)


CSE182 Lecture 4 notes & questions

Vineet Bafna

October 5, 2006

## 1 Notes

Recall that we are interested in computing local alignments of a query string of length $n$ against a subsequence from the database. Certainly, we can apply the Smith-Waterman (local alignment) algorithm, treating the entire database as a single string of length $m$, and compute the optimum local alignment. See Problem ??. The number of steps, from earlier arguments, is $O(nm)$.

As a rough calculation, suppose we were querying the entire human genome against the entire mouse genome, implying that $n \approx m \approx 3 \times 10^{9}$. A full-blown local alignment would require about $10^{19}$ steps. Even with a fast computation rate of $10^{10}$ steps per second, we would need $10^{9}$ s (about 31 CPU-years) to do the computation. It is worth considering whether we can do better.

A general approach to this problem is through *database filtering*. Think of a database filter as a program that rapidly eliminates a large portion of the database without losing any of the similar strings. For example, suppose we had a filter that runs in time $O(m)$ (independent of the query size) and rejects all but a fraction $f \ll 1$ of the database. Then, by aligning the query only to the filtered sequence, the total running time is reduced to $O(m + fmn)$. Suppose we had a filter with $f = 10^{-8}$. Then the total running time for the previous query would be $10^{9} + 10^{-8} \cdot 10^{19} \approx 10^{11}$ steps. At $10^{10}$ steps per second, we could do the query in 10 seconds. This is the idea that is pursued in BLAST.

## 2 Basics

Let us start with the assumption that the database is a random string over the characters $\{A, C, G, T\}$, each occurring independently with probability $0.25$. Next, assume that the query is a string of $k$ ones, given by

$$q = \underbrace{111\ldots111}_{k}$$

We are interested in computing

$$\Pr(q \text{ is contained in a database substring})$$

As it turns out, this is somewhat difficult to compute because of the dependencies between occurrences at different positions.
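Although the exact containment probability is awkward to derive, it can be estimated empirically. The following is a minimal sketch, not part of the original notes: it draws random databases under the uniform, independent model described above and reports the fraction that contain the query. The function name `estimate_containment`, the parameters `trials` and `seed`, and the use of `"A" * k` as a stand-in for the query of $k$ ones are all assumptions made for illustration; `n` here is the database length.

```python
import random

ALPHABET = "ACGT"


def estimate_containment(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of Pr(q occurs somewhere in a random database).

    n: database length, k: query length. The query is k copies of one
    fixed letter (an illustrative stand-in for the notes' "k ones").
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    q = "A" * k
    hits = 0
    for _ in range(trials):
        # Each character is drawn independently with probability 1/4.
        db = "".join(rng.choices(ALPHABET, k=n))
        if q in db:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, k = 50, 4
    print("estimated containment probability:", estimate_containment(n, k))
    print("per-position probability 1/4^k:   ", 0.25 ** k)
```

The estimate is only as good as the number of trials allows, and for genome-scale `n` a simulation like this is not practical; it is meant to build intuition on small inputs.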
However, given a fixed position $i$ in the database,

$$\Pr(q \text{ occurs at position } i) = \frac{1}{4^{k}}$$

Therefore, the expected number of occurrences of $q$ is $n \left(\frac{1}{4}\right)^{k}$, where $n$ here counts the positions in the database. Why?

### 2.1 Basic probability

To see this, define an indicator variable $X_i$ for all positions $1 \le i \le n$.
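The preview ends here, so the following is an assumption about where the argument goes rather than the author's text. A natural continuation is linearity of expectation: let $X_i = 1$ if $q$ occurs at position $i$ and $0$ otherwise, so $E[X_i] = 4^{-k}$, and the expected number of occurrences is $\sum_i E[X_i]$, which requires no independence between positions. The sketch below checks this by brute force on a tiny example, averaging the occurrence count over all $4^n$ equally likely databases. The function names and the stand-in query `"A" * k` are hypothetical. Note that only the $n - k + 1$ start positions where $q$ fits can contribute, so the exact value is $(n - k + 1)/4^{k}$; this agrees with $n/4^{k}$ up to the last $k - 1$ positions, which is negligible when $n \gg k$.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "ACGT"


def count_occurrences(text, q):
    """Count start positions (overlaps allowed) where q occurs in text."""
    k = len(q)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == q)


def exact_expected_occurrences(n, q):
    """Average the occurrence count of q over all 4**n strings of length n.

    Exhaustive enumeration, so only feasible for very small n.
    """
    total = sum(
        count_occurrences("".join(chars), q)
        for chars in product(ALPHABET, repeat=n)
    )
    return Fraction(total, 4 ** n)


if __name__ == "__main__":
    n, k = 6, 2
    q = "A" * k  # illustrative stand-in for the query of k ones
    print("brute-force expectation:", exact_expected_occurrences(n, q))
    print("(n - k + 1) / 4^k:      ", Fraction(n - k + 1, 4 ** k))
    print("n / 4^k (approximation):", Fraction(n, 4 ** k))
```

Exact fractions are used instead of floats so that the brute-force average and the closed form can be compared without rounding error.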

