An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

CSE182 lecture 4 notes & questions
Vineet Bafna
October 5, 2006

1 Notes

Recall that we are interested in computing local alignments of a query string of length $n$ against a subsequence from the database. Certainly, we can apply the Smith-Waterman (local alignment) algorithm, treating the entire database as a single string of length $m$ and computing the optimum local alignment. See Problem ??. The number of steps, from earlier arguments, is $O(nm)$.

As a rough calculation, suppose we were querying the entire human genome against the entire mouse genome, implying that $n \approx m \approx 3 \times 10^{9}$. A full-blown local alignment would require about $10^{19}$ steps. Even with a fast computation of $10^{10}$ steps per second, we would need $10^{9}$ s ($\approx 31$ CPU-years) to do the computation. It is worth considering whether we can do better.

A general approach to this problem is through database filtering. Think of a database filter as a program that rapidly eliminates a large portion of the database without losing any of the similar strings. For example, suppose we had a filter that runs in time $O(m)$ (independent of the query size) and rejects all but a fraction $f \ll 1$ of the database. Then, by aligning the query only to the filtered sequence, the total running time is reduced to $O(m + fmn)$. Suppose we had a filter with $f = 10^{-8}$. Then the total running time for the previous query would be

$$10^{9} + 10^{-8} \cdot 10^{19} \approx 10^{11} \text{ steps}.$$

At $10^{10}$ steps per second, we could do the query in 10 seconds. This is the idea that is pursued in BLAST.

2 Basics

Let us start with the assumption that the database is a random string over the characters $\{A, C, G, T\}$, each occurring independently with probability $0.25$. Next, assume that the query is a string of $k$ ones, given by

$$q = \underbrace{111\ldots111}_{k}$$

We are interested in computing

$$\Pr(q \text{ is contained in a database substring}).$$

As it turns out, this is somewhat difficult to compute because of the dependencies between occurrences at different positions.
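Although the exact containment probability is awkward to derive, it can be estimated by simulation under the random-database model above. The following is a minimal sketch, not part of the original notes: the function name `estimate_containment`, the letter strings "AAA" and "ACG" used as stand-ins for the query, and the parameter values are all assumptions chosen for illustration.

```python
import random


def estimate_containment(q, n, trials, seed=0):
    """Estimate Pr(q occurs at least once in a random database of length n).

    Hypothetical helper for illustration: the database is drawn with each
    character chosen independently and uniformly from {A, C, G, T}.
    """
    rng = random.Random(seed)  # seeded so that repeated runs agree
    hits = 0
    for _ in range(trials):
        database = "".join(rng.choices("ACGT", k=n))
        if q in database:  # substring test: does q occur anywhere?
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, trials = 100, 4000  # illustrative sizes, far smaller than a genome
    for q in ("AAA", "ACG"):
        print(q, estimate_containment(q, n, trials))
```

Both example queries have the same length, so each occurs at any fixed position with the same probability. A self-overlapping query such as "AAA" tends to occur in clumps, though, so its estimated containment probability comes out lower than that of "ACG". This is one concrete face of the dependencies mentioned above.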
However, given a fixed position $i$ in the database,

$$\Pr(q \text{ occurs at position } i) = \left(\frac{1}{4}\right)^{k}.$$

Therefore, the expected number of occurrences of $q$ is $n \left(\frac{1}{4}\right)^{k}$, where $n$ here counts the positions in the database. Why?

2.1 Basic probability

To see this, define an indicator variable $X_i$ for all positions $1 \le i \le n$.
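One standard way to carry this argument through is linearity of expectation. The sketch below assumes that $X_i = 1$ exactly when $q$ occurs at position $i$ and $X_i = 0$ otherwise, writes $X$ for the total number of occurrences (a symbol introduced here for convenience), and ignores boundary effects at the last $k - 1$ positions:

```latex
\begin{align*}
X_i &= \begin{cases}
         1 & \text{if } q \text{ occurs at position } i, \\
         0 & \text{otherwise,}
       \end{cases}
\qquad
E[X_i] = \Pr(X_i = 1) = \left(\tfrac{1}{4}\right)^{k}, \\
X &= \sum_{i=1}^{n} X_i, \\
E[X] &= \sum_{i=1}^{n} E[X_i] = n \left(\tfrac{1}{4}\right)^{k}.
\end{align*}
```

Linearity of expectation holds even when the $X_i$ are not independent, which is why the expected count is easy to obtain while the containment probability is not.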