lect2 Blast & variants I

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

FA05 CSE182

CSE 182-L2: Blast & variants I
Dynamic Programming
www.cse.ucsd.edu/classes/fa05/cse182
www.cse.ucsd.edu/~vbafna

Searching Sequence databases
http://www.ncbi.nlm.nih.gov/BLAST/

Query:
>gi|26339572|dbj|BAC33457.1| unnamed protein product [Mus musculus]
MSSTKLEDSLSRRNWSSASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVA
LIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTV
SVSVSVLTLSCIALDRWYAICHPLMFKSTAKRARNSIVVIWIVSCIIMIPQAIVMECSSMLPGLANKT
TLFTVCDEHWGGEVYPKMYHICFFLVTYMAPLCLMILAYLQIFRKLWCRQIPGTSSVVQRKWKQQQPV
SQPRGSGQQSKARISAVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFTHTEDRE
TVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCLGVHHRQGDRLARGRTSTESRKSLTT
QISNFDNVSKLSEHVVLTSISTLPAANGAGPLQNWYLQQGVPSSLLSTWLEV
• What is the function of this sequence?
• Is there a human homolog?
• Which organelle does it work in? (Secreted/membrane bound)
• Idea: Search a database of known proteins to see if you can find similar sequences which have a known function

Querying with Blast

Blast Output
• The output (Blastp query) is a series of protein sequences, ranked according to similarity with the query
• Each database hit is aligned to a subsequence of the query

Blast Output
[Figure: schematic of a query aligned to a database (db) hit; labels Q beg, S beg, Q end, S end, S Id; coordinates shown: 26, 19, 405, 422]

Blast Output
[Figure: Blast output; labels Q beg, S beg, Q end, S end, S Id]

The technological question
• How do we measure similarity between sequences?
• Percent identity?
  – Hard to compute without indels.
• Number of sequence edit operations?
  – Implies a notion of alignment.
A T C A A C G T C A A T G G T
A T C A A - C G -- T C A A T G G T

The biology question
• How do we interpret these results?
  – Similar sequence in the 3 species implies that the common ancestor of the 3 had that sequence.
  – The sequence accumulates mutations over time. These mutations may be indels, or substitutions.
  – Hum and mus diverged more recently and so the sequences are more likely to be similar.
[Figure: tree relating the three species; labels: hum, mus, dros, hummus?]

Computing alignments
• What is an alignment?
• 2 × m table.
• Each sequence is a row, with interspersed gaps
• Columns describe the edit operations
• What is the score of an alignment?...
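The description of an alignment as a table with two rows and m columns, where each column is an edit operation, can be made concrete with a short Python sketch. Everything named here is an assumption for illustration and does not come from the slides: the function names, the example rows `ACGT-A` / `AC-TTA`, the +1/-1/-1 scoring values, and the choice to measure percent identity against the alignment length m (other conventions exist). The last function is a standard dynamic-programming edit distance, included because the lecture title mentions dynamic programming; it is not claimed to be the course's own formulation.

```python
def column_ops(row_a, row_b):
    """Classify each column of a 2 x m alignment as match, substitution, or indel."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length m")
    counts = {"match": 0, "substitution": 0, "indel": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot pair a gap with a gap")
        if a == "-" or b == "-":
            counts["indel"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


def percent_identity(row_a, row_b):
    """Matches as a percentage of alignment columns (assumed convention)."""
    return 100.0 * column_ops(row_a, row_b)["match"] / len(row_a)


def alignment_score(row_a, row_b, match=1, mismatch=-1, indel=-1):
    """Sum of per-column scores; the default values are illustrative only."""
    counts = column_ops(row_a, row_b)
    return (match * counts["match"]
            + mismatch * counts["substitution"]
            + indel * counts["indel"])


def edit_distance(s, t):
    """Minimum number of substitutions and indels turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        curr = [i]  # deleting the first i characters of s
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical example rows (not taken from the slides).
row_a, row_b = "ACGT-A", "AC-TTA"
print(column_ops(row_a, row_b))
print(percent_identity(row_a, row_b))
print(alignment_score(row_a, row_b))
print(edit_distance("ACGTA", "ACTTA"))
```

In this example the given alignment uses two indels, while the two ungapped sequences differ by a single substitution, so a particular alignment only gives an upper bound on the number of edit operations; finding the best one is the optimization problem.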

This note was uploaded on 02/14/2008 for the course CSE 182 taught by Professor Bafna during the Fall '06 term at UCSD.
