Feb21_Alignment - Integrative Biology 200A "PRINCIPLES...

Integrative Biology 200A “PRINCIPLES OF PHYLOGENETICS” Spring 2008
University of California, Berkeley
Will - 21 Feb 2008

Alignment

Two or more sequences (bases, amino acids, proteins, etc.) are matched in an alignment. A pairwise alignment of two sequences is either global (the two sequences matched over their whole length) or local (some subset of the sequences matched while other regions are not expected to match). Sequence similarity can simply be a mathematical distance between two sequences.

Establishing an initial estimate of homology (basically similarity) is essential. Unaligned sequence data have no a priori base homology. As a consequence, the fixed alignment, achieved by one method or another, is treated as prior, or background, knowledge. Recall the hierarchy of characters and states, and that only the states are really tested in the analyses. The outcomes of phylogenetic analyses are often strongly influenced by the alignment.

BLAST (Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol 215(3):403-10, 1990). For example, if a gene is newly identified and its function understood in Drosophila, a researcher can BLAST the database of the human genome to look for similar gene sequences.

Very basic description of BLAST:
1. Uses short segments (“words”) of sequence to find other sequences that contain the same set of words.
2. Does “ungapped” alignment extending from the matched subsequence regions to find high-scoring matches.
3. Does a rapid gapped alignment to select and rank close matches.

Practical issues: For two sequences of length n, i.e. pairwise alignment, if no gaps are allowed then there is one or a few optimal alignment(s). If gaps are allowed, i.e. there is sequence length variation, then there are about $(2n)!/(n!)^2$ possible alignments; e.g. for n = 50 there are about $10^{29}$ alignments. Enumeration is not an option! We need heuristic searches based on optimality and scoring. There are two problems: how to find alignments and how to choose among them.
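The size of the alignment count quoted above can be checked with a few lines of Python. This is only a sketch: the function name `alignment_count` is made up for illustration, and it simply evaluates the formula (2n)!/(n!)^2 as given in the notes, without deriving it.

```python
import math

def alignment_count(n: int) -> int:
    """Evaluate (2n)! / (n!)^2, the count quoted in the notes for two length-n sequences."""
    # Integer division is exact here because the result is a binomial coefficient.
    return math.factorial(2 * n) // math.factorial(n) ** 2

# Show how quickly the count grows with sequence length.
for n in (5, 10, 50):
    print(n, f"{alignment_count(n):.3e}")
```

For n = 50 the value is roughly 1.0 x 10^29, in line with the figure in the notes, which is why exhaustive enumeration is ruled out even for short sequences.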
Alignment really attempts to balance the number of indels (insertions and deletions) against the number of base substitutions, normally based on some cost differential. Of course it is possible to account for all differences by inserting enough gaps (the trivial alignment). In the simplest model the cost is the “edit distance”: the minimal number of events required to transform one sequence into another using some scheme of insertions, deletions and substitutions. Go from acctga to agcta:
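The worked example from the notes is not part of this excerpt. As a minimal sketch of the idea, the edit distance can be computed with the standard dynamic-programming recurrence. The function name `edit_distance` and the default unit costs for indels and substitutions are assumptions made here for illustration, not the cost scheme used in the course.

```python
def edit_distance(a: str, b: str, indel: int = 1, subst: int = 1) -> int:
    """Minimal total cost of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the cost of turning the prefix of a processed so far into b[:j];
    # initially that prefix is empty, so the cost is j insertions.
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * indel]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                           # delete ca
                curr[j - 1] + indel,                       # insert cb
                prev[j - 1] + (0 if ca == cb else subst),  # match or substitute
            ))
        prev = curr
    return prev[-1]

print(edit_distance("acctga", "agcta"))
```

With unit costs this prints 2: for instance, substitute the first c with g (agctga), then delete the second g (agcta). Raising `indel` relative to `subst` is one way to explore the cost differential between gaps and substitutions.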

This note was uploaded on 08/01/2008 for the course IB 200 taught by Professors Lindberg, Mishler, and Will during the Spring '08 term at University of California, Berkeley.
