lecture_11 - Sequence similarity DNA: From a computer...

Sequence similarity

DNA: From a computer scientist's viewpoint, DNA is a sequence of characters chosen from the alphabet {A, C, G, T}. The human genome contains ~3 billion characters. An A4 sheet holds 5 to 10 K characters, so writing the genome out would take ~0.5 million sheets of paper.

Given two DNA sequences, biologists want to know how similar they are. From the computational point of view, we first find the best way to align (pair up) the two sequences; then we can see how close they are.

Alignment and similarity

- An alignment: pairing up two strings character by character, possibly with spaces inserted.
- Example: ACCAATCC and AGCCATGC

    1st alignment          2nd alignment
    A C C A A T C C        A _ C C A A T C C
    A G C C A T G C        A G C C A _ T G C

- 1st alignment: 5 positions matched; 3 mismatched.
- 2nd alignment: 6 positions matched; 1 mismatched; 2 characters paired with a space.
- Which is the better alignment? What is the best alignment?

Similarity function

- A similarity (scoring) function δ specifies how much each match/mismatch/space contributes to the overall similarity.
- E.g., match: 2; mismatch: -1; character-space: -1.

         _    A    C    G    T
    _        -1   -1   -1   -1
    A   -1    2   -1   -1   -1
    C   -1   -1    2   -1   -1
    G   -1   -1   -1    2   -1
    T   -1   -1   -1   -1    2

  For example, δ(C,G) = -1.

- Given an alignment, define its quality = sum of the similarity scores of its positions.

    A C C A A T C C        A _ C C A A T C C
    A G C C A T G C        A G C C A _ T G C
    score: 10 - 3 = 7      score: 12 - 3 = 9

Similarity function

- A more complicated similarity function.

          _     A     C     G     T
    _         -.5   -.5
    A   -.5     2    .5    -1    -1
    C   -.5    .5     4    -1    -1
    G          -1    -1     3    -1
    T          -1    -1    -1     2

The alignment problem

- Similarity score: match: 2; mismatch: -1; char-space: -1

The similarity score...
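
The position-by-position scoring described under "Similarity function" can be checked with a small Python sketch. It assumes the simple scheme from the slides (match: 2; mismatch: -1; character-space: -1) and writes a space as `_`. The names `delta` and `alignment_score` are illustrative and not from the lecture. Rejecting a position that pairs two spaces is also an assumption, based on that cell being left blank in the table.

```python
GAP = "_"  # space symbol, written as on the slides


def delta(x, y, match=2, mismatch=-1, space=-1):
    """Simple similarity function: match 2, mismatch -1, character-space -1."""
    if x == GAP and y == GAP:
        # Assumption: the (_, _) cell is blank in the table, so treat it as invalid.
        raise ValueError("a position cannot pair two spaces")
    if x == GAP or y == GAP:
        return space
    return match if x == y else mismatch


def alignment_score(top, bottom):
    """Sum delta over all positions of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(delta(x, y) for x, y in zip(top, bottom))


first = alignment_score("ACCAATCC", "AGCCATGC")     # 5 matches, 3 mismatches
second = alignment_score("A_CCAATCC", "AGCCA_TGC")  # 6 matches, 1 mismatch, 2 spaces
print(first, second)  # 7 9
```

The two printed values agree with the scores 10 - 3 = 7 and 12 - 3 = 9 worked out for the example alignments, so under this scoring the second alignment is the better of the two.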

This note was uploaded on 03/01/2010 for the course CS 1234 taught by Professor Chan during the Spring '10 term at University of the Bío-Bío.
