02doct08d2

Harvard-MIT Division of Health Sciences and Technology
HST.508: Genomics and Computational Biology

DNA1: Last week's take-home lessons
  Types of mutants
  Mutation, drift, selection
  Binomial for each
  Association studies
  χ² statistic
  Linked & causative alleles
  Alleles, haplotypes, genotypes
  Computing the first genome, the second, ...
  New technologies
  Random and systematic errors

DNA2: Today's story and goals
  Motivation and connection to DNA1
  Comparing types of alignments & algorithms
  Dynamic programming
  Multi-sequence alignment
  Space-time-accuracy tradeoffs
  Finding genes -- motif profiles
  Hidden Markov Model for CpG Islands

[Figure: DNA 2. Panel labels: DNA1: the last 5000 generations; Intro2: Common & simple. Source: http://216.190.101.28/GOLD/]

Applications of Dynamic Programming
  To sequence analysis
    Shotgun sequence assembly
    Multiple alignments
    Dispersed & tandem repeats
    Bird song alignments
    Gene expression time-warping
  Through HMMs
    RNA gene search & structure prediction
    Distant protein homologies
    Speech recognition

Alignments & Scores

Global (e.g. haplotype)
  ACCACACA
  ::xx::x:
  ACACCATA
  Score = 5(+1) + 3(-1) = 2

Local (motif)
    ACCACACA
    ::::
  ACACCATA
  Score = 4(+1) = 4

Suffix (shotgun assembly)
  ACCACACA
       :::
       ACACCATA
  Score = 3(+1) = 3
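The three scores on this slide can be reproduced with a short sketch. It assumes the slide's scoring of +1 per match and -1 per mismatch with no gaps; the function names are illustrative and not taken from the lecture.

```python
def global_score(a, b, match=1, mismatch=-1):
    """Score an ungapped, end-to-end alignment of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("ungapped global alignment needs equal lengths")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def best_local_score(a, b, match=1, mismatch=-1):
    """Best ungapped local alignment score, trying every relative offset."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        run = 0
        # a[i] is paired with b[i - offset] along this diagonal
        for i in range(max(0, offset), min(len(a), len(b) + offset)):
            step = match if a[i] == b[i - offset] else mismatch
            run = max(0, run + step)  # restart when the running score drops below 0
            best = max(best, run)
    return best


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that exactly equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


s1, s2 = "ACCACACA", "ACACCATA"
print(global_score(s1, s2), best_local_score(s1, s2), suffix_prefix_overlap(s1, s2))
```

The local score of 4 can be reached at more than one offset for this pair (for example ACCA or ACAC), so only the score, not the placement, is unique.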

Increasingly complex (accurate) searches

  Exact (StringSearch)                  CGCG
  Regular expression (PrositeSearch)    CGN{0-9}CG = CGAACG
  Substitution matrix (BlastN)          CGCG ~= CACG
  Profile matrix (PSI-blast)            CGc(g/a) ~= CACG
  Gaps (Gap-Blast)                      CGCG ~= CGAACG
  Dynamic Programming (NW, SM)          CGCG ~= CAGACG
  Hidden Markov Models (HMMER)          WU (http://hmmer.wustl.edu/)
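The first three rungs of this ladder can be illustrated with the Python standard library alone. This is a sketch under assumptions: the slide's Prosite-style pattern CGN{0-9}CG is read as "CG, then 0 to 9 arbitrary bases, then CG", and a plain mismatch count stands in for a real substitution matrix. It does not call or imitate the named tools (BlastN, PSI-blast, HMMER).

```python
import re


def exact_hit(pattern, text):
    """Exact string search: True if pattern occurs verbatim in text."""
    return pattern in text


# Assumed reading of the slide's CGN{0-9}CG: N is any base, repeated 0 to 9 times.
GAPPED_CG = re.compile(r"CG[ACGT]{0,9}CG")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal lengths")
    return sum(x != y for x, y in zip(a, b))


print(exact_hit("CGCG", "CGAACG"))              # exact search misses the gapped site
print(GAPPED_CG.fullmatch("CGAACG") is not None)  # the pattern accepts it
print(hamming("CGCG", "CACG"))                  # one substitution apart
```

Each step tolerates a kind of difference the previous one rejects: the regular expression allows a variable spacer, and the mismatch count allows substitutions.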
"Hardness" of (multi-) sequence alignment

Align 2 sequences of length N allowing gaps.

  ACCAC-ACA      ACCACACA
  ::x::x:x:      :xxxxxx:
  AC-ACCATA ,    A-----CACCATA , etc.

2N gap positions, gap lengths of 0 to N each:
A naïve algorithm might scale by O(N^(2N)).
For N = 3x10^9 this is rather large.
Now, what about k>2 sequences? Or rearrangements other than gaps?
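To put numbers on "rather large", the sketch below computes the number of decimal digits in N^(2N) and contrasts it with a dynamic-programming global alignment, which fills an N x N table instead of enumerating gap placements. The gap penalty of -1 is an assumption (the slides so far give only +1 for a match and -1 for a mismatch), and the function names are illustrative.

```python
import math


def log10_naive_cost(n):
    """log10 of N**(2N), the slide's rough bound for naive gap enumeration."""
    return 2 * n * math.log10(n)


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score in O(len(a) * len(b)) time.

    Only two rows of the table are kept, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps only
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned against gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(f"N^(2N) for N = 3e9 has about {log10_naive_cost(3e9):.3g} decimal digits")
print(nw_score("ACCACACA", "ACACCATA"))
```

Under these assumed penalties the slide's first gapped alignment (six matches, one mismatch, two gaps) scores 3, one better than the ungapped global score of 2.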
