Harvard-MIT Division of Health Sciences and Technology
HST.508: Genomics and Computational Biology

DNA1: Last week's take-home lessons
  Types of mutants
  Mutation, drift, selection
  Binomial for each
  Association studies
  χ² statistic
  Linked & causative alleles
  Alleles, Haplotypes, genotypes
  Computing the first genome, the second ...
  New technologies
  Random and systematic errors

DNA2: Today's story and goals
  Motivation and connection to DNA1
  Comparing types of alignments & algorithms
  Dynamic programming
  Multi-sequence alignment
  Space-time-accuracy tradeoffs
  Finding genes -- motif profiles
  Hidden Markov Model for CpG Islands
DNA 2
  DNA1: the last 5000 generations
  Intro2: Common & simple
  Figure (http://216.190.101.28/GOLD/)

Applications of Dynamic Programming
  To sequence analysis
    Shotgun sequence assembly
    Multiple alignments
    Dispersed & tandem repeats
    Bird song alignments
    Gene Expression time-warping
  Through HMMs
    RNA gene search & structure prediction
    Distant protein homologies
    Speech recognition
Alignments & Scores
  Global (e.g. haplotype)
    ACCACACA
    ::xx::x:
    ACACCATA
    Score = 5(+1) + 3(-1) = 2
  Local (motif)
    ACCACACA
       ::::
       ACACCATA
    Score = 4(+1) = 4
  Suffix (shotgun assembly)
    ACCACACA
         :::
         ACACCATA
    Score = 3(+1) = 3
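
The three scores on this slide can be reproduced with a short sketch. It assumes ungapped alignments scored +1 per match and -1 per mismatch, as on the slide. The function names are illustrative, not from the lecture, and the suffix case is simplified here to the longest exact overlap between the end of one sequence and the start of the other.

```python
def global_score(a, b, match=1, mismatch=-1):
    """Score an ungapped end-to-end alignment of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped global alignment needs equal lengths")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def best_local_score(a, b, match=1, mismatch=-1):
    """Best-scoring ungapped local segment, tried at every relative offset."""
    best = 0
    # shift = position in a that b[0] is lined up with (may be negative)
    for shift in range(-(len(b) - 1), len(a)):
        running = 0
        for i in range(max(0, shift), min(len(a), shift + len(b))):
            step = match if a[i] == b[i - shift] else mismatch
            running = max(0, running + step)  # drop a segment once it goes negative
            best = max(best, running)
    return best


def suffix_prefix_score(a, b, match=1):
    """Assumed simplification: longest exact overlap of a's suffix with b's prefix."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k * match
    return 0
```

With the slide's sequences ACCACACA and ACACCATA, these return 2, 4, and 3 respectively.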

Increasingly complex (accurate) searches
  Exact (StringSearch): CGCG
  Regular expression (PrositeSearch): CGN{0-9}CG = CGAACG
  Substitution matrix (BlastN): CGCG ~= CACG
  Profile matrix (PSI-blast): CGc(g/a) ~= CACG
  Gaps (Gap-Blast): CGCG ~= CGAACG
  Dynamic Programming (NW, SM): CGCG ~= CAGACG
  Hidden Markov Models (HMMER): WU (http://hmmer.wustl.edu/)
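
The first few rungs of this ladder can be illustrated with Python's standard library. This is a rough sketch, not how the named tools work internally: it reads CGN{0-9}CG as "CG, then 0 to 9 arbitrary bases, then CG" (an assumed interpretation of the slide's notation), and it uses a plain mismatch count as a crude stand-in for a substitution matrix. All names are illustrative.

```python
import re


def exact_hits(pattern, text):
    """Start positions of every (possibly overlapping) exact occurrence."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


# Assumed reading of CGN{0-9}CG: CG, 0 to 9 bases of any kind, then CG.
SPACED_CG = re.compile(r"CG[ACGT]{0,9}CG")


def within_mismatches(pattern, candidate, max_mismatches=1):
    """True if two equal-length strings differ at no more than max_mismatches sites."""
    if len(pattern) != len(candidate):
        return False
    return sum(p != c for p, c in zip(pattern, candidate)) <= max_mismatches
```

Exact search finds no CGCG in CGAACG, the regular expression accepts it, and the mismatch-tolerant comparison accepts CACG as a near match to CGCG. None of these handle the gapped case CGCG ~= CAGACG, which is where dynamic programming comes in.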
"Hardness" of (multi-) sequence alignment
  Align 2 sequences of length N allowing gaps.
    ACCAC-ACA
    ::x::x:x:
    AC-ACCATA

    ACCACACA
    :xxxxxx:
    A-----CACCATA , etc.
  2N gap positions, gap lengths of 0 to N each:
  a naïve algorithm might scale by O(N^(2N)).
  For N = 3x10^9 this is rather large.
  Now, what about k>2 sequences? Or rearrangements other than gaps?
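
To put a number on "rather large", the naive O(N^(2N)) estimate from the "Hardness" slide can be evaluated in log space, since the value itself is far too big to compute directly. The sketch below is a back-of-envelope calculation only. For contrast it also counts the cells of an (N+1) x (N+1) table, the size of a standard pairwise dynamic-programming table. The function names are illustrative.

```python
import math


def naive_log10_operations(n):
    """log10 of N**(2N), the slide's naive estimate, computed without overflow."""
    return 2 * n * math.log10(n)


def dp_table_cells(n):
    """Cells in an (N+1) x (N+1) pairwise dynamic-programming table."""
    return (n + 1) ** 2


n = 3 * 10**9
print(f"naive estimate: about 10^{naive_log10_operations(n):.3g} operations")
print(f"DP table: about 10^{math.log10(dp_table_cells(n)):.3g} cells")
```

For N = 3x10^9 the naive count has an exponent of roughly 5.7x10^10, that is, a number with tens of billions of digits, while the quadratic table has on the order of 10^19 cells.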