lect 08 motif finding

Bioc 2808 Lecture 8: Motif finding
Instructor: Dr. Junwen Wang
7/29/10
Motif
- Common sequence “pattern” in the binding sites of a transcription factor
- A succinct way of capturing variability among the binding sites
Motif representation
- Consensus string (IUPAC)
  - May allow “degenerate” symbols in string, e.g., N = A/C/G/T; W = A/T; S = C/G; R = A/G; Y = T/C, etc.
- Position weight matrix
  - More powerful representation
  - Probabilistic treatment
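The consensus-string representation can be illustrated with a minimal Python sketch. The table below covers only the symbols listed on the slide plus the four plain bases; the function names `matches_consensus` and `find_sites` are hypothetical, chosen for this example.

```python
# Subset of IUPAC nucleotide codes: each symbol maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "N": "ACGT",
}

def matches_consensus(site, consensus):
    """Return True if every base of `site` is allowed by `consensus`."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[sym] for base, sym in zip(site, consensus))

def find_sites(sequence, consensus):
    """Return 0-based start positions where `consensus` matches `sequence`."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1)
            if matches_consensus(sequence[i:i + k], consensus)]
```

For instance, `matches_consensus("TATAAA", "TATAWR")` is true because W accepts A and R accepts A, while `"TATACA"` fails at the W position.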
The motif finding problem
- Suppose a transcription factor (TF) controls five different genes
- Each of the five genes should have binding sites for TF in its promoter region
[Figure: promoter regions of Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, with binding sites for TF marked]
Binding sites from a weight matrix motif
- For known TF
- Given sequence S (e.g., 1000 base-pairs long)
- For each substring s of S:
  - Compute Pr(s|W), where W is the weight matrix
  - If Pr(s|W) > some threshold, call that a binding site
- Look at S, as well as its “reverse
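The scanning procedure on this slide can be sketched in Python as follows. Several things here are assumptions, not part of the lecture: the 3-column matrix `PWM` is made up for illustration, Pr(s|W) is taken to be the product of per-position base probabilities, and the truncated last bullet is read as referring to the reverse complement strand.

```python
# Hypothetical 3-column weight matrix: one dict of base probabilities per position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def site_probability(s, pwm):
    """Pr(s|W), assumed to be the product of per-position probabilities."""
    p = 1.0
    for base, column in zip(s, pwm):
        p *= column[base]
    return p

def scan(sequence, pwm, threshold):
    """Return (start, strand, probability) for windows with Pr(s|W) > threshold.

    Starts on the "-" strand are 0-based offsets into the
    reverse-complemented string, not into the original sequence.
    """
    k = len(pwm)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - k + 1):
            p = site_probability(seq[i:i + k], pwm)
            if p > threshold:
                hits.append((i, strand, p))
    return hits
```

With this toy matrix, `scan("TTAGCTT", PWM, 0.3)` reports the window AGC on both strands, because AGCT is its own reverse complement. In practice log-odds scores against a background model are often used in place of raw probabilities; that choice is outside what the slide specifies.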
Ab initio motif finding
- The original motif finding problem
- To find a motif that represents binding sites of an unknown TF
Ab initio motif finding
- Define a motif score, then find the motif with the maximum score over all possible motifs in the search space (given by the motif model)
- Consensus string model => exhaustive search algorithm, guaranteed to find the optimal motif
- PWM model => local search, not
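For the consensus string model, exhaustive search amounts to scoring every candidate string and keeping the best one. The sketch below assumes fixed-length strings over {A,C,G,T}; the scoring function `sequences_containing` is only an illustrative stand-in, and both function names are hypothetical.

```python
from itertools import product

def exhaustive_search(sequences, length, score):
    """Score every string of `length` over {A,C,G,T}; return (motif, score).

    Ties are broken in favour of the first candidate in enumeration order.
    """
    best_motif, best_score = None, float("-inf")
    for letters in product("ACGT", repeat=length):
        motif = "".join(letters)
        value = score(motif, sequences)
        if value > best_score:
            best_motif, best_score = motif, value
    return best_motif, best_score

def sequences_containing(motif, sequences):
    """Illustrative score: number of input sequences with an exact occurrence."""
    return sum(motif in seq for seq in sequences)
```

Because every one of the 4^length candidates is scored, the returned motif is optimal for the chosen score, at a cost that grows exponentially with motif length.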
Ab initio motif finding - Define motif model precisely
- A precise motif model defines the search space (i.e., a list of all candidate motifs).
- The motif model also prescribes exactly how to determine if a substring is a match to a particular motif.
Ab initio motif finding
- E.g., string over alphabet {A,C,G,T} of fixed length l. If l = 4, all 256 strings AAAA, AAAT, AAAC, …, TTTT are “candidate motifs”.
- E.g., string over alphabet {A,C,G,T} of fixed length l, and allowing up to d mismatches. If AAAA is a motif, and d = 1, then AAAT, AATA, etc. are also counted as matches to the motif.
- E.g., string over extended alphabet {A,C,G,T,N} of fixed length l. Here “N” stands for any
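The three example models can be made concrete with a short Python sketch. Each model supplies a candidate list (the search space) and a match rule. The function names are hypothetical, and the wildcard rule assumes that N matches any of A, C, G, T, as in the IUPAC table earlier in the lecture.

```python
from itertools import product

def candidate_motifs(length, alphabet="ACGT"):
    """List every string of `length` over `alphabet` (the search space)."""
    return ["".join(p) for p in product(alphabet, repeat=length)]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def matches_with_mismatches(substring, motif, d):
    """Match rule for the mismatch model: at most d differing positions."""
    return len(substring) == len(motif) and hamming(substring, motif) <= d

def matches_with_wildcard(substring, motif):
    """Match rule for the {A,C,G,T,N} model; N is assumed to match any base."""
    return len(substring) == len(motif) and all(
        m == "N" or m == s for s, m in zip(substring, motif)
    )
```

`candidate_motifs(4)` has 256 entries, matching the count on the slide, and the extended alphabet gives 5^4 = 625 candidates for the same length.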
Ab initio motif finding - Define a motif score
- A motif score is a real number associated with each candidate motif, in relation to the input sequences.
- E.g., the count N_s of a motif s in the input sequences.
- E.g., some function of the motif count N_s.
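Both kinds of score can be sketched in Python. `motif_count` computes N_s with exact matching and overlapping occurrences included. `count_ratio_score` is one possible "function of the count", chosen here purely for illustration: it divides N_s by the count expected if bases were independent and uniformly distributed. That background model and both function names are assumptions, not taken from the lecture.

```python
def motif_count(motif, sequences):
    """N_s: exact occurrences of `motif` across all sequences, overlaps included."""
    k = len(motif)
    return sum(
        seq[i:i + k] == motif
        for seq in sequences
        for i in range(len(seq) - k + 1)
    )

def count_ratio_score(motif, sequences):
    """Illustrative function of N_s: observed count over expected count.

    The expected count assumes independent, uniformly distributed bases,
    which is an assumption made only for this example.
    """
    k = len(motif)
    windows = sum(max(len(seq) - k + 1, 0) for seq in sequences)
    expected = windows / 4 ** k
    return motif_count(motif, sequences) / expected if expected else 0.0
```

For example, `motif_count("AA", ["AAAA", "CAAG"])` is 4: three overlapping occurrences in the first sequence and one in the second.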