Introduction to Bioinformatics Fall 2010 Lecture 3 Eleazar Eskin University of California, Los Angeles

Motif Finding (Chapter 4) Lecture 3. October 4th, 2010
Motif Finding Problem
(l, d)-k Problem: Given m sequences of length n, find all patterns of length l such that the pattern occurs in k sequences with up to d mismatches. We are interested in patterns of length 30+ with many mismatches.
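
The problem statement above can be made concrete with a brute-force sketch. This is not the method developed in the lecture, only an illustration of the definition: it enumerates every pattern of length l over the alphabet and keeps those that occur in enough sequences. The function names (hamming, occurs, find_motifs) are hypothetical, and "occurs in k sequences" is read here as "in at least k sequences", which is an assumption.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(pattern, seq, d):
    """True if some window of seq is within d mismatches of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def find_motifs(seqs, l, d, k, alphabet="acgt"):
    """Exhaustive (l, d)-k search: try all len(alphabet)**l patterns.

    Assumption: a pattern qualifies if it occurs in at least k sequences.
    """
    hits = []
    for chars in product(alphabet, repeat=l):
        pattern = "".join(chars)
        support = sum(occurs(pattern, s, d) for s in seqs)
        if support >= k:
            hits.append(pattern)
    return hits
```

The cost grows as 4^l times the total sequence length, so this is only usable for very short patterns; at l = 15 it is already about a billion candidate patterns, and at length 30+ it is hopeless.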

Motif Finding Challenge Problem
(15, 4)-20 Problem: Given 20 sequences of length 600, find all patterns of length 15 such that the pattern occurs in 20 sequences with up to 4 mismatches. It was presented as a “challenge problem” in 2000.
Variants of Motif Finding
Monad Patterns: Short contiguous strings. Instances occur with some mismatches.
Composite Regulatory Signals: Consist of multiple monad patterns that occur near each other. (GuhaThakurta, Stormo 2001)
Dyad Signals: Two monads that occur a fixed distance apart. (van Helden et al. 2000) (Gelfand et al. 2000)
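
To experiment with the challenge problem, one can generate a synthetic instance. The sketch below is an assumption about how such data might be built, not a description of the original benchmark: it draws a random motif, then plants one mutated copy in each random background sequence. The names mutate and planted_instance are hypothetical, the background is assumed uniform over a, c, g, t, and each copy receives exactly d substitutions (the problem statement only requires "up to" d).

```python
import random


def mutate(motif, d, rng, alphabet="acgt"):
    """Return a copy of motif with exactly d positions substituted."""
    chars = list(motif)
    for i in rng.sample(range(len(chars)), d):
        # Pick a replacement that differs from the current character.
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def planted_instance(m, n, l, d, seed=0, alphabet="acgt"):
    """Build m random sequences of length n, each with one planted copy.

    Returns (motif, sequences, positions), where positions[i] is the
    start of the planted copy in sequences[i].
    """
    rng = random.Random(seed)
    motif = "".join(rng.choice(alphabet) for _ in range(l))
    seqs, positions = [], []
    for _ in range(m):
        background = [rng.choice(alphabet) for _ in range(n)]
        pos = rng.randrange(n - l + 1)
        background[pos:pos + l] = mutate(motif, d, rng, alphabet)
        seqs.append("".join(background))
        positions.append(pos)
    return motif, seqs, positions
```

Calling planted_instance(20, 600, 15, 4) yields data with the same dimensions as the (15, 4)-20 problem. Two planted copies can differ from each other in up to 8 positions, which is part of what makes the signal hard to see by pairwise comparison.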

Sample Sequences atgaccgggatactgataccgtatttgcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgcc acccctattttttgagcagatttaggacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtac gagtatccctgggatgacttttggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga ctgagaattggatgaccttgtaatgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga cccttttgcggtaatgtgccggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag tcaatcatgttcttgtgaatgatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa ggttttggcccttgttagagcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat acttgagttggtttcgaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta tggcccattggctaaaagccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag tggtgagcaacgacagatcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca
Sample with Dyad AAAAAAAAGGGGGGG-(10..15)-CCCCCCCTTTTTTTT atgaccgggatactgat AAAAAAAAGGGGGGG ggcgtacacattag CCCCCCCCTTTTTTT acgttagactcggcgccgccg acccctattttttgagcagatttagtgacctggaa AAAAAAAGAGGGGGG aaacttttccgaata CCCCCCCCTTTTTTTg a tgagtatccctgggatgactt AAAAAAAAGGGGGGG tgctctcccgatttt CCCCCCCCTTTTTTT tctcgccagggtccga gctgagaattggatg AAAAAAAAGGGGGGG tccacgcaatcgcgaa CCCCCCCCTTTTTTT aggcaagaccgataaaggaga tcccttttgcggtaatgtgccgggagg AAAAAAAGAGGGGGG agccctaacggacttaat CCCCCCCCTTTTTTT ttatcag gtcaatcatgttcttgtgaatggattt AAAAAAAAGGGGGGG gaccgcttggcgc CCCCCCCCTTTTTTT gggcgagcgcaa cggttttggcccttgttagaggcccccgt AAAAAAAAGGGGGGG caattatgagagag CCCCCCCCTTTTTTT gcggttcat aacttgagtt AAAAAAAAGGGGGGG ctggggcacatacaagag CCCCCCCCTTTTTTT agttaatgctgatgacactatgta ttggcccattggctaaaagcccaa AAAAAAAGAGGGGGG gatagaatccttgcat CCCCCCCCTTTTTTT accgaagggaag ctggtgagcaacgacagattcttacgtgctagct AAAAAAAGAGGGGGG tctaatatcgcacctt ACCCCCCCCTTTTTTT a
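
A dyad like the one in this sample can be located by scanning for the left monad and then checking each allowed gap for the right monad. The following is a minimal sketch under stated assumptions: find_dyads and its parameters are hypothetical names, both monads are known in advance (the real problem is to discover them), and each monad is allowed up to d mismatches independently.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_dyads(seq, left, right, min_gap, max_gap, d=0):
    """Return (start, gap) pairs where left and right occur gap apart.

    start is the index of the left monad; gap is the number of
    characters between the end of left and the start of right.
    Matching is case-insensitive.
    """
    seq, left, right = seq.lower(), left.lower(), right.lower()
    hits = []
    for i in range(len(seq) - len(left) + 1):
        if hamming(left, seq[i:i + len(left)]) > d:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            window = seq[j:j + len(right)]
            # Skip windows that run off the end of the sequence.
            if len(window) == len(right) and hamming(right, window) <= d:
                hits.append((i, gap))
    return hits
```

For the first sample sequence, find_dyads(seq, "AAAAAAAAGGGGGGG", "CCCCCCCCTTTTTTT", 10, 15) reports the single planted occurrence, using the monads as they appear in the instances.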

Sample with (15,4)-Dyad Signal AAAAAAAAGGGGGGG-(10..15)-CCCCCCCTTTTTTTT atgaccgggatactgat AAA t AAA c GGG a G c G ggcgtacacattag CC a CCC t CTT c TTT c acgttagactcggcgccgccg acccctattttttgagcagatttagtgacctggaa A c AAAAA cc GGG t GG aaacttttccgaata C t CC a CCCTTTT gg Tg a tgagtatccctgggatgactt AA t AAA
• Graph Theory, Pattern Subspace, Pattern Graph Pruning

