HMM-Lec9-092104

Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004
Delete states (circles): silent, or null, states. They do not match any residue; they exist so that a path can jump over one or more columns. They model positions where just a few of the sequences have a “-” (gap).
Pseudo-counts
It is dangerous to estimate a probability distribution from just a few observed amino acids. If there are two sequences, both with Leu at a position, then P = 1 for Leu but P = 0 for all other residues at this position, even though we know that Val often substitutes for Leu. The probability of a whole sequence easily becomes 0 if a single Leu is substituted by a Val; equivalently, the log-odds score is minus infinity.
How to avoid “over-fitting” (strong conclusions drawn from very little evidence)? Use pseudocounts: pretend to have more counts than those from the data.
A. Add 1 to all the counts: Leu: 3/22, other a.a.: 1/22 (2 observed counts + 20 pseudocounts = 22 counts in total, assuming the 20 standard amino acids).
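The add-one rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `add_one_estimate` and the 20-letter alphabet string are not from the lecture. With the standard 20 amino acids and two observed Leu, the denominator comes out as 2 + 20 = 22.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard residues

def add_one_estimate(column):
    """Estimate residue probabilities for one alignment column
    by adding one pseudocount to every amino acid."""
    # Ignore gap characters and anything outside the alphabet.
    observed = [aa for aa in column if aa in AMINO_ACIDS]
    counts = Counter(observed)
    # Observed counts plus one pseudocount per amino acid.
    total = len(observed) + len(AMINO_ACIDS)
    return {aa: (counts[aa] + 1) / total for aa in AMINO_ACIDS}

# Two sequences, both with Leu (L) at this position.
probs = add_one_estimate("LL")
print("P(Leu) =", probs["L"])
print("P(Val) =", probs["V"])
```

No residue gets probability 0 any more, so a single Leu to Val substitution no longer drives the log-odds score to minus infinity.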
Adding 1 to all counts amounts to assuming a priori that all a.a. are equally likely. Another approach: use the background composition as pseudocounts.
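One common way to write background pseudocounts is p(a) = (n_a + A·q_a) / (N + A), where n_a is the observed count, q_a the background frequency, N the number of observations and A a total pseudocount weight. The visible slides do not give a formula, so this form, the name `background_pseudocount_estimate` and the `weight` parameter are assumptions, and the background values below are made-up illustrative numbers, not measured frequencies.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard residues

def background_pseudocount_estimate(column, background, weight=20.0):
    """Estimate p(a) = (n_a + weight * q_a) / (N + weight) for one column.

    `background` maps each amino acid to its background frequency q_a;
    `weight` is the total number of pseudocounts added (an assumed knob).
    """
    observed = [aa for aa in column if aa in AMINO_ACIDS]
    total = len(observed) + weight
    return {
        aa: (observed.count(aa) + weight * background[aa]) / total
        for aa in AMINO_ACIDS
    }

# A uniform background with weight 20 reproduces the add-one rule.
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
est_uniform = background_pseudocount_estimate("LL", uniform)

# Hypothetical, non-uniform background (illustrative numbers only).
toy_background = {aa: 0.83 / 18 for aa in AMINO_ACIDS}
toy_background["L"] = 0.10
toy_background["V"] = 0.07
est_toy = background_pseudocount_estimate("LL", toy_background)

print("uniform background, P(Leu) =", est_uniform["L"])
print("toy background,     P(Val) =", est_toy["V"])
```

With a non-uniform background, residues that are common overall receive more pseudocount mass than rare ones, instead of every residue being treated alike.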
Searching a database with HMM
We know how to calculate the probability of a sequence in the alignment: multiply all the probabilities (or add the log-odds scores) in the model along the path followed by that sequence. For sequences not in the alignment, we do not know the path. Find a path through the model where the new sequence fits well; we can then score it as before. We need to “align” the sequence to the model, i.e. assign a state to each residue in the sequence. A given sequence can have many alignments.
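Scoring along a known path can be sketched as follows. Everything in this toy model is hypothetical: a reduced three-letter alphabet, three match columns, a single match-to-match transition probability and a uniform background. Begin and end transitions are ignored to keep the sketch short.

```python
import math

# Hypothetical emission probabilities for three match columns.
MATCH_EMISSIONS = [
    {"A": 0.7, "L": 0.2, "V": 0.1},
    {"A": 0.1, "L": 0.6, "V": 0.3},
    {"A": 0.2, "L": 0.2, "V": 0.6},
]
MATCH_TO_MATCH = 0.9  # hypothetical transition probability M_i -> M_(i+1)
BACKGROUND = {"A": 1 / 3, "L": 1 / 3, "V": 1 / 3}  # hypothetical null model

def probability_along_match_path(seq):
    """Multiply emission and transition probabilities along M1, M2, ..."""
    p = 1.0
    for i, residue in enumerate(seq):
        if i > 0:
            p *= MATCH_TO_MATCH
        p *= MATCH_EMISSIONS[i][residue]
    return p

def log_odds_along_match_path(seq):
    """Add log-odds scores along the same path instead of multiplying."""
    score = 0.0
    for i, residue in enumerate(seq):
        if i > 0:
            score += math.log(MATCH_TO_MATCH)
        score += math.log(MATCH_EMISSIONS[i][residue] / BACKGROUND[residue])
    return score

for seq in ("ALV", "VAL"):
    print(seq, probability_along_match_path(seq), log_odds_along_match_path(seq))
```

Adding logs gives the same ranking as multiplying probabilities, and it avoids numerical underflow on long sequences.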
E.g. a protein has a.a.: A1, A2, A3, …
HMM states: M1, M2, M3, … for match states; I1, I2, I3, … for insertion states.
An alignment: A1 matches M1, A2 and A3 match I1, A4 matches M2, A5 matches M6 (after passing through three delete states).
For each alignment, we can calculate the probability of the sequence.
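The example alignment can be written out as a state path, M1, I1, I1, M2, D3, D4, D5, M6, and scored. The sketch below is an assumption-laden illustration: the delete-state names D3 to D5 are inferred from "three delete states" between M2 and M6, the helper names are hypothetical, and the uniform emission and transition probabilities are placeholders rather than trained parameters.

```python
import math

# The alignment from the example, as a state path (D3..D5 are inferred names).
PATH = ["M1", "I1", "I1", "M2", "D3", "D4", "D5", "M6"]
RESIDUES = ["A1", "A2", "A3", "A4", "A5"]

def assign_residues(path, residues):
    """Pair each emitting state (match or insert) with the next residue.
    Delete states are silent, so they are paired with None."""
    remaining = iter(residues)
    pairs = []
    for state in path:
        if state.startswith("D"):
            pairs.append((state, None))
            continue
        residue = next(remaining, None)
        if residue is None:
            raise ValueError("path emits more residues than the sequence has")
        pairs.append((state, residue))
    if next(remaining, None) is not None:
        raise ValueError("path emits fewer residues than the sequence has")
    return pairs

def path_log_probability(pairs, emission, transition):
    """Sum log transition and log emission probabilities along the path."""
    total = 0.0
    previous = None
    for state, residue in pairs:
        if previous is not None:
            total += math.log(transition(previous, state))
        if residue is not None:  # silent states contribute no emission term
            total += math.log(emission(state, residue))
        previous = state
    return total

pairs = assign_residues(PATH, RESIDUES)

# Placeholder parameters: every emission 1/20, every transition 0.5.
log_p = path_log_probability(
    pairs,
    emission=lambda state, residue: 1 / 20,
    transition=lambda prev, state: 0.5,
)
print(pairs)
print("log P(sequence, path) =", log_p)
```

The path has eight states but only five of them emit a residue, which is why the three delete states add transition terms and no emission terms.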