Chapter 3 Statistical Sequence Analysis
Stats M254 Statistical Methods in Computational Biology

Outline of this chapter
3.1 Motif discovery: PWM and generalizations, biophysical motif model, Gibbs motif sampler, EM, sequence segmentation;
3.2 Cis-regulatory modules: hierarchical mixture modeling, HMM for modules;
3.3 Sequence alignment: pairwise alignment, HMM for multiple alignment;
3.4 Motif finding in multiple species: motif model on a phylogenetic tree, phylogenetic motif finding, coupled HMM.

3.1 Motif discovery
The problem of motif finding:
1. Motif model
1) Position-specific weight matrix (PWM)
[Figure: motif (PWM), estimation and modeling]
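As a sketch of what the PWM model states, assume the standard formulation in which the w positions of a site are independent and each position j has its own letter distribution. A site x = (x_1, ..., x_w) over {A, C, G, T} then has the likelihoods below under the motif model (the w × 4 matrix Θ) and under an i.i.d. background with letter frequencies θ_0. The symbols Θ, θ_{jk}, θ_0 and w are notation introduced here for illustration.

```latex
P(x \mid \Theta) = \prod_{j=1}^{w} \theta_{j,x_j},
\qquad \sum_{k \in \{A,C,G,T\}} \theta_{jk} = 1 \quad (j = 1,\dots,w),

P(x \mid \theta_0) = \prod_{j=1}^{w} \theta_{0,x_j}.
```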
• Estimation of the PWM given binding sites:
  a) Calculate the count matrix.
  b) Two ways to estimate: the maximum likelihood estimate (MLE) or the Bayesian estimate (posterior mean).
• Prediction of binding sites given the PWM: compute the posterior odds of the motif model versus the background model.
• Compared with
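A minimal Python sketch of the two steps above, under assumptions the slides do not fix: positions are independent, the Bayesian estimate uses a symmetric Dirichlet prior with pseudocount `alpha` at each position, the background is i.i.d. with letter frequencies `theta0`, and `prior` is a hypothetical prior probability that a given window is a binding site. All function names are illustrative, not from any library.

```python
import math

ALPHABET = "ACGT"

def count_matrix(sites):
    """n[j][k] = number of aligned sites with letter ALPHABET[k] at position j."""
    w = len(sites[0])
    n = [[0] * 4 for _ in range(w)]
    for s in sites:
        assert len(s) == w, "all sites must have the same width"
        for j, ch in enumerate(s):
            n[j][ALPHABET.index(ch)] += 1
    return n

def pwm_mle(n):
    """MLE: theta_jk = n_jk / N, where N is the number of sites."""
    return [[c / sum(row) for c in row] for row in n]

def pwm_posterior_mean(n, alpha=0.5):
    """Posterior mean under an ASSUMED Dirichlet(alpha, ..., alpha) prior per position:
    theta_jk = (n_jk + alpha) / (N + 4 * alpha)."""
    return [[(c + alpha) / (sum(row) + 4 * alpha) for c in row] for row in n]

def log_posterior_odds(x, theta, theta0, prior=0.01):
    """log[prior * P(x | PWM)] - log[(1 - prior) * P(x | background)] for a window x."""
    score = math.log(prior / (1 - prior))
    for j, ch in enumerate(x):
        k = ALPHABET.index(ch)
        score += math.log(theta[j][k]) - math.log(theta0[k])
    return score

def scan(seq, theta, theta0, prior=0.01):
    """Score every width-w window of seq; returns a list of (start, log-odds) pairs."""
    w = len(theta)
    return [(i, log_posterior_odds(seq[i:i + w], theta, theta0, prior))
            for i in range(len(seq) - w + 1)]

if __name__ == "__main__":
    sites = ["TGACA", "TGACT", "TGAGA", "TTACA"]  # toy sites, illustrative only
    theta = pwm_posterior_mean(count_matrix(sites))
    hits = scan("CCTGACATT", theta, [0.25] * 4)
    print(max(hits, key=lambda t: t[1]))  # best-scoring window (start, log-odds)
```

One reason to prefer the posterior mean for scanning: the MLE assigns probability zero to any letter not seen at a position, so a single mismatch makes the log-odds minus infinity (and `math.log(0)` raises an error), whereas the pseudocount keeps every entry positive.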

