Alignment Statistics and Substitution Matrices
BMI/CS 576
www.biostat.wisc.edu/bmi576/
Mark Craven
Fall 2011

Probabilistic model of alignments
We'll focus on protein alignments without gaps.
Given an alignment, we can consider two possibilities:
R: the sequences are related by evolution
U: the sequences are unrelated
How can we distinguish these possibilities?
How is this view related to amino-acid substitution matrices?

Model for unrelated sequences
We'll assume that each position in the alignment is sampled independently at random from some distribution of amino acids.
Let $q_a$ be the probability of amino acid $a$. The probability of an $n$-character alignment of $x$ and $y$ is then

$$\Pr(x, y \mid U) = \prod_{i=1}^{n} q_{x_i} \prod_{i=1}^{n} q_{y_i}$$

Model for related sequences
We'll assume that each pair of aligned amino acids evolved from a common ancestor.
Let $p_{ab}$ be the probability that evolution gave rise to amino acid $a$ in one sequence and $b$ in the other. The probability of an alignment of $x$ and $y$ is then

$$\Pr(x, y \mid R) = \prod_{i=1}^{n} p_{x_i y_i}$$
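To make the two models concrete, here is a minimal Python sketch that evaluates both probabilities for a gapless alignment. It assumes a hypothetical two-letter alphabet {A, B} with made-up values for q_a and p_ab (chosen so that the marginals of p match q); a real protein model would use the 20 amino acids with frequencies estimated from data. The function names prob_unrelated and prob_related are illustrative only.

```python
import math

# Hypothetical toy background distribution q_a over a reduced alphabet {A, B}.
q = {"A": 0.6, "B": 0.4}

# Hypothetical joint pair distribution p_ab; symmetric, sums to 1,
# and its row/column marginals equal q.
p = {
    ("A", "A"): 0.5, ("A", "B"): 0.1,
    ("B", "A"): 0.1, ("B", "B"): 0.3,
}

def prob_unrelated(x, y, q):
    """Pr(x, y | U): every residue of x and y drawn independently from q."""
    assert len(x) == len(y), "gapless alignment: sequences must have equal length"
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)

def prob_related(x, y, p):
    """Pr(x, y | R): each aligned pair (x_i, y_i) drawn from p."""
    assert len(x) == len(y), "gapless alignment: sequences must have equal length"
    return math.prod(p[(a, b)] for a, b in zip(x, y))

# Usage: the identical alignment AAB/AAB is more probable under R than under U.
print(prob_unrelated("AAB", "AAB", q))  # (0.6 * 0.6 * 0.4) ** 2
print(prob_related("AAB", "AAB", p))    # 0.5 * 0.5 * 0.3
```

Note that the unrelated model scores each sequence separately, while the related model only ever looks at aligned pairs; that is exactly the difference between the two product formulas.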
Probabilistic model of alignments
How can we decide which possibility (U or R) is more likely?
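One standard way to compare the two models is a likelihood ratio, Pr(x, y | R) / Pr(x, y | U), or its logarithm. Because both models factor over positions, the log ratio is a sum of per-position terms s(a, b) = log(p_ab / (q_a q_b)). The sketch below assumes the same kind of hypothetical two-letter alphabet and made-up q and p values (not real amino-acid frequencies); pair_score, log_odds and more_likely are illustrative names, and breaking ties in favour of U is an arbitrary choice.

```python
import math

# Hypothetical toy distributions over a reduced alphabet {A, B}.
q = {"A": 0.6, "B": 0.4}
p = {
    ("A", "A"): 0.5, ("A", "B"): 0.1,
    ("B", "A"): 0.1, ("B", "B"): 0.3,
}

def pair_score(a, b):
    """Per-position log-odds score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

def log_odds(x, y):
    """log[Pr(x, y | R) / Pr(x, y | U)] for a gapless alignment of x and y."""
    assert len(x) == len(y), "gapless alignment: sequences must have equal length"
    return sum(pair_score(a, b) for a, b in zip(x, y))

def more_likely(x, y):
    """Return 'R' if the related model explains the alignment better, else 'U'.

    A score of exactly 0 (equal likelihoods) is reported as 'U'.
    """
    return "R" if log_odds(x, y) > 0 else "U"

print(more_likely("AAB", "AAB"))  # matched residues: positive score
print(more_likely("AB", "BA"))    # mismatched residues: negative score
```

Working in log space turns a long product of small probabilities into a sum, which avoids numerical underflow and gives an additive score per aligned pair. A table of such s(a, b) values for every amino-acid pair is the usual way this probabilistic view connects to substitution matrices.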

