report the matches to that sublibrary
when done).

BLAST Statistical significance
• A key to the utility of BLAST is the ability to calculate the expected probabilities of occurrence of Maximal Segment Pairs (MSPs), given the word size w and threshold T.
• This allows BLAST to rank matching sequences in order of “significance” and to cut off listings at a user-specified probability.
• From the Karlin-Altschul formulation, the expected value (mean) of the HSP scores between a query and a set of random sequences is

\[ u \cong \frac{\log_e(Kmn)}{\lambda} \]

or, equivalently,

\[ u \cong \frac{\ln(Kmn)}{\lambda} \]
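The Karlin-Altschul estimate can be tried numerically with a minimal Python sketch. The function name `expected_hsp_score` and the values chosen for K, λ, m and n are illustrative assumptions, not figures taken from the slides.

```python
import math

def expected_hsp_score(K, lam, m, n):
    """Karlin-Altschul estimate of the expected HSP score: u = ln(K*m*n) / lambda."""
    return math.log(K * m * n) / lam

# Assumed, illustrative parameters (not from the slides).
K, lam = 0.13, 0.32      # statistical parameters of the scoring system
m, n = 250, 400          # query length and database sequence length

u = expected_hsp_score(K, lam, m, n)
print(f"expected HSP score u = {u:.2f}")
```

Because u depends on the logarithm of the product mn, doubling either sequence length raises the expected score by only ln(2)/λ.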
• BLAST uses a correction to this formulation that takes into account the effective sequence lengths of the query and the database sequences:

\[ u = \frac{\ln(K m' n')}{\lambda} \]
• The corrected lengths are given by

\[ m' = m - \frac{\ln(Kmn)}{H}, \qquad n' = n - \frac{\ln(Kmn)}{H} \]

with

\[ H = \frac{\ln(Kmn)}{l} \]

where l is the average length of the alignment that can be achieved between random sequences of lengths m and n.
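The length correction can be sketched in Python as follows. The function names and all parameter values, including the average random alignment length l, are assumptions chosen for illustration. With H defined as ln(Kmn)/l, the term subtracted from each length works out algebraically to l itself, and the code makes that visible.

```python
import math

def effective_lengths(K, m, n, l):
    """Return (m', n') with m' = m - ln(Kmn)/H, n' = n - ln(Kmn)/H, H = ln(Kmn)/l."""
    log_kmn = math.log(K * m * n)
    H = log_kmn / l          # H as defined in the text
    shift = log_kmn / H      # algebraically equal to l
    return m - shift, n - shift

def corrected_expected_score(K, lam, m, n, l):
    """Expected HSP score using effective lengths: u = ln(K*m'*n') / lambda."""
    m_eff, n_eff = effective_lengths(K, m, n, l)
    if m_eff <= 0 or n_eff <= 0:
        raise ValueError("effective lengths must be positive")
    return math.log(K * m_eff * n_eff) / lam

# Assumed, illustrative parameters (not from the slides).
K, lam = 0.13, 0.32
m, n, l = 250, 400, 30

m_eff, n_eff = effective_lengths(K, m, n, l)
print(f"m' = {m_eff:.1f}, n' = {n_eff:.1f}")
print(f"corrected u = {corrected_expected_score(K, lam, m, n, l):.2f}")
```

The corrected score is always lower than the uncorrected one, since both effective lengths are shorter than the raw lengths.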
• Given u, we can calculate the probability p of observing a score S between a query sequence and a given database sequence that is equal to or greater than x:

\[ p(S \ge x) = 1 - \exp\left(-e^{-\lambda (x - u)}\right) \]
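A small Python sketch of this tail probability is shown below. The name `p_value` and the example numbers are assumptions for illustration. `math.expm1` is used so that very small probabilities are not lost to rounding, and the overflow guard is an implementation detail added here.

```python
import math

def p_value(x, u, lam):
    """P(S >= x) = 1 - exp(-exp(-lambda * (x - u))) for expected score u."""
    exponent = -lam * (x - u)
    if exponent > 700.0:
        # exp() would overflow; the probability is 1 to machine precision.
        return 1.0
    # -expm1(-y) equals 1 - exp(-y) and stays accurate for tiny y.
    return -math.expm1(-math.exp(exponent))

# Assumed, illustrative parameters (not from the slides).
u, lam = 30.0, 0.32
for x in (30.0, 40.0, 60.0):
    print(f"x = {x:5.1f}  p(S >= x) = {p_value(x, u, lam):.3e}")
```

At x = u the probability is 1 − 1/e, and it falls off roughly exponentially as x grows beyond u.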
• Lastly, we have to consider that we are searching many database sequences and can expect even a relatively rare score to occur with high chance given enough comparisons.
• For a database of D sequences, this is

\[ E \approx 1 - e^{-p(S \ge x)\, D} \]
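The database-size correction can be illustrated with a short Python sketch. The function name `database_e` and the sample values of p and D are assumptions, not data from the slides.

```python
import math

def database_e(p, D):
    """E ~= 1 - exp(-p * D) for per-sequence probability p and D database sequences."""
    return -math.expm1(-p * D)

# Assumed, illustrative values (not from the slides).
p = 1e-6
for D in (1_000, 100_000, 10_000_000):
    print(f"D = {D:>10,d}  E = {database_e(p, D):.4g}")
```

While p·D is small, E is close to p·D; once the database is large enough, even a score that is rare for a single comparison becomes almost certain to appear somewhere in the search.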

Summary of Database Search Methods
Authors (Program)          Description
Needleman & Wunsch         full alignment
Wilbur & Lipman            match ktuple → form diag → NW
Lipman & Pearson (FASTP)   ktuple → diag → rescore
Pearson & Lipman (FASTA)   FASTP → join diags → NW
Altschul et al (BLAST)     word match list statistics...
