# BLAST Statistical Significance

A key to the utility of BLAST is its ability to calculate the expected probability of occurrence of maximal segment pairs (MSPs), given the word length $w$ and the threshold $T$. This allows BLAST to rank matching sequences in order of "significance" and to cut off listings at a user-specified probability.

From the Karlin-Altschul formulation, the expected (mean) score $u$ of the high-scoring segment pairs (HSPs) between a query and a set of random sequences is

$$u \cong \frac{\ln(Kmn)}{\lambda}$$

where $m$ and $n$ are the lengths of the sequences being compared, and $K$ and $\lambda$ are the Karlin-Altschul statistical parameters.

BLAST uses a correction to this formulation that takes into account the effective lengths of the query and the database sequences:

$$u = \frac{\ln(K m' n')}{\lambda}$$

The corrected lengths are given by

$$m' = m - \frac{\ln(Kmn)}{H}, \qquad n' = n - \frac{\ln(Kmn)}{H}, \qquad H = \frac{\ln(Kmn)}{l}$$

where $l$ is the average length of the alignment that can be achieved between random sequences of lengths $m$ and $n$. Substituting $H$ shows that each length is simply shortened by $l$.

Given $u$, we can calculate the probability $p$ of observing a score $S$, between a query sequence and a given database sequence, that is equal to or greater than $x$:

$$p(S \ge x) = 1 - \exp\left(-e^{-\lambda (x - u)}\right)$$

Lastly, we have to consider that we are searching many database sequences, so even a relatively rare score can be expected to occur by chance given enough comparisons. For a database of $D$ sequences, this is

$$E \approx 1 - e^{-p(S \ge x)\,D}$$

## Summary of Database Search Methods

| Authors (Program) | Description |
| --- | --- |
| Needleman & Wunsch | full alignment |
| Wilbur & Lipman | match k-tuple - form diag - NW |
| Lipman & Pearson (FASTP) | k-tuple - diag - rescore |
| Pearson & Lipman (FASTA) | FASTP - join diags - NW |
| Altschul et al. (BLAST) | word match list - statistics |
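The BLAST significance calculation described in this section (effective lengths, expected score $u$, probability $p$, and database-level value $E$) can be sketched in a few lines of Python. This is a minimal illustration of the formulas as stated in the text, not BLAST's actual implementation. The function names are hypothetical, and the numeric values for `K`, `lam`, `l`, and `D` in the usage example are illustrative assumptions rather than values taken from the source.

```python
import math


def effective_lengths(m, n, K, l):
    """Corrected lengths m' = m - ln(Kmn)/H and n' = n - ln(Kmn)/H,
    with H = ln(Kmn)/l. Assumes K*m*n > 0 and K*m*n != 1."""
    log_kmn = math.log(K * m * n)
    H = log_kmn / l
    correction = log_kmn / H  # algebraically equal to l
    return m - correction, n - correction


def expected_score(m_eff, n_eff, K, lam):
    """Expected HSP score u = ln(K m' n') / lambda."""
    return math.log(K * m_eff * n_eff) / lam


def p_value(x, u, lam):
    """p(S >= x) = 1 - exp(-exp(-lambda * (x - u)))."""
    # -expm1(-y) computes 1 - exp(-y) without losing precision for small y.
    return -math.expm1(-math.exp(-lam * (x - u)))


def e_value(p, D):
    """Database-level value E ~= 1 - exp(-p * D), as given in the text."""
    return -math.expm1(-p * D)


if __name__ == "__main__":
    # Illustrative (assumed) parameters, not taken from the source.
    K, lam = 0.13, 0.32      # Karlin-Altschul parameters
    m, n, l = 250, 400, 30   # sequence lengths and mean random alignment length
    D = 100_000              # number of database sequences

    m_eff, n_eff = effective_lengths(m, n, K, l)
    u = expected_score(m_eff, n_eff, K, lam)
    for x in (u, u + 20, u + 40):
        p = p_value(x, u, lam)
        print(f"x={x:7.2f}  p={p:.3e}  E={e_value(p, D):.3e}")
```

When $pD$ is small, $1 - e^{-pD}$ is close to $pD$, so in that regime $E$ grows roughly in proportion to the number of database sequences searched.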
