Sequence profiles

Sequence patterns written as regular expressions (such as those in PROSITE) run into a problem with large multiple alignments of divergent families: as more sequences are added, the probability of finding even a few constant, or even strongly conserved, sites diminishes. There will always be an exception to the rule. To avoid missing a known member of a family, the regexp has to be made more general, but then the danger of including garbage increases. This is the typical sensitivity-specificity problem.

There is another approach. Sequence profiles (Gribskov et al. 1987) are essentially patterns in which each position in the segment (or motif) has been assigned a probability value for each possible amino-acid residue type. Instead of requiring a yes/no response to the question "does the amino acid in the sequence fit the pattern?", we now get a response such as "it fits at a level of 0.9" or "it fits at a level of 0.1". The idea is to make the process softer: add the soft responses together into an overall sum, and only then make a decision. Don't make the decision at each comparison.
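To make the contrast concrete, here is a minimal Python sketch that scores the same segments with a hard regular expression and with a soft profile. Everything in it is an assumption made for illustration: the four-position motif, the probability values, the names `PROFILE`, `PATTERN`, `profile_score` and `best_window`, and the decision threshold are all hypothetical and are not taken from PROSITE or from Gribskov et al. The score is the plain sum of per-position probabilities described above; profiles used in practice usually score with log-odds values instead.

```python
import re

# Hypothetical 4-position motif. Each column maps a residue to the
# probability of seeing it at that position (values invented for this sketch).
PROFILE = [
    {"G": 0.9, "A": 0.1},
    {"K": 0.6, "R": 0.3, "Q": 0.1},
    {"S": 0.5, "T": 0.5},
    {"L": 0.7, "I": 0.2, "V": 0.1},
]

# Hard pattern covering only the common residues at each position.
PATTERN = re.compile(r"G[KR][ST][LI]")

# Hypothetical cut-off for accepting a segment as a family member.
THRESHOLD = 1.5


def profile_score(segment, profile=PROFILE):
    """Sum the per-position probabilities of the residues in `segment`."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    # Residues absent from a column contribute 0.0 rather than a veto.
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


def best_window(sequence, profile=PROFILE):
    """Slide the profile along `sequence`; return (score, start) of the best window."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (profile_score(sequence[i:i + width], profile), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    for seg in ("GKSL", "GQSV", "MMMM"):
        hard = PATTERN.fullmatch(seg) is not None
        soft = profile_score(seg)
        print(seg, "regexp:", hard, "profile:", round(soft, 2),
              "accepted:", soft >= THRESHOLD)
```

In this toy setup the segment `GQSV` is rejected by the regexp because of two rare residues, yet its summed profile score still clears the assumed threshold. The decision is deferred until all positions have been compared, which is the behaviour the text describes.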