Sequence profiles

Sequence patterns written as regular expressions (such as those in PROSITE) have a problem with large multiple alignments of divergent families: as more sequences are added, the probability that even a few sites remain constant, or even strongly conserved, diminishes. There will always be an exception to the rule. To avoid missing a known member of a family, the regexp has to be made more general, but then the danger of including garbage (false positives) increases. This is the typical sensitivity-specificity problem.

There is another approach. Sequence profiles (Gribskov et al. 1987) are essentially patterns in which each position in the segment (or motif) has been assigned a probability value for each possible amino-acid residue type. Instead of requiring a yes/no answer to the question "does the amino acid in the sequence fit the pattern?", we now get an answer such as "it fits at a level of 0.9" or "it fits at a level of 0.1". The idea is to make the process softer. Add together the soft responses to an
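
To make the idea of a soft, per-position response concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the scoring scheme of Gribskov et al.: the function names (build_profile, score_segment), the toy motif, the pseudocount of 1 and the uniform background frequency are all hypothetical choices. It estimates a probability for every residue type at every position of an ungapped motif, then combines the per-position responses by summing log-odds values, which is one common convention.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def build_profile(aligned_segments, pseudocount=1.0):
    """Return one dict per motif position mapping residue -> probability.

    Assumes ungapped segments of equal length made of standard residues.
    The pseudocount (an assumption of this sketch) keeps unseen residues
    from getting a probability of exactly zero.
    """
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("all segments must have the same length")
    total = len(aligned_segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(length):
        counts = Counter(seg[i] for seg in aligned_segments)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def score_segment(profile, segment, background=1.0 / len(AMINO_ACIDS)):
    """Sum the per-position log-odds of the segment against the profile.

    A uniform background frequency is assumed here for simplicity.
    """
    if len(segment) != len(profile):
        raise ValueError("segment length must match profile length")
    return sum(
        math.log(column[aa] / background)
        for column, aa in zip(profile, segment)
    )


if __name__ == "__main__":
    # Hypothetical toy motif, invented for this example only.
    motif = ["GKSGT", "GKTGT", "GRSGS", "GKSGT"]
    profile = build_profile(motif)
    for candidate in ("GKSGT", "GRTGS", "WWWWW"):
        print(candidate, round(score_segment(profile, candidate), 3))
```

A segment that agrees with the motif at most positions gets a positive total, while an unrelated segment gets a negative one. Unlike a regular expression, a single mismatch lowers the score without rejecting the segment outright.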