Regular expressions

A multiple alignment is useful for many kinds of analysis, such as identifying patterns in the sequence that are connected with the structure and/or the function of the protein domain. However, a large multiple alignment is difficult to handle, and we may want to condense the information in it down to the bare minimum. One way of doing that is to extract an explicit description of the pattern of conserved residues that we can identify in the alignment.

How can we represent a pattern of residues as found in a multiple alignment? And how can we use such a pattern to search other protein sequences for it?

Computer scientists have devised a formalism to describe the kind of patterns we need: regular expressions (Swedish: "reguljära uttryck"). The term is sometimes abbreviated to regexp. Regular expressions can be used to describe languages of a particular, restricted kind. Ordinary human languages do not fall into this category; they are too complex.

In our case, let us view the sequence of a protein (or DNA) as a sentence in a specific, small language. We can then define a particular regular expression (also called a grammar) that fits the given protein sequence. This regexp can then be used to test other sequences, to see whether they fit the pattern or not.

It is essential to understand that
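
As a minimal sketch of testing sequences against a regexp, the following Python example uses the standard `re` module. The motif, the helper name `find_motif`, and the two sequences are all made up for illustration; they are assumptions, not taken from any real alignment or from the original notes.

```python
import re

# Hypothetical motif, for illustration only: a cysteine, any two residues,
# another cysteine, then either a histidine or an aspartate.
MOTIF = re.compile(r"C.{2}C[HD]")


def find_motif(sequence):
    """Return (start, matched_text) for each non-overlapping motif occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]


# Made-up one-letter-code protein sequences (assumptions for this sketch).
sequences = {
    "seq_a": "MKTCAACHLLG",
    "seq_b": "MKTAAAAHLLG",
}

for name, seq in sequences.items():
    hits = find_motif(seq)
    print(name, "fits the pattern" if hits else "does not fit the pattern", hits)
```

Here `.` stands for any residue, `{2}` repeats the preceding element twice, and `[HD]` accepts either of the listed residues at that position. A sequence fits the pattern if at least one occurrence is found anywhere in it.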