Lecture 9. Thursday, September 28. Similarity Scores. Introduction to Energy


Restricted: For students enrolled in Chem130/MCB100A, UC Berkeley, Fall 2005 ONLY

John Kuriyan: University of California, Berkeley
Chem 130/MCB 100A, Fall 2006, Lecture 9

Amino acid substitution score:

$$S_{ij} = 2\log_2 L_{ij}$$

Here i and j are two amino acids, e.g. L and F. $L_{ij}$ is the likelihood, based on observed statistics in aligned sequences, of seeing residues i and j together in the same column of a sequence alignment:

$$L_{ij} = \frac{\text{observed frequency of the } i\text{-}j \text{ pair in an aligned sequence block}}{\text{expected frequency of the } i\text{-}j \text{ pair based on random chance}}$$

$L_{ij}$ is symmetric. Suppose i = L and j = W. Then $L_{LW} = L_{WL}$, but $L_{WW} \neq L_{LL}$. That is, Trp is more conserved than leucine, but that difference shows up in the diagonal elements of the likelihood matrix, not in the off-diagonal ones.

1. The calculation of amino acid substitution frequencies from blocks of aligned sequences

[Figure: aligned sequence block, highlighting the columns that have both F and L]

$n_L = 18$ and $n_F = 26$ are the total numbers of Ls and Fs in the block.

If we shuffle all the sequences completely (i.e., randomize the sequence block), then the probability $p_i$ of finding the ith kind of amino acid at any position is given by:

$$p_i = \frac{n_i}{N \times M}$$

where N is the number of sequences and M is the number of columns in the block.

For a random distribution, the joint probability that we find the ith type of amino acid at one position and the jth type of amino acid at a second position is given by:

$$p_{ij} = 2\,p_i\,p_j \qquad (i \neq j)$$

The factor of 2 arises because we don't consider the order: an F-L pairing is counted as an equivalent occurrence to an L-F pairing. That is, $p_{ij}$ is the chance of finding an i-j pairing, and we consider i-j and j-i to be equivalent.
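The two background formulas can be checked numerically. Below is a minimal Python sketch, assuming the block dimensions used in the worked example that follows (N = 8 sequences, M = 24 columns) and the counts $n_L = 18$, $n_F = 26$; the function names are our own illustrations, not part of any library.

```python
def background_prob(n_i: int, num_seqs: int, num_cols: int) -> float:
    """p_i = n_i / (N x M): chance that a random position of the shuffled block holds residue i."""
    return n_i / (num_seqs * num_cols)


def expected_pair_prob(p_i: float, p_j: float) -> float:
    """p_ij = 2 p_i p_j for i != j; the factor 2 pools the i-j and j-i orderings."""
    return 2.0 * p_i * p_j


N, M = 8, 24                         # sequences and columns in the example block
p_L = background_prob(18, N, M)      # n_L = 18
p_F = background_prob(26, N, M)      # n_F = 26
p_FL = expected_pair_prob(p_F, p_L)  # expected F-L pairing probability

print(f"p_L = {p_L:.5f}, p_F = {p_F:.4f}, p_FL = {p_FL:.4f}")
# p_L = 0.09375, p_F = 0.1354, p_FL = 0.0254
```

Because the order of i and j is pooled, `expected_pair_prob(p_F, p_L)` and `expected_pair_prob(p_L, p_F)` give the same value, which is the symmetry of $p_{ij}$ described above.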
For the sequence block in the example (8 sequences, 24 columns):

$$p_L = \frac{18}{8 \times 24} = 0.09375, \qquad p_F = \frac{26}{8 \times 24} = 0.1354$$

Thus the probability of finding an F-L pairing in the randomized alignment is:

$$p_{FL} = 2 \times 0.09375 \times 0.1354 = 0.0254$$

Now we count the actual number of F-L pairings in the sequence block. There are only 3 columns in which F and L are both present. For the first column, the number of F-L pairs is 6 × 2 = 12.

The total number of F-L pairings is 12 + 2 + 12 = 26.

Now we count the total number of pairings in the sequence block, without regard to amino acid composition. Within each column of N = 8 residues, the number of pairings is:

$$\frac{N(N-1)}{2} = \frac{8 \times 7}{2} = \frac{56}{2} = 28$$

There are 24 columns, so we multiply this by 24 to get 28 × 24 = 672.

Therefore the observed frequency of F-L pairings in the sequence block is:

$$f_{FL} = \frac{26}{672} = 0.0387$$

This observed frequency is greater than that expected from random chance ($p_{FL} = 0.0254$). The ratio of the observed frequency ($f_{FL}$) to the random-chance probability ($p_{FL}$) is known as the likelihood:

$$L_{FL} = \frac{f_{FL}}{p_{FL}} = 1.52$$

The substitution score is then:

$$S_{FL} = 2\log_2 L_{FL} = 1.208$$

The actual block substitution matrix is generated from very large numbers of sequences for large numbers of proteins, so the resulting value of $S_{FL}$ is different.

2. Detecting fold similarity

The experimental determination of protein structure is very slow, while genomic sequencing is very fast. Perhaps we now know most of the major protein folds. How can we relate (link) a new sequence to a known fold? If we could do that, we could learn something about its function.
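The counting steps of the worked example can be strung together in a short Python sketch. It uses only the numbers given in the notes (F-L pair counts of 12, 2 and 12 in the three shared columns, 8 sequences, 24 columns, $n_L = 18$, $n_F = 26$). The column string `"FFFFFFLL"` is a hypothetical stand-in that merely reproduces the 6 × 2 count of the first column, since the actual residues are not given, and the helper names are our own. Because the sketch carries full precision instead of rounding $L_{FL}$ to 1.52 before taking the logarithm, it gives $S_{FL} \approx 1.215$ rather than 1.208.

```python
import math


def pairs_between(column: str, a: str, b: str) -> int:
    """Unordered a-b pairs (a != b) within one alignment column: n_a * n_b."""
    return column.count(a) * column.count(b)


def pairs_per_column(num_seqs: int) -> int:
    """All unordered pairs of residues in a column of N residues: N(N-1)/2."""
    return num_seqs * (num_seqs - 1) // 2


def substitution_score(observed_pairs: int, num_seqs: int, num_cols: int, p_ij: float):
    """Return (f_ij, L_ij, S_ij) for one residue pair i != j."""
    total_pairs = pairs_per_column(num_seqs) * num_cols  # 28 x 24 = 672 here
    f_ij = observed_pairs / total_pairs                  # observed pair frequency
    L_ij = f_ij / p_ij                                   # likelihood (observed / expected)
    S_ij = 2.0 * math.log2(L_ij)                         # score scaled by 2 (half-bit units)
    return f_ij, L_ij, S_ij


# Hypothetical first column with 6 F and 2 L (the notes give only the counts).
first = pairs_between("FFFFFFLL", "F", "L")  # 12
observed_FL = first + 2 + 12                  # 26 F-L pairs over the three columns

N, M = 8, 24
p_FL = 2 * (18 / (N * M)) * (26 / (N * M))    # expected by random chance
f_FL, L_FL, S_FL = substitution_score(observed_FL, N, M, p_FL)
print(f"f_FL = {f_FL:.4f}, L_FL = {L_FL:.2f}, S_FL = {S_FL:.3f}")
# f_FL = 0.0387, L_FL = 1.52, S_FL = 1.215
```

Passing the other columns as strings to `pairs_between` would work the same way; they are summed by hand here only because the notes give their totals but not their contents.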
There are two ways of doing this:

[Figure]

Fold recognition methods test the compatibility of the new sequence with known protein folds. There are many ways to do this; we shall study one as an example.

3D-1D profile method (David Eisenberg)

Step 1: Convert the three-dimensional structures of all known protein folds into a one-dimensional list of environmental descriptors. Hence: 3D → 1D profile.

Each residue in a folded protein structure is in a different chemical environment. These environments are infinitely variable, but we can sort them sensibly into a small number of classes. One parameter is the exposure of residues to water.

Within these classes, the environment can be polar or nonpolar (hydrophobic):

[Figure]

The environment of any residue in a three-dimensional structure can be mapped to a diagram such as this one. In addition, the secondary structure at each residue position is an important factor, because the different amino acids have different preferences for being in α helices, β strands and coil (neither).

The database of known protein structures is then screened for the statistical preferences of each amino acid in each environmental class. The result is a scoring matrix, similar to the BLOSUM matrix, except that each amino acid is matched with the environmental classes instead of with other amino acids.

The general features of this matrix are predictable, but the details can sometimes be hard to understand because of the complexity of protein folding and the crudeness of the environmental class descriptors.
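To make the scoring matrix concrete, here is a minimal Python sketch of how a 3D-1D profile might be used to score a sequence. Everything in it is hypothetical: the environment labels simply combine two of the ingredients named above (water exposure and secondary structure, leaving out polarity for brevity), the score values are invented for illustration rather than taken from Eisenberg's matrix, and the alignment is ungapped, whereas a practical method would also have to allow for insertions and deletions.

```python
from typing import Dict, List, Tuple

Env = Tuple[str, str]  # (water exposure, secondary structure); hypothetical labels

# Hypothetical toy scores: positive = residue type favoured in that environment.
# Illustrative values only; NOT the published 3D-1D scoring matrix.
TOY_SCORES: Dict[Env, Dict[str, float]] = {
    ("buried", "helix"):  {"L": 1.0, "F": 0.9, "K": -1.2, "E": -1.0},
    ("buried", "strand"): {"L": 0.8, "F": 1.0, "K": -1.3, "E": -1.1},
    ("exposed", "coil"):  {"L": -0.5, "F": -0.6, "K": 0.5, "E": 0.4},
}


def profile_score(sequence: str, profile: List[Env],
                  scores: Dict[Env, Dict[str, float]], missing: float = 0.0) -> float:
    """Sum the environment-specific score of each residue (ungapped, one-to-one)."""
    if len(sequence) != len(profile):
        raise ValueError("this ungapped sketch needs len(sequence) == len(profile)")
    # Residues absent from the toy table contribute `missing` (0.0 by default).
    return sum(scores[env].get(res, missing) for res, env in zip(sequence, profile))


# A three-position "profile" derived (hypothetically) from a known structure.
profile = [("buried", "helix"), ("buried", "strand"), ("exposed", "coil")]
print(profile_score("LFK", profile, TOY_SCORES))  # hydrophobic core, charged surface
print(profile_score("KEL", profile, TOY_SCORES))  # charged core, hydrophobic surface
```

The sequence that puts hydrophobic residues in buried positions and a charged residue at the surface scores higher, which is the kind of compatibility the profile is meant to detect.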
Given the scoring matrix, any particular sequence can be tested against the database of known protein structures to see if it matches any of the environmental profiles. The score for the test sequence is compared with the scores obtained for all other sequences against the same structure. For example, testing thousands of sequences against the 3D-1D environmental profile for myoglobin:

[Figure]

The overlap between the scores of globin and non-globin sequences indicates that there will be globin folds that go undetected by the 3D-1D profile method.

1. Energy and Enthalpy

THE FIRST LAW OF THERMODYNAMICS

We are often interested in how energy is stored in molecular systems, and how it is released when needed. For example, a high-energy phosphate bond is hydrolyzed to release energy:

[Figure]

We study such reactions by carrying them out in test tubes and measuring the heat released (or taken up). To get a clear understanding of energy transfer, it is important to define the nature of the experimental system with respect to energy transfer and material (atoms, molecules) transfer. First, some bookkeeping.

OPEN SYSTEM: molecules and energy move in and out; useful for studying diffusion and other transport phenomena.

CLOSED SYSTEM: matter (molecules) stays inside, energy moves in and out; we usually study chemical reactions in such systems.

ISOLATED OR ADIABATIC SYSTEM: matter and energy stay inside; no transfer from the surroundings.

When an exothermic reaction occurs inside the system, energy is released, e.g.:

$$2\,\mathrm{H_2} + \mathrm{O_2} \rightarrow 2\,\mathrm{H_2O}$$

This reaction releases about 280 kJ of energy per mole of H2O formed. Where does this energy go? In an open or closed system, energy is exchanged with the surroundings until the system reaches equilibrium.
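A quick consistency check on the quoted figure, assuming (as seems intended) that 280 kJ refers to one mole of water formed: the equation as written produces two moles of H2O, so one pass through it releases roughly twice that. For comparison, the tabulated standard enthalpy of formation of liquid water is about -285.8 kJ/mol.

$$
2\,\mathrm{H_2} + \mathrm{O_2} \rightarrow 2\,\mathrm{H_2O}: \qquad
q_{\text{released}} \approx 2\ \text{mol H}_2\text{O} \times 280\ \frac{\text{kJ}}{\text{mol H}_2\text{O}} = 560\ \text{kJ}
$$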
Energy transfer occurs either as HEAT or as WORK. When energy is transferred to the surroundings as heat, it stimulates disordered (random) motion of molecules in the surroundings:

[Figure]

When the system does work on the surroundings, it stimulates the ORDERED MOVEMENT of some part of the surroundings:

[Figure] ...
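The preview ends before the first law is written out. In the sign convention common in chemistry (an assumption about the convention this course adopts), with $q$ the heat absorbed by the system and $w$ the work done on the system, the change in internal energy $U$ is as follows. For an isolated system, where neither heat nor work crosses the boundary, $U$ stays constant.

$$
\Delta U = q + w
\qquad\qquad
\text{isolated system: } q = 0,\ w = 0 \;\Rightarrow\; \Delta U = 0
$$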

This note was uploaded on 01/12/2010 for the course MCB 100A taught by Professor Kuriyan during the Fall '09 term at University of California, Berkeley.
