interexp - Experimental Techniques 2


• High-throughput interaction detection
  • Yeast two-hybrid - pairwise
    • uses organisms as machines to learn about organisms: yeast, worm, fly, human, ...
    • low intersection between repeated experiments
    • in vivo, but takes place inside the nucleus
    • estimated 50% false-positive (FP) rate
  • TAP-MS (co-immunoprecipitation) - complexes

Tandem Affinity Purification (Puig et al, 2001)
Goal: find interaction partners for the protein encoded by a given gene (5' to 3'). Add a tag to the end (3' end) of its DNA sequence. The tag contains:
• “Protein A” from Staphylococcus aureus, which binds to IgG protein
• a TEV (tobacco etch virus) protease cleavage (cutting) site: Glu-Asn-Leu-Tyr-Phe-Gln-Gly
• a calmodulin binding peptide

Fishing for Proteins
1. Grab the tagged protein, together with its co-complexed proteins, using immunoglobulin G (IgG).
2. Wash away contaminants.
3. Cleave with TEV protease.
4. Retrieve with calmodulin beads.
Caveats:
• The tag may not be exposed.
• The tag may change folding / binding properties.
• The tag may change expression levels.

Sequencing Proteins (Tandem Mass Spectrometry)
• Trypsin digestion cuts the proteins into peptides.
• Tandem mass spec fragments each peptide. For the peptide AAVEK, the b-ions are prefixes (A, AA, AAV, AAVE) and the y-ions are suffixes (K, EK, VEK, AVEK).
• Search a database of known or predicted spectra (frequency of seeing a given mass) to identify the peptide.

Gavin et al, 2002 results:
• 589 tagged proteins (78% of which returned some interaction partners)
• 232 complexes (grouping those with substantial overlap), covering 1,440 proteins
• Not binary interactions
• In the network picture, edges mean that two complexes share a protein.

Gavin et al, 2006 - Larger scale TAP-MS (2006 update):
- 2,760 unique proteins involved in some complex (60% of the yeast proteome)
- Reproducible: the experiment was repeated for 139 proteins, and 69% of the retrieved proteins were common to both experiments.
- 73% of the known complexes in MIPS (database) were found.
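The Gavin et al. (2002) network view above connects two complexes whenever they share a protein. A minimal sketch of that construction; the complex names and members are made up for illustration and are not data from the study.

```python
from __future__ import annotations

from itertools import combinations


def complex_overlap_edges(complexes: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (complex1, complex2, shared_proteins) for every pair of
    complexes that share at least one protein."""
    edges = []
    for (name1, members1), (name2, members2) in combinations(complexes.items(), 2):
        shared = members1 & members2
        if shared:
            edges.append((name1, name2, shared))
    return edges


# Hypothetical complexes, for illustration only.
complexes = {
    "C1": {"a", "b", "c"},
    "C2": {"c", "d"},
    "C3": {"e", "f"},
}
for c1, c2, shared in complex_overlap_edges(complexes):
    print(c1, "--", c2, "share", sorted(shared))  # prints: C1 -- C2 share ['c']
```

The resulting graph says nothing about which proteins touch each other directly, which is why the slide stresses that these are not binary interactions.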
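The b-ion and y-ion ladders in the tandem mass spectrometry step above can be computed from residue masses. A minimal sketch, assuming singly charged ions and standard monoisotopic masses for the four residues in the slide's example peptide AAVEK; a real database search also has to handle other charge states, modifications, and noisy spectra.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {
    "A": 71.03711,
    "V": 99.06841,
    "E": 129.04259,
    "K": 128.09496,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Return singly charged b-ion and y-ion m/z values for `peptide`.

    b-ions are N-terminal prefixes (A, AA, ...); y-ions are C-terminal
    suffixes (K, EK, ...) and also carry the C-terminal water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:k]) + PROTON for k in range(1, len(peptide))]
    y_ions = [sum(masses[-k:]) + WATER + PROTON for k in range(1, len(peptide))]
    return b_ions, y_ions


b, y = fragment_ladders("AAVEK")
for k, (bm, ym) in enumerate(zip(b, y), start=1):
    print(f"b{k} {bm:9.4f}   y{k} {ym:9.4f}")
```

Matching observed peaks against ladders like these, for every candidate peptide in the database, is what identifies the sequence.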
The 2006 update also reported ~491 complexes (more about how this is defined later), of which 257 were novel.

[Figure: Simple ways to convert to a graph (Goll & Uetz, 2006)]

Gavin et al, 2006 - Larger scale TAP-MS: Socio-affinity Index

$A(i,j) := S_{i,j \mid i=\text{bait}} + S_{i,j \mid j=\text{bait}} + M_{i,j}$

• $S_{i,j \mid i=\text{bait}}$ ≈ ratio of the # of times j was retrieved using i as bait, divided by the expected # of times, given how often j appears and how many preys i brings in.
• $M_{i,j}$ ≈ ratio of the # of times i and j were both seen when using some other bait, divided by the expected # of times, given how often i and j appear.

Clustering and Cluster Ensembles
• The clustering algorithm to find complexes:
  1. Using A(i,j) as a similarity metric, cluster the proteins (using some algorithm: UPGMA, single linkage, complete linkage).
  2. Use a similarity threshold X to define clusters.
  3. Subtract a penalty (e.g. 0.5, 1, or 2) from A(i,j) wherever i and j are in the same cluster, and go to step 1.
  4. Stop after between 2 and 10 iterations.
• Note: the algorithm is underspecified. So: repeat with many different choices of parameters, and keep the clusters found with parameter sets that resulted in > 70% coverage and accuracy.

Isoforms & core and attachment proteins
• 5,488 different clusters => “isoforms”
• Group similar clusters together into “complexes”, e.g. from the clusters abc, abd, ab, afg, fhk, fhg, xy, xz, nm.
• Cores = subsets seen in most of the clusters within one group (average size 3.1 ± 2.5)
• Modules = pairs that were always together and seen in > 1 complex
• Attachments = proteins not in the core
[Figure (panels f, g, h): % of pairs colocalized, with the same cellular function, and conserved; % of pairs known from structures or yeast two-hybrid]

TAP-MS vs. Yeast 2-Hybrid
Yeast 2-hybrid:
  Pro: better at transient interactions (because they only have to happen long enough to “turn on” the reporter gene)
  Con: takes place in the nucleus (may be unnatural)
  Con: only binary interactions
TAP-MS:
  Pro: can find higher-order interactions (> binary)
  Con: requires more stable interactions

Ho et al, 2002 results:
725 yeast proteins chosen to be “bait”:
  Protein function        #
  Kinases               100
  Phosphatases           36
  DNA damage response    86
  Other proteins        503
• 600 baits worked (~10% of yeast proteins)
• 493 specific baits
• 1,578 proteins involved in ≥ 1 interaction
• 3,617 interactions
[Figure: a kinase (e.g. serine/threonine, histidine, or tyrosine kinase) uses ATP to attach a phosphate group (P) to a protein; a phosphatase removes it.]

Kinases / Phosphatases
• kinase: a class of enzyme (protein) that adds a phosphate group to other molecules (usually proteins).
• phosphorylation: the process of adding a phosphate group (PO4) to a protein. Phosphorylation often changes the shape (conformation) of a protein, thereby turning it “on” or “off”. For example, phosphorylation can make a hydrophobic residue hydrophilic. It is an important regulatory mechanism.
• Estimate: > 30% of proteins are phosphorylated in humans.
• 518 known kinases in human; 122 known kinases in yeast.
• ATP: adenosine 5'-triphosphate

Comparing TAP Experiments
[Figure: comparison of TAP experiments (Goll & Uetz, 2006)]

Comparisons (Von Mering et al, 2002)
Let D be a high-throughput data set and T a reference set of trusted interactions.
• Accuracy = % of D also in T (fraction of the data confirmed by the reference set)
• Coverage = % of T also in D (fraction of the reference set covered by the data)
• The reference set may well have unknown biases itself: most protein interaction data sets (including the curated complexes) are heavily biased towards proteins of high abundance.
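Read literally, the accuracy and coverage definitions above are ratios of set sizes. A minimal sketch that treats interactions as unordered protein pairs; the toy sets D and T are hypothetical, and the published benchmark may apply extra filtering that this sketch does not model.

```python
from __future__ import annotations


def accuracy_and_coverage(
    data: set[frozenset[str]], reference: set[frozenset[str]]
) -> tuple[float, float]:
    """accuracy = |D ∩ T| / |D|  (fraction of the data confirmed by the reference set)
    coverage = |D ∩ T| / |T|  (fraction of the reference set covered by the data)"""
    confirmed = data & reference
    return len(confirmed) / len(data), len(confirmed) / len(reference)


def pair(a: str, b: str) -> frozenset[str]:
    """An undirected interaction, so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


# Hypothetical data set D and reference set T.
D = {pair("a", "b"), pair("b", "c"), pair("c", "d"), pair("d", "e")}
T = {pair("a", "b"), pair("c", "d"), pair("x", "y")}
acc, cov = accuracy_and_coverage(D, T)
print(f"accuracy = {acc:.0%}, coverage = {cov:.1%}")  # accuracy = 50%, coverage = 66.7%
```

Because T is incomplete, a true interaction missing from T counts against accuracy, which is why both numbers are only lower limits.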
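Returning to the socio-affinity index of Gavin et al. (2006) defined earlier: the sketch below computes a simplified A(i,j) from a list of purifications, each given as (bait, set of preys). Following the slide, each term is an observed/expected ratio. The expected-count formulas in the comments are assumptions chosen to match the slide's wording (“given how often j appears and how many preys i brings in”), not the exact normalisation used in the paper, and all names and data are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def socio_affinity(purifs: list[tuple[str, frozenset[str]]], i: str, j: str) -> float:
    """Simplified A(i,j) = S(i,j | i=bait) + S(i,j | j=bait) + M(i,j)."""
    # f[x]: fraction of all prey observations that are protein x.
    prey_counts = Counter(p for _, preys in purifs for p in preys)
    total = sum(prey_counts.values())
    f = {x: c / total for x, c in prey_counts.items()}

    def ratio(observed: float, expected: float) -> float:
        return observed / expected if expected > 0 else 0.0

    def s_term(bait: str, prey: str) -> float:
        runs = [preys for b, preys in purifs if b == bait]
        observed = sum(prey in preys for preys in runs)
        # Expected (assumption): # preys this bait brings in * how often prey appears.
        expected = sum(len(preys) for preys in runs) * f.get(prey, 0.0)
        return ratio(observed, expected)

    # M term: purifications using some *other* bait that contain both i and j.
    others = [preys for b, preys in purifs if b not in (i, j)]
    observed_m = sum(i in preys and j in preys for preys in others)
    # Expected (assumption): a run with n preys has n(n-1)/2 prey pairs,
    # each being {i, j} with probability ~2 * f_i * f_j.
    expected_m = sum(len(p) * (len(p) - 1) for p in others) * f.get(i, 0.0) * f.get(j, 0.0)

    return s_term(i, j) + s_term(j, i) + ratio(observed_m, expected_m)


# Hypothetical purifications: (bait, preys retrieved with that bait).
purifs = [
    ("A", frozenset({"B", "C"})),
    ("B", frozenset({"A", "C"})),
    ("D", frozenset({"A", "B", "E"})),
    ("E", frozenset({"D"})),
]
print(round(socio_affinity(purifs, "A", "B"), 3))  # 6.667
```

A and B retrieve each other as baits and are also co-purified by D, so all three terms contribute; a pair never seen together scores 0.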
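The iterative clustering steps listed under “Clustering and Cluster Ensembles” earlier can be sketched with single linkage, one of the three algorithms the slide names. This is an illustration under assumptions: the function names are hypothetical, pairs missing from the score table are treated as never linked, and it runs one parameter setting rather than the ensemble over many settings.

```python
from __future__ import annotations


def single_linkage(proteins: list[str], sim: dict[tuple[str, str], float],
                   threshold: float) -> list[frozenset[str]]:
    """Cutting a single-linkage tree at `threshold` yields the connected
    components of the graph whose edges have sim >= threshold."""
    parent = {p: p for p in proteins}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values()]


def iterative_clustering(
    proteins: list[str],
    sim: dict[tuple[str, str], float],
    threshold: float,   # the similarity threshold X
    penalty: float,     # e.g. 0.5, 1, or 2
    iterations: int,    # the slide stops after 2 to 10 iterations
) -> list[list[frozenset[str]]]:
    sim = dict(sim)  # copy, so the caller's A(i,j) table is not modified
    rounds = []
    for _ in range(iterations):
        clusters = single_linkage(proteins, sim, threshold)  # steps 1-2
        rounds.append(clusters)
        for a, b in sim:                                     # step 3
            if any(a in c and b in c for c in clusters):
                sim[(a, b)] -= penalty
    return rounds


proteins = ["a", "b", "c", "d"]
A = {("a", "b"): 3.0, ("b", "c"): 1.5, ("c", "d"): 0.2}  # hypothetical scores
for k, clusters in enumerate(iterative_clustering(proteins, A, 1.0, 1.0, 3), start=1):
    print(k, sorted(sorted(c) for c in clusters))
# Round 1 finds {a, b, c}; the penalty then splits off c in round 2.
```

Each round yields a slightly different partition, which is how one run produces several overlapping clusters (“isoforms”) of the same complex.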
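For the core/attachment split described under “Isoforms & core and attachment proteins” earlier, a minimal sketch for one group of similar clusters. It assumes that “seen in most of the clusters” means “in more than half of them” and works per protein rather than over all subsets; the example clusters abc, abd, ab come from the slide and are assumed to form one group.

```python
from __future__ import annotations

from collections import Counter


def core_and_attachments(group: list[set[str]]) -> tuple[set[str], set[str]]:
    """Split one group of similar clusters into core and attachment proteins."""
    counts = Counter(p for cluster in group for p in cluster)
    # Assumption: a core protein appears in more than half of the clusters.
    core = {p for p, n in counts.items() if n > len(group) / 2}
    attachments = set(counts) - core
    return core, attachments


core, attach = core_and_attachments([{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}])
print(sorted(core), sorted(attach))  # ['a', 'b'] ['c', 'd']
```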
[Figure 2 of Von Mering et al. (2002), “Quantitative comparison of interaction data sets”: accuracy (%, fraction of the data confirmed by the reference set) vs. coverage (%, fraction of the reference set covered by the data), both on log scales, for high-throughput yeast two-hybrid, purified complexes (TAP), purified complexes (HMS-PCI), synthetic lethality, mRNA expression, in silico predictions, and combined evidence (two or three methods), shown as raw data, filtered data, and alternative parameter choices. The data sets are benchmarked against a reference set of 10,907 “trusted” interactions derived from protein complexes annotated manually at MIPS and YPD; coverage and accuracy are lower limits owing to the incompleteness of the reference set.]

• Combining methods again helps significantly. (But of 80,000 predicted interactions, only 2,400 were seen by more than one method.)

Von Mering estimate for the # of interactions in yeast:
• M = interactions seen more than once (2,400); 1/3 of them were previously known.
• At the time, ~10,000 interactions were known.
• If M is a representative sample, about 1/3 of all interactions were known, so expect about 10,000 / (1/3) = 30,000 interactions in total (Sprinzak et al. estimate ~16,000).

Transcription network, aka regulatory network:
• Transcription factors (TFs) = proteins that bind to DNA to activate or repress nearby, downstream genes.
• The regulated gene might itself be a transcription factor, so “TF regulates gene” leads to a directed graph.

ChIP-chip (ChIP-seq): chromatin immunoprecipitation followed by a microarray (chip) or sequencing
1. The TF binds to DNA.
2. The TF is cross-linked to the DNA (covalent bonds).
3. The cell is lysed and the DNA fragmented.
4. Antibodies are used to pull out the protein-DNA complexes.
5. The DNA is “read” using a microarray (ChIP-chip) or short-read sequencing (ChIP-seq).

Synthetic Lethality
• Predicts a particular kind of functional interaction (“genetic interactions”).
• “Synthetic” because the mutations are manufactured:
  -A = survive
  -B = survive
  -A & -B = die
• A pretty coarse measurement.
• Proteins A and B are likely to be involved in similar functions: A and B are “redundant” or complementary (parallel pathways).

Explanations:
• Two copies of the same protein.
• Complexes that can function without one of their constituent proteins, e.g. complex abcde can function when a single one of its proteins is removed, but not if 2 are removed.
• Two “redundant” pathways: A and B are redundant or complementary (parallel pathways).
• 3 pathways, where any 2 are required.

SSL network from 2001: 8 query genes, 4,500 “array” genes (Tong et al., Science, 2001) ...
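The survive / survive / die pattern above can be written as a filter over viability calls. A minimal sketch with hypothetical gene names and made-up viability data; treating an unmeasured double mutant as viable is an assumption of this sketch.

```python
from __future__ import annotations

from itertools import combinations


def synthetic_lethal_pairs(
    single_viable: dict[str, bool],
    double_viable: dict[frozenset[str], bool],
) -> set[frozenset[str]]:
    """Pairs where -A survives, -B survives, but -A & -B dies."""
    pairs = set()
    for a, b in combinations(sorted(single_viable), 2):
        key = frozenset((a, b))
        # Assumption: a double mutant with no measurement counts as viable.
        if single_viable[a] and single_viable[b] and not double_viable.get(key, True):
            pairs.add(key)
    return pairs


# Hypothetical viability calls (True = survives).
single = {"A": True, "B": True, "C": True, "D": False}
double = {
    frozenset({"A", "B"}): False,
    frozenset({"A", "C"}): True,
    frozenset({"B", "C"}): True,
}
print(sorted(sorted(p) for p in synthetic_lethal_pairs(single, double)))  # [['A', 'B']]
```

D is excluded because its single deletion is already lethal, so no pair involving it can be synthetic.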
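A minimal sketch of the directed regulatory graph described under “Transcription network”, with hypothetical TF and gene names standing in for ChIP-chip binding calls. Edges point from a TF to the genes it binds near; an edge whose target is itself a TF chains regulators together.

```python
from __future__ import annotations


def tf_to_tf_edges(binding: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Directed edges TF -> target where the target is itself a TF.

    `binding` maps each TF to the genes it was found bound near; any gene
    that appears as a key is treated as a TF.
    """
    tfs = set(binding)
    return sorted(
        (tf, target)
        for tf, targets in binding.items()
        for target in targets
        if target in tfs
    )


# Hypothetical ChIP-chip calls: TF1 binds near geneA and near the gene for TF2.
binding = {
    "TF1": {"geneA", "TF2"},
    "TF2": {"geneB", "geneC"},
}
print(tf_to_tf_edges(binding))  # [('TF1', 'TF2')]
```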
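The Von Mering estimate earlier in these notes combines two steps: collect M, the interactions reported by more than one method, then scale up by the fraction of M that was already known. A minimal sketch with hypothetical method names and toy interaction sets; only the final line uses the slide's numbers (~10,000 known interactions, 1/3 of M known).

```python
from __future__ import annotations

from collections import Counter


def seen_more_than_once(method_results: dict[str, set[frozenset[str]]]) -> set[frozenset[str]]:
    """M: interactions reported by more than one method."""
    counts = Counter(edge for edges in method_results.values() for edge in edges)
    return {edge for edge, n in counts.items() if n > 1}


def fraction_known(m: set[frozenset[str]], known: set[frozenset[str]]) -> float:
    """Fraction of M that was already known."""
    return len(m & known) / len(m)


def estimate_total(known_total: int, frac_known: float) -> float:
    """If M is a representative sample, frac_known also estimates the fraction
    of all interactions that are known, so total ~ known_total / frac_known."""
    return known_total / frac_known


def pair(a: str, b: str) -> frozenset[str]:
    return frozenset((a, b))


# Toy data (hypothetical): three methods, each a set of undirected interactions.
methods = {
    "two-hybrid": {pair("a", "b"), pair("b", "c")},
    "TAP": {pair("a", "b"), pair("c", "d")},
    "expression": {pair("c", "d"), pair("e", "f")},
}
M = seen_more_than_once(methods)             # {a-b, c-d}
print(fraction_known(M, {pair("a", "b")}))   # 0.5

# The slide's numbers: ~10,000 known interactions, 1/3 of M known.
print(round(estimate_total(10_000, 1 / 3)))  # 30000
```

The estimate is only as good as the representativeness assumption; the much lower Sprinzak et al. figure shows how sensitive it is.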

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
