Computational Biology, Part 28: Automated Interpretation of Subcellular Patterns in Microscope Images III

Robert F. Murphy
Copyright © 1996, 1999, 2000-2006. All rights reserved.

Results

Supervised learning of patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values ("features") that are insensitive to the position and rotation of the cell
3. Use statistical classification methods to "learn" how to distinguish each class using the features
(Boland & Murphy 2001)

2D Images of 10 Patterns (HeLa)
DNA, ER, giantin, gpp130, LAMP, Mitochondria, Nucleolin, TfR, Tubulin, Actin

Evaluating Classifiers
- Divide the ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test sets
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers
(Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004)

2D Classification Results (true class in rows, classifier output in columns, percent)

         DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
  DNA     99    1    0    0    0    0    0    0    0    0
  ER       0   97    0    0    0    2    0    0    0    1
  Gia      0    0   91    7    0    0    0    0    2    0
  Gpp      0    0   14   82    0    0    2    0    1    0
  Lam      0    0    1    0   88    1    0    0   10    0
  Mit      0    3    0    0    0   92    0    0    3    3
  Nuc      0    0    0    0    0    0   99    0    1    0
  Act      0    0    0    0    0    0    0  100    0    0
  TfR      0    1    0    0   12    2    0    1   81    2
  Tub      1    2    0    0    0    1    0    0    1   95

Overall accuracy = 92% (Murphy et al 2003)

Human Classification Results (true class in rows, human observer's output in columns, percent)

         DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
  DNA    100    0    0    0    0    0    0    0    0    0
  ER       0   90    0    0    3    6    0    0    0    0
  Gia      0    0   56   36    3    3    0    0    0    0
  Gpp      0    0   54   33    0    0    0    0    3    0
  Lam      0    0    6    0   73    0    0    0   20    0
  Mit      0    3    0    0    0   96    0    0    0    3
  Nuc      0    0    0    0    0    0  100    0    0    0
  Act      0    0    0    0    0    0    0  100    0    0
  TfR      0   13    0    0    3    0    0    0   83    0
  Tub      0    3    0    0    0    0    0    3    0   93

Overall accuracy = 83%

Computer vs. Human
[Figure: per-class human accuracy plotted against computer accuracy, both axes ranging from 40% to 100%]
(Velliste & Murphy 2002)

3D HeLa cell images
Nuclear, ER, Giantin, gpp130, Lysosomal, Mitochondrial, Nucleolar, Actin, Endosomal, Tubulin
Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins
(Velliste & Murphy 2002; Chen & Murphy 2004)

3D Classification Results (true class in rows, classifier output in columns, percent)

         DNA   ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
  DNA     98    2    0    0    0    0    0    0    0    0
  ER       0  100    0    0    0    0    0    0    0    0
  Gia      0    0  100    0    0    0    0    0    0    0
  Gpp      0    0    0   96    4    0    0    0    0    0
  Lam      0    0    0    4   95    0    0    0    0    2
  Mit      0    0    2    0    0   96    0    2    0    0
  Nuc      0    0    0    0    0    0  100    0    0    0
  Act      0    0    0    0    0    0    0  100    0    0
  TfR      0    0    0    0    2    0    0    0   96    2
  Tub      0    2    0    0    0    0    0    0    0   98

Overall accuracy = 98%

Unsupervised Learning to Identify High-Resolution Protein Patterns

Location Proteomics
- Tag many proteins
  - We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell (Jarvik et al 2002)
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live-cell images for each clone using spinning disk confocal fluorescence microscopy

What Now?
Group ~90 tagged clones by pattern (Chen et al 2003; Chen and Murphy 2005)
Solution: group them automatically. How?
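The three-step supervised pipeline above (invariant features, then a statistical classifier evaluated on a held-out test set) can be sketched end to end. This is a toy illustration, not the Murphy lab's actual SLF feature set or classifier: the two "patterns", the radial-histogram features, and the nearest-centroid classifier are all simplified stand-ins.

```python
import numpy as np

def radial_features(img, n_bins=4):
    """Position- and rotation-invariant features: the fraction of total
    fluorescence falling in concentric rings around the intensity centroid."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    cy, cx = (ys * img).sum() / total, (xs * img).sum() / total
    r = np.hypot(ys - cy, xs - cx)
    edges = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    return np.array([img[(r >= lo) & (r < hi)].sum() / total
                     for lo, hi in zip(edges[:-1], edges[1:])])

rng = np.random.default_rng(0)

def make_image(pattern):
    """Toy stand-ins for two localization patterns: 'nuclear' = central blob,
    'membrane' = ring near the cell edge, plus noise and a random shift."""
    ys, xs = np.indices((32, 32))
    r = np.hypot(ys - 16, xs - 16)
    base = np.exp(-(r - (0 if pattern == 'nuclear' else 12)) ** 2 / 8.0)
    img = base + 0.05 * rng.random((32, 32))
    return np.roll(img, rng.integers(-3, 4, size=2), axis=(0, 1))

# Build a data set, split into training and test sets, classify by
# assigning each test image to the nearest training-class centroid.
X = {p: np.array([radial_features(make_image(p)) for _ in range(40)])
     for p in ('nuclear', 'membrane')}
train = {p: X[p][:30] for p in X}
test = {p: X[p][30:] for p in X}
centroids = {p: train[p].mean(axis=0) for p in train}

correct = sum(
    min(centroids, key=lambda c: np.linalg.norm(f - centroids[c])) == p
    for p in test for f in test[p])
accuracy = correct / sum(len(v) for v in test.values())
print(f"test accuracy: {accuracy:.2f}")
```

Because the features depend only on distances from the intensity centroid, the random shift of each cell does not change them, which is the point of step 2 in the slide.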
- SLF features can be used to measure the similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar; keep adding branches for less and less similar patterns

http://murphylab.web.cmu.edu/services/PSLID/tree.html
[Figure: Subcellular Location Tree with protein names and descriptions from databases; clusters labeled Nucleolar Proteins; Punctate Nuclear Proteins; Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining; Nuclear and Cytoplasmic Proteins with Some Punctate Staining; Uniform]
[Figure: the same tree; top, automated grouping and assignment; bottom, visual assignment to "known" locations]

Refining clusters using temporal textures

Incorporating Temporal Information
- Time series images could be useful for
  - distinguishing proteins that are not distinguishable in static images
  - analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- We need an approach that does not require detailed understanding of the objects/organelles in which each protein is located
  - A generic object-tracking approach? Not all proteins are in discernible objects
  - A non-tracking approach is needed

Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space
  - These have been valuable for classifying static patterns
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other in time

Temporal Textures based on the Co-occurrence Matrix
- Temporal co-occurrence matrix P: an Nlevel x Nlevel matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features

[Figure: worked example on a 25-pixel image pair with 4 grey levels. For an image that does not change between t0 and t1, the temporal co-occurrence matrix is purely diagonal, here diag(3, 9, 6, 7) before normalization; for an image that changes, counts move off the diagonal]

Implementation of Temporal Texture Features
- Images acquired at T = 0 s, 45 s, 90 s, 135 s, 180 s, 225 s, 270 s, 315 s, 360 s, 405 s, ...; compare image pairs separated by different time intervals and compute the 13 temporal texture features for each pair
- Use the average and variance of the features for each size of time interval, yielding 13 x 5 x 2 = 130 features
- Test: evaluate the ability of temporal textures to improve discrimination of similar protein patterns

Results for temporal texture and static features
[Table: confusion matrix for five similar patterns (Dia1, Sdpr, Atp5a1, Adfp, Timm23)]
Average accuracy = 85.1%

Conclusion
- The addition of temporal texture features improves the classification accuracy of protein locations

Generative models of subcellular patterns

Decomposing mixture patterns
- Clustering or classifying whole-cell patterns will treat each combination of two or more "basic" patterns as a unique new pattern
- It is desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects

Object-based subcellular pattern models
- Goals:
  - to recognize "pure" patterns using only objects
  - to recognize and unmix patterns consisting of two or more "pure" patterns
  - to enable building of generative models that can synthesize
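The temporal co-occurrence matrix described above is straightforward to compute. A minimal sketch, assuming grey values are already quantized to 1..Nlevel; only two of the thirteen statistics are shown, as illustrative Haralick-style choices:

```python
import numpy as np

def temporal_cooccurrence(frame_t0, frame_t1, n_levels):
    """P[i-1, j-1] = probability that a pixel with quantized value i at one
    time point has value j at the next time point."""
    P = np.zeros((n_levels, n_levels))
    for i, j in zip(frame_t0.ravel(), frame_t1.ravel()):
        P[i - 1, j - 1] += 1
    return P / P.sum()

# Two of the 13 statistics computed on P (illustrative choices)
def energy(P):
    return float((P ** 2).sum())

def entropy(P):
    nz = P[P > 0]
    return float(-(nz * np.log2(nz)).sum())

f0 = np.array([[1, 2, 3], [3, 2, 1], [2, 2, 4]])
f1 = np.array([[1, 2, 4], [3, 1, 1], [2, 3, 4]])

P_static = temporal_cooccurrence(f0, f0, 4)  # unchanged image: purely diagonal
P_moving = temporal_cooccurrence(f0, f1, 4)  # changed image: off-diagonal mass
print("static energy:", energy(P_static), "moving energy:", energy(P_moving))
```

An unchanging image concentrates all probability on the diagonal, so its energy is higher and its entropy lower than for a changing image; statistics like these are what let the features separate static from dynamic patterns.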
patterns from objects (needed for systems biology)
(Zhao et al 2005)

Object type determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate the goodness of each clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC

Unmixing: learning strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence for each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class

[Figure: amount of fluorescence per object type (8 types, grouped into Golgi, lysosomal, and nuclear classes) for a pure lysosomal pattern, a pure Golgi pattern, and a 50% mix of each]

Two-stage strategy for unmixing an unknown image
- Find the objects in the unknown (test) image and classify each into one of the object types, using a learned object-type classifier built with all objects from the training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the "best" match to this list

Test of unmixing
- Use the 2D HeLa data
- Generate random mixture fractions for the eight major patterns (summing to 1)
- Use these to synthesize "images" corresponding to these mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
(Zhao et al 2005)

Results
- Given 5 synthesized "cell images" with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%

Overview
- Analysis: real images -> object detection -> objects -> object type assignment -> object types -> object type modeling -> statistical models, giving P(objects | pattern) and P(objects | patterns, mixture coefficients)
- Generating images: sampling from the statistical models gives P(image | pattern) and P(image | patterns, mixture coefficients), producing generated images
- Generating objects as well as images additionally requires object morphology modeling

LAMP2 pattern
[Figure: cell membrane, nucleus, and protein channels]

Nuclear Shape: Medial Axis Model
- Represented by two curves: the medial axis, and the width along the medial axis
[Figure: synthetic nuclear shapes, shown with added nuclear texture]

Cell Shape Description: Distance Ratio
- For each point, r = d2 / (d1 + d2)
- Capture variation as a principal components model
- Generation

Small Objects
- Approximated by a 2D Gaussian distribution

Object Positions
- Described by the distance ratio r = d2 / (d1 + d2)
- Logistic regression: P(r) = 1 / (1 + e^(-b0 - b1*r))
- Generation: each pixel has a weight according to the logistic model

Fully Synthetic Cell Image
[Figure: a real image next to a synthetic one]

Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!

Final word
- The goal of automated image interpretation should not be
  - quantitating intensity or colocalization
  - making it easier for biologists to see what's happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images
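The "best match" step of the unmixing test can be sketched with plain least squares over object-type histograms. The three class signatures below are hypothetical, and clipping plus renormalization stands in for the probabilistic matching used in the original work:

```python
import numpy as np

# Hypothetical mean object-type histograms ("signatures") for three pure
# patterns, one column per class; 8 object types, as in the k-means step.
F = np.array([
    [0.50, 0.05, 0.05],
    [0.30, 0.05, 0.05],
    [0.10, 0.40, 0.05],
    [0.05, 0.30, 0.05],
    [0.02, 0.15, 0.10],
    [0.01, 0.03, 0.30],
    [0.01, 0.01, 0.25],
    [0.01, 0.01, 0.15],
])
F /= F.sum(axis=0)          # each class signature sums to 1

def unmix(h, F):
    """Estimate mixture fractions a with F @ a approximately h.
    Unconstrained least squares followed by clipping and renormalization,
    a simplification of the constrained matching in the original method."""
    a, *_ = np.linalg.lstsq(F, h, rcond=None)
    a = np.clip(a, 0, None)
    return a / a.sum()

true_frac = np.array([0.5, 0.3, 0.2])
observed = F @ true_frac    # noise-free synthetic "image" histogram
est = unmix(observed, F)
print(np.round(est, 3))
```

With a noise-free mixture and linearly independent signatures, the fractions are recovered exactly; the 83% figure quoted above reflects the much harder case of histograms counted from synthesized images.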
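The logistic position model can likewise be sketched: per-pixel weights from P(r) used to sample object positions. The circular cell/nucleus geometry and the coefficient values b0, b1 are illustrative assumptions, not values from the lecture:

```python
import numpy as np

# Sample object positions whose probability depends on the distance ratio
# r = d2 / (d1 + d2) through the logistic model P(r) = 1 / (1 + e^(-b0 - b1*r)).
rng = np.random.default_rng(1)
b0, b1 = -2.0, 4.0          # illustrative coefficients (b1 > 0 biases
                            # objects toward the nucleus, where r is large)

def logistic(r):
    return 1.0 / (1.0 + np.exp(-b0 - b1 * r))

# Circular "cell" of radius 1 with a concentric "nucleus" of radius 0.3:
# d1 = distance from the nucleus edge, d2 = distance to the cell membrane.
ys, xs = np.mgrid[-1:1:200j, -1:1:200j]
rad = np.hypot(ys, xs)
inside = (rad > 0.3) & (rad < 1.0)        # cytoplasm only
d1 = rad - 0.3
d2 = 1.0 - rad
r = np.where(inside, d2 / (d1 + d2), 0.0)

w = np.where(inside, logistic(r), 0.0)    # per-pixel weight
w /= w.sum()
idx = rng.choice(w.size, size=500, p=w.ravel())  # sampled object positions
print("mean sampled radius:", rad.ravel()[idx].mean())
```

Generation then places synthesized objects (e.g. the 2D Gaussians above) at the sampled positions, so the learned radial bias of each pattern is reproduced in the synthetic image.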