
Study of Biological Processes using Microarray Gene Expression Data
Ying Xu (徐鹰)

Questions
• Which genes are involved in the conversion of cellulose to ethanol in E. coli?
• Which genes are involved in photosynthesis in rice?
• Which genes are possibly involved in the progression of gastric cancer?
These and many other similar questions about cellular processes could possibly be answered using microarray chips.

Microarray Gene Expression Chips
Microarray: a miniaturized array with up to tens of thousands of single-stranded DNA molecules attached to it
Microarray assays are based on hybridization of a single-stranded DNA labeled with a fluorescent tag to a complementary molecule attached to the chip
When a unique DNA molecule is attached to each spot, the microarray can be used to detect the presence/absence and concentration of a particular type of DNA molecule in the test tube

Microarray Gene Expression Chips
• By detecting the quantity of fluorescent molecules attached to each spot, one can infer the relative abundance of the complementary mRNA molecules in solution

Microarray Gene Expression Data
• What information can we derive from microarray gene expression data?
Information Derivable from Chip Data
• By observing chip data, one can infer which genes are highly expressed or not expressed, or in general the relative expression levels of all genes
[Figure: intensity plotted along the genome sequence]
Inference: genes x, y, z are highly expressed under condition W, while genes a, b, c are not expressed

Information Derivable from Chip Data
• By comparing gene expression levels under two conditions, one can infer which genes' expression levels are affected
[Figure: intensity ratio (A/B) or difference (A-B) along the genome sequence, diseased cell vs normal cell]
Inference: gene X is significantly more highly expressed in the diseased cell than in the normal cell; hence gene X could potentially serve as a marker of the disease – differentially expressed genes

Information Derivable from Chip Data
• By observing the expression levels of two genes collected at different times after a particular stimulus, one can infer whether they have similar or different expression patterns
Inference: genes with similar expression patterns might be functionally related, e.g., working in the same pathway – co-expressed genes, which may be co-regulated

Information Derivable from Chip Data
By comparing the expression patterns of gene A collected when gene B is functioning and when it is not functioning (e.g., knockout or mutation), one could possibly derive gene B's effect on gene A
Inference: genes A and B may interact directly or indirectly, or B may even be the cause of A's altered expression patterns – interaction or causality relationship

Differentially Expressed Genes
• Detection of differentially expressed genes – which genes show different levels of expression under different conditions
– many non-biological factors contribute to the intensity values of microarray data, e.g., background intensity
– gene expression levels are stochastic in nature; we should expect them to vary even without a change of conditions
• To accurately detect which genes are differentially expressed, multiple measurements for the same condition are needed
• {10, 20, 15, 25, 15} versus {18, 27, 15, 10, 15}
• {100, 120, 134, 90, 105} versus {40, 27, 32, 38, 41}
Are they differentially expressed?

Differentially Expressed Genes
Various mathematical tools could be used to determine whether a gene is differentially expressed under different conditions
– a simple one: calculate the averages and the ratio between them – fold change
• ave1 = 17 versus ave2 = 17 => no change
• ave1 = 110 versus ave2 = 36 => about 3-fold change
– t-test: considers averages and standard deviations in the context of stochastic processes
• provides a confidence level for a predicted differentially expressed gene

Differentially Expressed Genes
Take two sets of samples, one diseased and one control, and check for each gene whether it has different expression patterns in the diseased vs control samples
Collect all such genes, and find the ones with the best discerning power between the two groups

Co-Expressed Genes
• To determine which genes have similar/correlated expression patterns – to derive their functional relationships
[Figure: gene expression levels over a time course]
How do we do it if this is the given data?

Co-Expressed Genes
Data clustering
– We can represent each gene as a vector, e.g., (5, 15, 10, 7, 5, 3)
– So a set of expression data can be represented as a collection of data points in K-dimensional space
– Genes with similar expression patterns form data clusters

Co-Expressed Genes
Prim's algorithm
– step 1: select an arbitrary node as the current tree
– step 2: find the external node that is closest to the tree, and add it with its corresponding edge into the tree
– step 3: repeat step 2 until all nodes are connected in the tree
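The fold-change and t-test comparison on the differential-expression slides can be checked on the two example genes given there. Below is a minimal sketch using only the Python standard library. It assumes the unequal-variance (Welch) form of the t-test, since the slides do not say which variant is meant, and the names `fold_change`, `welch_t` and `welch_df` are illustrative. Turning the t statistic into a p-value needs the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance

# Replicate measurements for the two example genes on the slides
# (condition 1 versus condition 2).
A1, B1 = [10, 20, 15, 25, 15], [18, 27, 15, 10, 15]
A2, B2 = [100, 120, 134, 90, 105], [40, 27, 32, 38, 41]


def fold_change(a, b):
    """Ratio of the mean expression in condition a to that in condition b."""
    return mean(a) / mean(b)


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def welch_df(a, b):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))


for name, a, b in [("gene 1", A1, B1), ("gene 2", A2, B2)]:
    print(f"{name}: fold change = {fold_change(a, b):.2f}, "
          f"t = {welch_t(a, b):.2f}, df = {welch_df(a, b):.1f}")
```

For the first gene, the means are identical, so both the fold change (1.00) and the t statistic (0.00) show no change. For the second gene, the fold change is about 3.08 (109.8 versus 35.6) and t is about 9.06. The replicates are tight relative to the gap between the means, which is what the t-test adds beyond a bare ratio.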
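Prim's algorithm, as described in the three steps on the slide, can be turned into a small gene-clustering sketch. Each gene is treated as a point in K-dimensional space, and edge weights are Euclidean distances; this is an assumption, because the slides do not fix a distance measure. Clusters are formed by cutting the k - 1 heaviest edges of the resulting minimum spanning tree. That is one common way to cluster with an MST and is not necessarily the lecture's exact procedure. Only the first gene vector comes from the slides; the others are hypothetical.

```python
import math

# Hypothetical expression vectors (one per gene, 6 time points). Only the
# first vector comes from the slide; the others are made up for illustration.
genes = [
    (5, 15, 10, 7, 5, 3),
    (6, 14, 11, 7, 4, 3),
    (5, 16, 9, 8, 5, 2),
    (20, 3, 4, 15, 18, 12),
    (19, 4, 5, 14, 17, 13),
]


def prim_mst(points):
    """Prim's algorithm on the complete graph whose edge weights are the
    Euclidean distances between expression vectors. Returns the tree edges
    (parent, child, weight) in the order in which they are added."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # distance from each external node to the tree
    parent = [-1] * n
    best[0] = 0.0           # step 1: start the tree from node 0
    order = []
    for _ in range(n):      # step 3: repeat step 2 until every node is in
        # step 2: pick the external node closest to the current tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            order.append((parent[u], u, best[u]))
        # refresh the distances of the remaining external nodes
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return order


def mst_clusters(points, k):
    """Drop the k - 1 heaviest MST edges; the remaining connected
    components are reported as clusters (lists of gene indices)."""
    kept = sorted(prim_mst(points), key=lambda e: e[2])[: len(points) - k]
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for a, b, _ in kept:
        root[find(a)] = find(b)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print([round(w, 2) for _, _, w in prim_mst(genes)])  # → [2.0, 2.0, 24.96, 2.45]
print(mst_clusters(genes, 2))                        # → [[0, 1, 2], [3, 4]]
```

The edge weights listed in the order Prim's algorithm adds them are short, then jump once (the single edge joining the two groups), then fall back to short. Runs of short edges separated by long ones are one way to read the "deep valleys" of dense clusters mentioned on the next slide.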
[Figure, panels (a)–(e): an example weighted graph for Prim's algorithm]

Co-Expressed Genes
Identification of "co-expressed" genes can be done by identifying "dense" clusters in a noisy background, which in turn can be done by identifying "deep" valleys in a 2-dimensional plot

Biclustering for Gene Expression Analysis
• In real applications, we often need to identify genes that are co-expressed under some (but not all) conditions
• This problem can be addressed using so-called biclustering data analysis

Example Question #1
• Which genes might be involved in the conversion of cellulose to ethanol in E. coli?
• Answer:
– microarray chips on E. coli with and without treatment with cellulose
– identification of differentially expressed genes under the two conditions
– rule out genes responsive to general changes in food availability

Example Question #2
• Which genes are possibly involved in photosynthesis in the cyanobacterium WH8102?
• Answer:
– microarray chips on cyanobacterium WH8102 with and without exposure to light
– identification of differentially expressed genes under the two conditions
– rule out genes responsive to general changes in environment

Example Question #3
• One metabolic pathway is well studied, with all its key elements identified except for one enzyme. How can we possibly identify this "missing" enzyme?
• Answer:
– identification of genes co-expressed with the identified genes in the pathway under relevant conditions
– identification of genes sharing conserved motifs with those of the identified genes of the pathway
– functional elucidation of the above identified genes
– …
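The first step of the answer to Example Question #3 can be sketched by scoring each candidate gene by its average Pearson correlation with the pathway's known genes across the same conditions. That step is finding genes co-expressed with the genes already identified in the pathway. The gene names and profiles below are hypothetical. Averaging correlations is only one simple scoring choice, and the motif-sharing and functional-elucidation steps are not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_candidates(pathway, candidates):
    """Rank candidate genes by their mean correlation with the known pathway genes."""
    scores = {
        name: sum(pearson(profile, known) for known in pathway.values()) / len(pathway)
        for name, profile in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profiles measured over the same six conditions.
pathway = {"enzA": [1, 3, 6, 8, 6, 3], "enzB": [2, 4, 7, 9, 7, 4]}
candidates = {
    "geneX": [1, 2, 6, 9, 5, 3],  # rises and falls with the pathway
    "geneY": [8, 6, 3, 1, 3, 6],  # mirror image of the pathway
    "geneZ": [5, 5, 4, 5, 6, 5],  # essentially flat
}

for name, score in rank_candidates(pathway, candidates):
    print(f"{name}: {score:+.2f}")
```

A strongly anti-correlated gene such as geneY ends up last here. Whether such genes should count as candidates, for example as repressed steps, is a biological judgment the score alone does not settle.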
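The biclustering slide does not name a specific algorithm. As one well-known example, swapped in here rather than taken from the lecture, Cheng and Church's mean squared residue scores how coherently a subset of genes behaves over a subset of conditions. A score of 0 means the selected genes shift up and down together by constant offsets. The sketch below only computes this score for a given gene/condition subset and leaves out any search for the best bicluster. The matrix is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the submatrix picked out by `rows`
    (genes) and `cols` (conditions). Each residue is
    a_ij - a_iJ - a_Ij + a_IJ, where a_iJ is the row mean, a_Ij the column
    mean and a_IJ the overall mean of the submatrix."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    all_mean = sum(row_means) / n_r
    total = sum((sub[i][j] - row_means[i] - col_means[j] + all_mean) ** 2
                for i in range(n_r) for j in range(n_c))
    return total / (n_r * n_c)


# Hypothetical expression matrix: 4 genes (rows) x 5 conditions (columns).
# Genes 0 and 1 move together (gene 1 = gene 0 + 1) on conditions 0-2 only.
expr = [
    [1, 3, 5, 9, 2],
    [2, 4, 6, 1, 8],
    [7, 1, 9, 4, 4],
    [3, 8, 2, 6, 5],
]

print(mean_squared_residue(expr, [0, 1], [0, 1, 2]))        # → 0.0
print(mean_squared_residue(expr, [0, 1], [0, 1, 2, 3, 4]))  # coherence lost
```

Genes 0 and 1 form a perfect bicluster only on conditions 0 to 2. Adding conditions 3 and 4 raises the residue, which illustrates the "some but not all conditions" situation from the slide.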
Example Question #4
We have 100 pairs of cancer vs reference tissues from 100 patients; 15%, 20%, 50% and 15% of the cancer tissues are stage I, II, III and IV, respectively, and 70% of the patients are men and 30% are women. Can we possibly identify genetic markers for (1) stomach cancer in general; (2) early stomach cancer; and (3) gender-specific stomach cancer?
Answer:
Microarray experiments on the cancer and reference tissues
Identification of differentially expressed genes between (early) cancer and reference tissues
Identification of differentially expressed genes between the cancer and reference tissues of male (female) patients

Example Question #5
• ALL, MLL and AML are three subtypes of leukemia. Can we possibly identify gene signatures for these subtypes?
• Answer:

Challenging Issue
• Improved techniques for identification of gene groups with correlated gene expression patterns

Take-Home Message
• Microarray is a powerful technique for studying biological processes/systems, and it has revolutionized modern biology!
• Microarray data can provide information on differential expression, "co-regulation" and causality relationships among genes

Homework
• Find one popular program on the Internet for prediction of co-expressed genes based on microarray expression data. Explain its basic idea.
• Describe how microarray data could be used in conjunction with prediction of transcription factor binding sites to predict transcriptionally co-regulated genes

This note was uploaded on 06/16/2011 for the course BIO 127 taught by Professor Xuyin during the Spring '10 term at Georgetown.
