MIT6_047f08_lec05_note05 - MIT OpenCourseWare...

MIT OpenCourseWare 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: .
6.047/6.878 Fall 2008 Lecture #5

1. Overview

In the previous lecture we looked at examples of unsupervised learning techniques, such as clustering. In this lecture we focus on supervised learning techniques, which use the inherent structure of the data to predict the state or “class” that a given object belongs to. There are two different approaches to classification: (1) generative models, which leverage probabilistic models of how the data are generated, and (2) discriminative models, which use an appropriate function to tease apart the data. Naïve Bayes classifiers are an example of generative models, and Support Vector Machines (SVMs) are an example of discriminative models. We will discuss actual biological applications of each of these models: the use of Naïve Bayes classifiers to predict mitochondrial proteins across the genome, and the use of SVMs to classify cancer based on gene expression monitoring by DNA microarrays. The salient features of both techniques and the caveats of using each will also be discussed.

2. Classification – Bayesian Techniques

We will discuss classification in the context of the problem of identifying mitochondrial proteins. If we look across the genome, how do we determine which proteins are involved in mitochondrial processes or, more generally, which proteins are targeted to the mitochondria? [1] This is particularly useful because, once we know the mitochondrial proteins, we can start asking interesting questions about how these proteins mediate disease processes and metabolic functions.

The classification of these mitochondrial proteins involves the use of 7 features for all human proteins: (1) targeting signal, (2) protein domains, (3) co-expression, (4) mass spectrometry, (5) sequence homology, (6) induction, and (7) motifs.

In general, our approach will be to determine how these features are distributed for objects of different classes. The classification decisions will then be made using probability calculus. For our case with 7 features for each protein (or, more generally, each object), it is likely that each object is associated with more than one feature. To simplify things let us consider just one feature and introduce the two key

[1] Mitochondria play an important role in metabolic processes and are known as the “powerhouse” of the cell. They represent an ancient bacterial invader that has been successfully subsumed by eukaryotic cells. Mitochondria have their own genetic material, commonly referred to as mtDNA. Interestingly, a large fraction of inherited mitochondrial disorders are due to nuclear genes encoding specific proteins targeted to the mitochondria, and not due to mutations in mtDNA itself. Hence it is imperative to identify these mitochondrial proteins to shed light on the molecular basis for these disorders.
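The approach described above (estimate how a feature is distributed within each class, then decide using probability calculus) can be illustrated for a single feature. The following is only a minimal sketch under stated assumptions, not the method used in the lecture: it assumes one binary feature (whether a protein has a targeting signal), two classes labeled "mito" and "non-mito", Bayes' rule as the decision step, and add-one smoothing. The counts in the toy training set and the function names `fit` and `posterior` are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def fit(examples, alpha=1.0):
    """Estimate P(class) and P(feature value | class) from labeled examples.

    examples: list of (feature_value, class_label) pairs.
    alpha:    add-alpha smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(label for _, label in examples)
    pair_counts = Counter(examples)
    values = sorted({value for value, _ in examples})
    total = len(examples)

    # Class priors: fraction of training examples in each class.
    priors = {c: n / total for c, n in class_counts.items()}

    # Smoothed class-conditional distribution of the feature.
    likelihoods = {
        c: {
            v: (pair_counts[(v, c)] + alpha)
            / (class_counts[c] + alpha * len(values))
            for v in values
        }
        for c in class_counts
    }
    return priors, likelihoods


def posterior(value, priors, likelihoods):
    """Apply Bayes' rule: P(class | value) is proportional to
    P(value | class) * P(class), normalized over all classes."""
    joint = {c: likelihoods[c][value] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical toy data: (has_targeting_signal, class_label).
training = (
    [(True, "mito")] * 8
    + [(False, "mito")] * 2
    + [(True, "non-mito")] * 5
    + [(False, "non-mito")] * 85
)

priors, likelihoods = fit(training)

post_signal = posterior(True, priors, likelihoods)
post_no_signal = posterior(False, priors, likelihoods)

# Classify by picking the class with the larger posterior probability.
predicted_signal = max(post_signal, key=post_signal.get)
predicted_no_signal = max(post_no_signal, key=post_no_signal.get)

print(post_signal, predicted_signal)
print(post_no_signal, predicted_no_signal)
```

In this toy setting the class prior matters as much as the feature: most proteins are labeled "non-mito", so a targeting signal has to be considerably more common among "mito" proteins before it tips the decision. Extending the sketch to all 7 features would require an assumption about how the features combine, which this example does not make.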

This note was uploaded on 09/24/2010 for the course EECS 6.047 / 6. taught by Professor Manolis Kellis during the Fall '08 term at MIT.
