MIT OpenCourseWare
http://ocw.mit.edu

6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution
Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

6.047/6.878 Fall 2008 Problem Set 2
Due: September 24, 2008 at 8pm

1. Bayes' Rule and the Naive Bayes Classifier. In this problem we will familiarize ourselves with Bayes' rule and the use of Naive Bayes for classification. Assume that the probability of rain on any given day is 0.1, and that the probability of getting into a car accident is 0.05.

(a) Assume these two events are independent. What is the probability of a day that is both rainy and on which you get into a car accident? What is the conditional probability of getting into a car accident today, given that you already know today is rainy?

(b) From vehicle accident records, assume we have determined that the probability of a day being rainy, given that a car accident was observed, is 0.4. Using Bayes' rule and the prior probabilities above, what is the conditional probability of getting into a car accident today, given that you already know today is rainy?

(c) Why are the conditional probabilities you computed in parts (a) and (b) different?

(d) The following table describes features of DNA sequences and the class of each sequence. GC content is the fraction of Gs and Cs (as opposed to As and Ts) in the sequence. Complexity describes the degree of randomness of the sequence. Repeats are stretches of DNA that are highly repetitive and occur multiple times in a genome. Motifs are sites where transcription factors bind.
GC Content   Length   Complexity   Class
Low          Long     High         Gene
Low          Long     Low          Gene
High         Long     High         Repeat
Medium       Short    High         Motif
Medium       Short    Low          Motif
High         Long     Low          Repeat
High         Short    High         Motif
Medium       Long     High         Gene
High         Long     Low          Repeat
High         Short    High         Motif

Use the Naive Bayes classifier to predict this gene's class (show your work):

GC Content   Length   Complexity   Class
Medium       Long     Low          ?

2. Bayesian Decision Theory. For many classification problems, our objective will be more complex than simply minimizing the number of misclassifications. Frequently, different mistakes carry different costs. For example, if we are trying to classify a patient as having cancer or not, it can be argued that it is far more harmful to misclassify a patient as healthy when they have cancer than to misclassify a patient as having cancer when they are healthy. In the first case, the patient will not be treated and would be more likely to die, whereas the second mistake involves emotional grief but no greater chance of loss of life. To formalize such issues, we introduce a loss function L_kj, which assigns a loss to the misclassification of an object as class j when the true class is k. For instance, in the case of cancer classification, L_kj might look like (where the true class k is along the x axis):

True Cancer   True Normal
Predict...
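As a reference for Problem 1, parts (a) and (b): the arithmetic reduces to a few lines, and a quick numerical check (using only the numbers stated in the problem) can confirm a pen-and-paper answer.

```python
# Probabilities given in Problem 1.
p_rain = 0.1
p_accident = 0.05

# (a) Under independence, the joint probability is the product, and
# conditioning on rain changes nothing.
p_both = p_rain * p_accident              # P(rain and accident)
p_accident_given_rain_a = p_accident      # independence: P(acc | rain) = P(acc)

# (b) Given P(rain | accident) = 0.4 from accident records, Bayes' rule:
# P(acc | rain) = P(rain | acc) * P(acc) / P(rain)
p_rain_given_accident = 0.4
p_accident_given_rain_b = p_rain_given_accident * p_accident / p_rain
```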
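For Problem 1(d), the Naive Bayes prediction can be checked mechanically: score each class by its prior times the product of per-feature likelihoods estimated from the table (a sketch with maximum-likelihood counts and no smoothing; it is not a substitute for showing the derivation by hand).

```python
from collections import Counter

# Training table from part (d): (GC content, length, complexity) -> class.
data = [
    (("Low", "Long", "High"), "Gene"),
    (("Low", "Long", "Low"), "Gene"),
    (("High", "Long", "High"), "Repeat"),
    (("Medium", "Short", "High"), "Motif"),
    (("Medium", "Short", "Low"), "Motif"),
    (("High", "Long", "Low"), "Repeat"),
    (("High", "Short", "High"), "Motif"),
    (("Medium", "Long", "High"), "Gene"),
    (("High", "Long", "Low"), "Repeat"),
    (("High", "Short", "High"), "Motif"),
]

def naive_bayes_predict(query):
    """Return the class maximizing P(class) * prod_i P(feature_i | class)."""
    class_counts = Counter(c for _, c in data)
    n = len(data)
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        rows = [features for features, cls in data if cls == c]
        score = count / n  # prior P(class)
        for i, value in enumerate(query):
            # likelihood P(feature_i = value | class), estimated by counting
            score *= sum(1 for r in rows if r[i] == value) / count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(naive_bayes_predict(("Medium", "Long", "Low")))
```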
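The decision rule that Problem 2 formalizes can be sketched as follows: given a posterior over true classes and a loss matrix L_kj, predict the class j minimizing the expected loss sum_k P(k | x) * L_kj. The loss values below are illustrative assumptions, not numbers given in the problem.

```python
# Hypothetical loss matrix L_kj: (true class k, predicted class j) -> loss.
# The asymmetry encodes that missing a cancer is far worse than a false alarm.
LOSS = {
    ("cancer", "cancer"): 0,
    ("cancer", "normal"): 100,  # assumed cost of missing a cancer
    ("normal", "cancer"): 1,    # assumed cost of a false alarm
    ("normal", "normal"): 0,
}

def decide(posterior):
    """Pick the prediction j minimizing sum_k P(k | x) * L_kj."""
    predictions = {"cancer", "normal"}
    return min(
        predictions,
        key=lambda j: sum(posterior[k] * LOSS[(k, j)] for k in posterior),
    )

# Even with only a 5% posterior probability of cancer, the asymmetric
# loss makes "cancer" the lower-risk prediction:
print(decide({"cancer": 0.05, "normal": 0.95}))
```

With these assumed losses, predicting "normal" risks 0.05 * 100 = 5 expected loss, while predicting "cancer" risks only 0.95 * 1 = 0.95, so the rule flags the patient despite the low posterior.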
Instructor: Manolis Kellis, Fall 2008