22 February 2007    CSE-4412(M) Midterm w/ answers    p. 1 of 12

CSE-4412(M) Midterm

Sur / Last Name:
Given / First Name:
Student ID:

• Instructor: Parke Godfrey
• Exam Duration: 75 minutes
• Term: Winter 2007

Answer the following questions to the best of your knowledge. Be precise and be careful. The exam is closed-book and closed-notes. Write any assumptions you need to make along with your answers, whenever necessary.

There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated.

If you need additional space for an answer, just indicate clearly where you are continuing.

Regrade Policy

• Regrading should only be requested in writing. Write what you would like to be reconsidered. Note, however, that an exam accepted for regrading will be reviewed and regraded in entirety (all questions).

Grading Box

1.      /10
2.      /10
3.      /10
4.      /10
5.      /10
Total   /50

1. (10 points) Misc. Don’t eat the Smarties™!    Calculation

a. (3 points) John randomly takes a smartie from one of the bowls A, B, or C. (He randomly chose a bowl, then randomly chose a smartie.) We saw that the one he took, and then ate, was red. We also somehow know what the distribution of the smarties in the bowls was before John took one:

           A     B     C   total
red       10    20    30      60
blue      40    30    20      90
total     50    50    50     150

We learn that, unfortunately, all the smarties in bowl A are poisoned! (The ones in B and C are fine.) What is the probability that John has been poisoned? State the conditional probability that represents this and calculate the probability as a fraction.

\[ P(A \mid \text{red}) = \frac{P(\text{red} \mid A)\, P(A)}{P(\text{red})} = \frac{\frac{1}{5} \cdot \frac{1}{3}}{\frac{2}{5}} = \frac{1}{6} \]

b. (3 points) We learn the following about students and the salaries (high is more than, or equal to, $200k and low is less than $200k) that they earn in their first jobs after graduating.
Students who took the data mining course are represented by dm and those who did not by ¬dm.

          low   high   total
dm         70     30     100
¬dm       830     70     900
total     900    100    1000

What is the lift of dm & high? Calculate the number.

\[ \frac{P(\text{dm} \wedge \text{high})}{P(\text{dm})\, P(\text{high})} = \frac{\frac{30}{1000}}{\frac{100}{1000} \cdot \frac{100}{1000}} = 3 \]

c. (4 points) Consider that

I1  I2  I3        I1  I2  I3
A   B   C         B   C   E
A   C   D         B   C   F
A   C   E         B   D   F
A   C   F         C   D   E
A   D   F         C   E   F
A   E   F         D   E   F

were the frequent 3-itemsets found by the Apriori algorithm. Show the 4-itemset pre-candidates that would be generated by the join step ....
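
The two worked answers to parts a and b of question 1 can be checked with exact fractions. The following is a minimal sketch and not part of the exam itself: the helper names `posterior` and `lift` are illustrative, and it assumes, as question 1a states, that John picks each bowl with equal probability and then picks a smartie uniformly from that bowl.

```python
from fractions import Fraction

# Question 1a: smartie counts per bowl, copied from the table in the question.
bowls = {
    "A": {"red": 10, "blue": 40},
    "B": {"red": 20, "blue": 30},
    "C": {"red": 30, "blue": 20},
}


def posterior(bowl, colour, bowls):
    """Bayes' rule: P(bowl | colour), assuming a uniform prior over the bowls."""
    prior = Fraction(1, len(bowls))

    def likelihood(name):
        counts = bowls[name]
        return Fraction(counts[colour], sum(counts.values()))

    # P(colour) by the law of total probability over the bowls.
    evidence = sum(prior * likelihood(name) for name in bowls)
    return prior * likelihood(bowl) / evidence


def lift(n_both, n_x, n_y, n_total):
    """Lift of X and Y from raw counts: P(X and Y) / (P(X) * P(Y))."""
    p_both = Fraction(n_both, n_total)
    return p_both / (Fraction(n_x, n_total) * Fraction(n_y, n_total))


# Question 1a: probability that the red smartie came from the poisoned bowl A.
p_poisoned = posterior("A", "red", bowls)

# Question 1b: 30 students are both dm and high; 100 are dm; 100 are high; 1000 in total.
dm_high_lift = lift(30, 100, 100, 1000)

print(p_poisoned, dm_high_lift)
```

Using `Fraction` rather than floats keeps the results in the same exact form the exam asks for, so they can be compared directly with the written answers of 1/6 and 3.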
