CSc 401 – Data Mining                    Name: ______________________________
Exam #2
October 30, 2001

Time exam started:   ________________________
Time exam completed: ________________________

NOTE: Exam time should NOT exceed 2 hours and 30 minutes.

Fax to: D. St. Clair 573-341-4501
St. Clair’s phone number for questions: 573-465-5963 (available 3:45 – 6:15 PM)

You may add extra pages as needed.
Indicate total # pages including this page: ___________________

CSc 401 – Data Mining                    Name: ______________________________
Exam #2
October 30, 2001                         Score: _________________/100

Directions: Carefully answer each of the following questions. This is an open-book, open-note exam. You may use calculators but you may NOT use computers. You are NOT to get help from others. Points will be assigned on answer quality as well as answer correctness. CLEARLY show all work. PUT YOUR NAME AT THE TOP OF EACH PAGE.

1. a.) Using Quinlan's entropy equations, calculate the gain for attributes hair and eyes in the data shown below. "Class" is the classification attribute. Clearly show all work. [10 pts.]

           class   height   hair    eyes
       1   neg     short    blond   brown
       2   neg     tall     dark    brown
       3   pos     tall     blond   blue
       4   neg     tall     dark    blue
       5   neg     short    dark    blue
       6   pos     tall     red     blue
       7   neg     tall     blond   brown
       8   pos     short    blond   blue

       This was done on the first exam – see the jpg exam for an example.

   b.) If the gain(height) = 0.004, indicate which of the three attributes you would choose as the root attribute of a C4.5 tree for this data. [5 pts.]

       Choose the attribute with the highest gain to be the root of the tree.
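The gain calculation asked for in question 1 can be checked with a short script. The following is a minimal sketch, not the instructor's worked solution: it assumes the usual entropy-based information gain (entropy of the class column minus the weighted entropy of each attribute-value subset), and the names ROWS, COLUMNS, entropy, and gain are chosen here for illustration only.

```python
from collections import Counter
from math import log2

# Rows copied from the table in question 1: (class, height, hair, eyes).
ROWS = [
    ("neg", "short", "blond", "brown"),
    ("neg", "tall", "dark", "brown"),
    ("pos", "tall", "blond", "blue"),
    ("neg", "tall", "dark", "blue"),
    ("neg", "short", "dark", "blue"),
    ("pos", "tall", "red", "blue"),
    ("neg", "tall", "blond", "brown"),
    ("pos", "short", "blond", "blue"),
]

# Column index of each candidate attribute (illustrative helper).
COLUMNS = {"height": 1, "hair": 2, "eyes": 3}


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(attribute):
    """Information gain from splitting ROWS on the named attribute."""
    col = COLUMNS[attribute]
    classes = [row[0] for row in ROWS]
    remainder = 0.0
    for value in {row[col] for row in ROWS}:
        subset = [row[0] for row in ROWS if row[col] == value]
        # Weight each subset's entropy by its share of the rows.
        remainder += len(subset) / len(ROWS) * entropy(subset)
    return entropy(classes) - remainder


if __name__ == "__main__":
    for name in COLUMNS:
        print(f"gain({name}) = {gain(name):.3f}")
    print("highest gain:", max(COLUMNS, key=gain))
```

On these eight rows the sketch gives a gain of about 0.454 for hair and about 0.348 for eyes, so hair has the highest gain of the three attributes. Its figure for height comes out near 0.003, slightly below the 0.004 quoted in part b; the ranking is the same either way.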
