Solution of Final Exam: 10701/15781 Machine Learning
Fall 2004
Dec. 12th 2004

Your Andrew ID in capital letters:
Your full name:

• There are 9 questions. Some of them are easy and some are more difficult. So, if you get stuck on any one of the questions, proceed with the rest of the questions and return at the end if you have time remaining.
• The maximum score of the exam is 100 points.
• If you need more room to work out your answer to a question, use the back of the page and clearly mark on the front of the page if we are to look at what's on the back.
• You should attempt to answer all of the questions.
• You may use any and all notes, as well as the class textbook.
• You have 3 hours.
• Good luck!

Problem 1. Assorted Questions (16 points)

(a) [3.5 pts] Suppose we have a sample of real values, called $x_1, x_2, \ldots, x_n$, each sampled from a p.d.f. $p(x)$ which has the following form:

$$p(x) = \begin{cases} \alpha e^{-\alpha x}, & \text{if } x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $\alpha$ is an unknown parameter. Which one of the following expressions is the maximum likelihood estimate of $\alpha$? (Assume that in our sample, all $x_i$ are larger than 1.)

1). $\frac{\sum_{i=1}^{n} \log(x_i)}{n}$
2). $\frac{\max_{i=1}^{n} \log(x_i)}{n}$
3). $\frac{n}{\sum_{i=1}^{n} \log(x_i)}$
4). $\frac{n}{\max_{i=1}^{n} \log(x_i)}$
5). $\frac{\sum_{i=1}^{n} x_i}{n}$
6). $\frac{\max_{i=1}^{n} x_i}{n}$
7). $\frac{n}{\sum_{i=1}^{n} x_i}$
8). $\frac{n}{\max_{i=1}^{n} x_i}$
9). $\frac{\sum_{i=1}^{n} x_i^2}{n}$
10). $\frac{\max_{i=1}^{n} x_i^2}{n}$
11). $\frac{n}{\sum_{i=1}^{n} x_i^2}$
12). $\frac{n}{\max_{i=1}^{n} x_i^2}$
13). $\frac{\sum_{i=1}^{n} e^{x_i}}{n}$
14). $\frac{\max_{i=1}^{n} e^{x_i}}{n}$
15). $\frac{n}{\sum_{i=1}^{n} e^{x_i}}$
16). $\frac{n}{\max_{i=1}^{n} e^{x_i}}$

Answer: Choose [7]. The log-likelihood is $n \log \alpha - \alpha \sum_{i=1}^{n} x_i$; setting its derivative with respect to $\alpha$ to zero gives $\hat{\alpha} = n / \sum_{i=1}^{n} x_i$.

(b) [7.5 pts] Suppose that $X_1, \ldots, X_m$ are categorical input attributes and $Y$ is a categorical output attribute. Suppose we plan to learn a decision tree without pruning, using the standard algorithm.

b.1 (True or False, 1.5 pts): If $X_i$ and $Y$ are independent in the distribution that generated this dataset, then $X_i$ will not appear in the decision tree.
Answer: False, because the attribute may become relevant further down the tree, when the records are restricted to some value of another attribute (e.g. XOR).

b.2 (True or False, 1.5 pts): If $IG(Y \mid X_i) = 0$ according to the values of entropy and conditional entropy computed from the data, then $X_i$ will not appear in the decision tree.

Answer: False, for the same reason.

b.3 (True or False, 1.5 pts): The maximum depth of the decision tree must be less than $m+1$.

Answer: True, because the attributes are categorical and can each be split only once.

b.4 (True or False, 1.5 pts): Suppose the data has $R$ records. The maximum depth of the decision tree must be less than $1 + \log_2 R$.

Answer: False, because the tree may be unbalanced.

b.5 (True or False, 1.5 pts): Suppose one of the attributes has $R$ distinct values, and it has a unique value in each record. Then the decision tree will certainly have depth 0 or 1 (i.e. it will be a single node, or else a root node directly connected to a set of leaves).

Answer: True, because that attribute will have perfect information gain. If an attribute has perfect information gain it must split the records into "pure" buckets which can be split no more....
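The XOR argument behind b.1 and b.2, and the perfect-gain argument behind b.5, can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper names `entropy` and `information_gain` and the four-record dataset are assumptions made for illustration. It computes empirical information gain on $Y = X_0 \oplus X_1$, first at the root and then after restricting the records to $X_0 = 0$, and finally for a hypothetical attribute that takes a unique value in each record.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical entropy H(Y), in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Empirical IG(Y | X_attr) = H(Y) - H(Y | X_attr)."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Illustrative dataset (an assumption): every input combination once, Y = X_0 XOR X_1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

# At the root, neither attribute carries any information about Y on its own.
ig_root = [information_gain(rows, labels, j) for j in range(2)]

# After restricting to the records with X_0 = 0, X_1 determines Y completely.
subset = [(r, y) for r, y in zip(rows, labels) if r[0] == 0]
sub_rows = [r for r, _ in subset]
sub_labels = [y for _, y in subset]
ig_x1_below_x0 = information_gain(sub_rows, sub_labels, 1)

# b.5: a hypothetical attribute with a unique value per record makes every bucket pure.
id_rows = [(i,) for i in range(len(labels))]
ig_id = information_gain(id_rows, labels, 0)

print("IG at root:", ig_root)
print("IG of X_1 below X_0 = 0:", ig_x1_below_x0)
print("IG of unique-value attribute:", ig_id)
```

Both root gains are zero, yet $X_1$ has a full bit of gain one level down, which is why zero gain at the root does not keep an attribute out of the tree. This assumes, as the answers to b.1 and b.2 imply, that the standard algorithm keeps splitting impure nodes even when every attribute has zero gain.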
This note was uploaded on 07/10/2009 for the course INFORMATIC Inf taught by Professor Lanzi during the Spring '09 term at Politecnico di Milano.