Solution of Final Exam: 10-701/15-781 Machine Learning
Fall 2004, Dec. 12th 2004

Your Andrew ID in capital letters:
Your full name:

• There are 9 questions. Some of them are easy and some are more difficult. So, if you get stuck on any one of the questions, proceed with the rest of the questions and return to it at the end if you have time remaining.
• The maximum score of the exam is 100 points.
• If you need more room to work out your answer to a question, use the back of the page and clearly mark on the front of the page if we are to look at what's on the back.
• You should attempt to answer all of the questions.
• You may use any and all notes, as well as the class textbook.
• You have 3 hours.
• Good luck!

Problem 1. Assorted Questions (16 points)

(a) [3.5 pts] Suppose we have a sample of real values, called $x_1, x_2, \ldots, x_n$, each sampled from a p.d.f. $p(x)$ of the following form:

$$f(x) = \begin{cases} \alpha e^{-\alpha x}, & \text{if } x \geq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $\alpha$ is an unknown parameter. Which one of the following expressions is the maximum likelihood estimate of $\alpha$? (Assume that in our sample, all $x_i$ are larger than 1.)

1) $\frac{1}{n}\sum_{i=1}^{n} \log x_i$   2) $\frac{1}{n}\max_{i=1}^{n} \log x_i$   3) $\frac{n}{\sum_{i=1}^{n} \log x_i}$   4) $\frac{n}{\max_{i=1}^{n} \log x_i}$
5) $\frac{1}{n}\sum_{i=1}^{n} x_i$   6) $\frac{1}{n}\max_{i=1}^{n} x_i$   7) $\frac{n}{\sum_{i=1}^{n} x_i}$   8) $\frac{n}{\max_{i=1}^{n} x_i}$
9) $\frac{1}{n}\sum_{i=1}^{n} x_i^2$   10) $\frac{1}{n}\max_{i=1}^{n} x_i^2$   11) $\frac{n}{\sum_{i=1}^{n} x_i^2}$   12) $\frac{n}{\max_{i=1}^{n} x_i^2}$
13) $\frac{1}{n}\sum_{i=1}^{n} e^{x_i}$   14) $\frac{1}{n}\max_{i=1}^{n} e^{x_i}$   15) $\frac{n}{\sum_{i=1}^{n} e^{x_i}}$   16) $\frac{n}{\max_{i=1}^{n} e^{x_i}}$

Answer: Choose [7], i.e. $\hat{\alpha} = n / \sum_{i=1}^{n} x_i$ (a short derivation is sketched after part (b) below).

(b) [7.5 pts] Suppose that $X_1, \ldots, X_m$ are categorical input attributes and $Y$ is a categorical output attribute. Suppose we plan to learn a decision tree without pruning, using the standard algorithm.

b.1 (True or False, 1.5 pts): If $X_i$ and $Y$ are independent in the distribution that generated this dataset, then $X_i$ will not appear in the decision tree.
Answer: False, because the attribute may become relevant further down the tree once the records have been restricted to some value of another attribute (e.g. XOR; see the sketch after this problem).

b.2 (True or False, 1.5 pts): If $IG(Y \mid X_i) = 0$ according to the values of entropy and conditional entropy computed from the data, then $X_i$ will not appear in the decision tree.
Answer: False, for the same reason.

b.3 (True or False, 1.5 pts): The maximum depth of the decision tree must be less than $m + 1$.
Answer: True, because the attributes are categorical and each can be split only once along any path.

b.4 (True or False, 1.5 pts): Suppose the data has $R$ records; the maximum depth of the decision tree must be less than $1 + \log_2 R$.
Answer: False, because the tree may be unbalanced.

b.5 (True or False, 1.5 pts): Suppose one of the attributes has $R$ distinct values, and it has a unique value in each record. Then the decision tree will certainly have depth 0 or 1 (i.e. it will be a single node, or else a root node directly connected to a set of leaves).
Answer: True, because that attribute will have perfect information gain. If an attribute has perfect information gain it must split the records into "pure" buckets, which can be split no more.
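The original solution for (a) states the answer without a derivation; as a brief justification (standard MLE calculus, not part of the original exam text), under the density in Equation (1):

$$L(\alpha) = \prod_{i=1}^{n} \alpha e^{-\alpha x_i} = \alpha^{n} e^{-\alpha \sum_{i=1}^{n} x_i}, \qquad \log L(\alpha) = n \log \alpha - \alpha \sum_{i=1}^{n} x_i.$$

Setting the derivative to zero, $\frac{n}{\alpha} - \sum_{i=1}^{n} x_i = 0$, gives $\hat{\alpha} = n / \sum_{i=1}^{n} x_i$, which is option 7; the second derivative $-n/\alpha^2 < 0$ confirms this stationary point is a maximum.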
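To make the XOR intuition behind b.1 and b.2 concrete, here is a minimal numerical sketch (not part of the original solution; the function names and toy records are illustrative). On a two-attribute XOR dataset, each attribute alone has zero information gain at the root, yet once the records are restricted to one value of the other attribute, it has perfect gain, so a zero root-level information gain does not keep an attribute out of the tree.

    # Toy check of the XOR argument: IG(Y|X1) = IG(Y|X2) = 0 at the root,
    # but X1 has gain 1 after conditioning on X2. Names are illustrative.
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(records, attr, target="Y"):
        base = entropy([r[target] for r in records])
        n = len(records)
        remainder = 0.0
        for v in set(r[attr] for r in records):
            subset = [r[target] for r in records if r[attr] == v]
            remainder += len(subset) / n * entropy(subset)
        return base - remainder

    # Y = X1 XOR X2
    records = [{"X1": a, "X2": b, "Y": a ^ b} for a in (0, 1) for b in (0, 1)]

    print(info_gain(records, "X1"))  # 0.0 -- yet X1 is needed in the tree
    print(info_gain(records, "X2"))  # 0.0
    branch = [r for r in records if r["X2"] == 0]
    print(info_gain(branch, "X1"))   # 1.0 -- perfect gain further down the tree

This is exactly the "becomes relevant further down the tree" situation the solution describes: a depth-2 tree that splits on both attributes classifies every record perfectly, even though neither attribute looks informative at the root.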