CS221 Problem Set #2

CS 221, Autumn 2007
Problem Set #2: Machine Learning

Due by 9:30am on Tuesday, October 30. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449 with a filled-out route form [1] as the cover page. We will not accept solutions by email or courier.

1 Written part (100 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [22 points] Learning rules

Consider a variant of the logistic regression model, defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$

$$h_{\theta,v}(x) = p(y = 1 \mid x; \theta, v) = g\left( \sum_{i=1}^{k} \theta_i f_i(x; v) \right).$$

This model has two sets of parameters, $\theta$ and $v$. Assume that each of the functions $f_i(x; v)$ is a decision tree, where each leaf is annotated with some real-valued number [2] $v_{i,\lambda} \in \mathbb{R}$. Thus, for the $i$th decision tree, $f_i(x; v) = v_{i,\lambda(x)}$, where $\lambda(x)$ is the leaf reached by the input feature vector $x$.

We are given a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of $m$ examples, and we want to use gradient ascent to train our logistic regression classifier to maximize the log-likelihood:

$$\ell(\theta, v) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta, v).$$

(a) [12 points] Derive the (batch) gradient ascent learning rule to train each weight $v_{i,\lambda}$. Your learning rule should be of the form $v_{i,\lambda} := \ldots$ and take into consideration the entire training set.

(b) [7 points] Suppose we want to define our parameterization of the decision trees so that they share certain parameters. For example, we might decide that all leaves in any decision tree that occur at depth $d$ or deeper have the same parameter. In general, let $\bar{v}$ be some parameter, and let $L$ be the set of pairs $(i, \lambda)$ of all leaves in all trees that are associated with the shared parameter $\bar{v}$. Derive the (batch) gradient ascent update rule for $\bar{v}$.

(c) [3 points] Explain briefly but using formal terms why parameter sharing may be a good idea....

[1] Available from http://scpd.stanford.edu/scpd/students/routing.htm
[2] Note that this is slightly different from what we had in class, where we had real-valued $p \in [0, 1]$.
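
To make the model in Problem 1 concrete, the following is a minimal Python sketch that only evaluates the hypothesis and the log-likelihood; it does not implement any learning rule. The representation is an assumption made for illustration: each tree is a hypothetical function `leaf_fns[i]` mapping an input to a leaf index, `v[i][leaf]` holds that leaf's real-valued annotation, and the names `predict` and `log_likelihood` and the toy data are not from the problem set.

```python
import math


def g(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, theta, leaf_fns, v):
    """Return p(y = 1 | x) = g(sum_i theta[i] * v[i][leaf reached by x in tree i])."""
    z = sum(theta[i] * v[i][leaf_fns[i](x)] for i in range(len(theta)))
    return g(z)


def log_likelihood(data, theta, leaf_fns, v):
    """Sum of log p(y | x) over the (x, y) pairs in data, with y in {0, 1}."""
    total = 0.0
    for x, y in data:
        p = predict(x, theta, leaf_fns, v)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical toy setup: two single-split trees over a 2-feature input.
leaf_fns = [
    lambda x: 0 if x[0] < 0.5 else 1,  # tree 0 splits on feature 0
    lambda x: 0 if x[1] < 0.5 else 1,  # tree 1 splits on feature 1
]
theta = [1.0, 1.0]
v = [[0.0, 0.0], [0.0, 0.0]]  # all leaf values start at zero
data = [((0.2, 0.9), 1), ((0.8, 0.1), 0)]

print(predict((0.2, 0.9), theta, leaf_fns, v))
print(log_likelihood(data, theta, leaf_fns, v))
```

With every leaf value set to zero, the argument of `g` is zero for any input, so each prediction is 0.5 regardless of the tree structure. Changing a single entry of `v` and re-evaluating `log_likelihood` gives a simple numerical way to sanity-check a hand-derived gradient.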
This note was uploaded on 11/30/2009 for the course CS 221 taught by Professors Koller and Ng during the Winter '09 term at Stanford.