ps2 - CS221 Problem Set #2 1 CS 221, Autumn 2007 Problem...


CS 221, Autumn 2007
Problem Set #2: Machine Learning

Due by 9:30am on Tuesday, October 30. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449 with a filled-out route form [1] as the cover page. We will not accept solutions by email or courier.

1 Written part (100 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [22 points] Learning rules

Consider a variant of the logistic regression model, defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$

$$h_{\theta,v}(x) = p(y = 1 \mid x; \theta, v) = g\left( \sum_{i=1}^{k} \theta_i f_i(x; v) \right).$$

This model has two sets of parameters, $\theta$ and $v$. Assume that each of the functions $f_i(x; v)$ is a decision tree, where each leaf $\ell$ is annotated with some real-valued number [2] $v_{i,\ell} \in \mathbb{R}$. Thus, for the $i$th decision tree, $f_i(x; v) = v_{i,\ell(x)}$, where $\ell(x)$ is the leaf reached by the input feature vector $x$.

We are given a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of $m$ examples, and we want to use gradient ascent to train our logistic regression classifier to maximize the log-likelihood:

$$\ell(\theta, v) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta, v).$$

(a) [12 points] Derive the (batch) gradient ascent learning rule to train each weight $v_{i,\ell}$. Your learning rule should be of the form $v_{i,\ell} := \ldots$ and take into consideration the entire training set.

(b) [7 points] Suppose we want to define our parameterization of the decision trees so that they share certain parameters. For example, we might decide that all leaves in any decision tree that occur at depth $d$ or deeper have the same parameter. In general, let $v$ be some parameter, and let $L$ be the set of pairs $(i, \ell)$ of all leaves in all trees that are associated with the shared parameter $v$. Derive the (batch) gradient ascent update rule for $v$.

(c) [3 points] Explain briefly but using formal terms why parameter sharing may be a good idea....

[1] Available from http://scpd.stanford.edu/scpd/students/routing.htm
[2] Note that this is slightly different from what we had in class, where we had real-valued $p \in [0, 1]$.
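To make the model in Problem 1 concrete, here is a minimal Python sketch that evaluates the hypothesis and the log-likelihood for a toy instance. It does not derive the update rules asked for in parts (a) and (b). The representation is an assumption made for illustration only: each tree is a hypothetical function `leaf_of[i]` that returns the index of the leaf reached by `x`, and `v[i][leaf]` holds that leaf's real-valued annotation. The names `sigmoid`, `predict`, `log_likelihood`, and the toy numbers are not from the problem set.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, theta, leaf_of, v):
    """Return p(y = 1 | x; theta, v) = g(sum_i theta_i * v[i][leaf_i(x)])."""
    z = sum(theta[i] * v[i][leaf_of[i](x)] for i in range(len(theta)))
    return sigmoid(z)


def log_likelihood(data, theta, leaf_of, v):
    """Sum of log p(y | x; theta, v) over (x, y) pairs with y in {0, 1}."""
    total = 0.0
    for x, y in data:
        p = predict(x, theta, leaf_of, v)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Toy instance (hypothetical): two depth-1 trees, each with two leaves.
leaf_of = [
    lambda x: 0 if x[0] < 0.5 else 1,  # tree 0 splits on feature 0
    lambda x: 0 if x[1] < 0.5 else 1,  # tree 1 splits on feature 1
]
theta = [1.0, -2.0]            # one weight per tree
v = [[0.3, -0.7], [1.5, 0.2]]  # v[i][leaf]: annotation of each leaf
data = [((0.1, 0.9), 1), ((0.8, 0.2), 0)]

print(predict(data[0][0], theta, leaf_of, v))
print(log_likelihood(data, theta, leaf_of, v))
```

Storing leaf values in a plain nested list keeps the dependence of the prediction on each `v[i][leaf]` easy to see: for a given input, exactly one leaf per tree contributes to the sum inside the logistic function.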

This note was uploaded on 11/30/2009 for the course CS 221 taught by Professors Koller and Ng during the Winter '09 term at Stanford.

