
# ps2 - CS221 Problem Set #2, CS 221 Autumn 2007


CS 221, Autumn 2007

Problem Set #2: Machine Learning

Due by 9:30am on Tuesday, October 30. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449 with a filled-out route form¹ as the cover page. We will not accept solutions by email or courier.

## 1 Written part (100 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [22 points] Learning rules

Consider a variant of the logistic regression model, defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$

$$h_{\theta,v}(x) = p(y = 1 \mid x; \theta, v) = g\left( \sum_{i=1}^{k} \theta_i f_i(x; v) \right).$$

This model has two sets of parameters, $\theta$ and $v$. Assume that each of the functions $f_i(x; v)$ is a decision tree, where each leaf is annotated with some real-valued number² $v_{i,\ell} \in \mathbb{R}$. Thus, for the $i$th decision tree, $f_i(x; v) = v_{i,\ell(x)}$, where $\ell(x)$ is the leaf reached by the input feature vector $x$.

We are given a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of $m$ examples, and we want to use gradient ascent to train our logistic regression classifier to maximize the log-likelihood:

$$\ell(\theta, v) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta, v).$$

(a) [12 points] Derive the (batch) gradient ascent learning rule to train each weight $v_{i,\ell}$. Your learning rule should be of the form $v_{i,\ell} := \ldots$ and take into consideration the entire training set.

(b) [7 points] Suppose we want to define our parameterization of the decision trees so that they share certain parameters. For example, we might decide that all leaves in any decision tree that occur at depth $d$ or deeper have the same parameter. In general, let $v$ be some parameter, and let $L$ be the set of pairs $(i, \ell)$ of all leaves in all trees that are associated with the shared parameter $v$. Derive the (batch) gradient ascent update rule for $v$.
¹ Available from http://scpd.stanford.edu/scpd/students/routing.htm

² Note that this is slightly different from what we had in class, where we had real-valued $p \in [0, 1]$.
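As an aid to reading the model definition in Problem 1, the following is a minimal Python sketch of how $h_{\theta,v}(x)$ and the log-likelihood could be evaluated. It does not derive any of the requested update rules. The representation is an assumption made for illustration: each tree is reduced to a hypothetical function `leaf_of[i]` that maps an input to a leaf index, `v[i][leaf]` holds that leaf's value, and labels are assumed to be 0 or 1.

```python
import math

def g(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def h(x, theta, v, leaf_of):
    """p(y = 1 | x; theta, v) for the tree-based logistic model.

    theta   : list of k tree weights
    v       : v[i][leaf] is the real value stored at a leaf of tree i
    leaf_of : hypothetical list of k functions; leaf_of[i](x) returns the
              index of the leaf that x reaches in tree i
    """
    z = sum(theta[i] * v[i][leaf_of[i](x)] for i in range(len(theta)))
    return g(z)

def log_likelihood(data, theta, v, leaf_of):
    """Sum of log p(y | x; theta, v) over (x, y) pairs, assuming y in {0, 1}."""
    total = 0.0
    for x, y in data:
        p = h(x, theta, v, leaf_of)
        total += math.log(p if y == 1 else 1.0 - p)
    return total

# Toy setup (all values are made up): two depth-1 trees, each splitting
# on one feature at 0.5, so each tree has leaves 0 and 1.
leaf_of = [lambda x: 0 if x[0] < 0.5 else 1,
           lambda x: 0 if x[1] < 0.5 else 1]
theta = [1.0, -2.0]
v = [[0.5, -0.5], [0.0, 1.0]]
data = [([0.2, 0.9], 0), ([0.8, 0.1], 1)]

print(h(data[0][0], theta, v, leaf_of))
print(log_likelihood(data, theta, v, leaf_of))
```

In this sketch, parameter sharing as described in part (b) would correspond to several `(i, leaf)` entries of `v` being tied to one underlying value.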

(c) [3 points] Explain briefly, but using formal terms, why parameter sharing may be a good idea.

2. [23 points] Least Squares

In lecture, we discussed a method for least squares linear regression using gradient descent applied to a cost function $J(\theta)$. It turns out it is actually possible to solve for the linear regression parameters $\theta$ analytically. In this problem, you will derive analytic solutions for some important special cases.
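For context, the gradient-descent approach referred to here can be sketched in a few lines of Python. This is only an illustration of the iterative method, not the analytic solution the problem asks for. The problem statement does not give the form of $J(\theta)$, so the sketch assumes the common choice $J(\theta) = \frac{1}{2} \sum_i (\theta^T x^{(i)} - y^{(i)})^2$. The function names, step size `alpha`, and iteration count are also hypothetical.

```python
def predict(theta, x):
    """Linear prediction theta^T x."""
    return sum(t * xj for t, xj in zip(theta, x))

def cost(theta, X, y):
    """Assumed cost J(theta) = 1/2 * sum of squared prediction errors."""
    return 0.5 * sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))

def gradient_descent(X, y, alpha=0.05, iters=2000):
    """Batch gradient descent on the assumed J(theta), starting from zero."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        # dJ/dtheta_j = sum_i (theta^T x_i - y_i) * x_ij
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) for j in range(n)]
        theta = [t - alpha * gj for t, gj in zip(theta, grad)]
    return theta

# Toy data generated from y = 1 + 2x; the first feature is a constant 1
# so that theta[0] acts as the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))
```

The step size matters: for this toy data a much larger `alpha` would make the iteration diverge, which is one practical motivation for wanting a closed-form solution.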