# hw6

CS 6375 Machine Learning, Homework 6. Due: 05/07/2008

1. Reinforcement learning application. (15 pts) Read a paper about using reinforcement learning for an application. Briefly summarize the paper, and explain clearly the states, reward, and actions for the task.

2. MDP. (30 pts) The following figure shows an MDP with N states. All states have two actions (north and right) except Sn, which can only self-loop. As you can see from the figure, all state transitions are deterministic. The discount factor is γ.

   (a) What is J*(Sn)?
   (b) What is the optimal policy?
   (c) What is J*(S1)?
   (d) Use value iteration to solve this MDP. What are J1(S1) and J2(S1) in the first and second iteration, respectively?

   Hint: If you don't remember the formula for summing up geometric series, you will need the following one, where 0 ≤ α < 1:
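
The formula the hint points to is not reproduced in this copy. Assuming it is the standard geometric series identity (an assumption, since the original expression is missing), the finite and infinite forms are:

$$
\sum_{i=0}^{n} \alpha^{i} = \frac{1 - \alpha^{n+1}}{1 - \alpha},
\qquad
\sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1 - \alpha}
\quad (0 \le \alpha < 1).
$$

With α = γ, a constant reward r collected at every step of a self-loop sums to r / (1 − γ).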

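Part (d) asks for value iteration by hand. As a way to check such a calculation, the following is a minimal sketch for a deterministic MDP. The figure's rewards and transitions are not available here, so the three-state chain, its rewards, the action names `stay` and `right`, and γ = 0.5 are all hypothetical. The update J_{k+1}(s) = r(s) + γ · max_a J_k(next(s, a)) with J_0 = 0 is one common convention and is assumed, not taken from the assignment.

```python
def value_iteration(transitions, rewards, gamma, num_iters):
    """Run synchronous value iteration on a deterministic MDP.

    transitions[s][a] is the state reached by taking action a in state s.
    rewards[s] is the reward received in state s (assumed convention).
    Returns a list of value tables [J_0, J_1, ..., J_num_iters].
    """
    values = {s: 0.0 for s in transitions}  # J_0 = 0 everywhere (assumed start)
    history = [dict(values)]
    for _ in range(num_iters):
        # Build J_{k+1} from J_k only, so the update is synchronous.
        values = {
            s: rewards[s]
            + gamma * max(values[nxt] for nxt in transitions[s].values())
            for s in transitions
        }
        history.append(dict(values))
    return history


# Hypothetical chain, NOT the MDP from the assignment's figure:
# S1 and S2 can stay or move right; S3 can only self-loop.
transitions = {
    "S1": {"stay": "S1", "right": "S2"},
    "S2": {"stay": "S2", "right": "S3"},
    "S3": {"stay": "S3"},
}
rewards = {"S1": 0.0, "S2": 0.0, "S3": 1.0}  # hypothetical rewards

history = value_iteration(transitions, rewards, gamma=0.5, num_iters=2)
print(history[1])  # {'S1': 0.0, 'S2': 0.0, 'S3': 1.0}
print(history[2])  # {'S1': 0.0, 'S2': 0.5, 'S3': 1.5}
```

In this hypothetical chain the self-loop state converges to 1 / (1 − 0.5) = 2 as the number of iterations grows, which is the infinite geometric series with ratio γ.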

3. Policy iteration. (25 pts) Consider the following MDP with three states, with rewards -1, -2, 0 respectively. State 3 is the