CS 6375 Machine Learning
Homework 6
Due: 05/07/2008
1. Reinforcement learning application. (15 pts)
Read a paper about using reinforcement learning for an application. Briefly summarize the
paper, and clearly explain the states, rewards, and actions for the task.
2. MDP. (30 pts)
The following figure shows an MDP with N states. All states have two actions (north and right)
except Sn, which can only self-loop. As you can see from the figure, all state transitions are
deterministic. The discount factor is γ.
(a) What is J*(Sn)?
(b) What is the optimal policy?
(c) What is J*(S1)?
(d) Use value iteration to solve this MDP. What are J1(S1) and J2(S1) in the first and second
iterations, respectively?
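As a reminder of how the value iteration sweep in part (d) works, here is a minimal Python sketch for a deterministic MDP. It is not the MDP from the figure: the three-state chain, its rewards, and the discount factor 0.5 are hypothetical, and the function name value_iteration is made up for illustration. The sketch also assumes that rewards are attached to states and that J0 is zero everywhere, so that J^{k+1}(s) = r(s) + γ max_a J^k(next(s, a)); adjust it if your convention differs.

```python
def value_iteration(transitions, rewards, gamma, num_iters):
    """Run synchronous value iteration on a deterministic MDP.

    transitions: dict mapping state -> {action: next_state}
    rewards:     dict mapping state -> immediate reward (assumed per-state)
    Returns a list [J1, J2, ...] of value tables, one per iteration.
    """
    J = {s: 0.0 for s in transitions}  # J0 = 0 for every state (assumption)
    history = []
    for _ in range(num_iters):
        # Each sweep uses only the values from the previous iteration.
        J = {
            s: rewards[s] + gamma * max(J[nxt] for nxt in transitions[s].values())
            for s in transitions
        }
        history.append(J)
    return history


# Hypothetical chain A -> B -> C, where C can only self-loop.
transitions = {
    "A": {"stay": "A", "right": "B"},
    "B": {"stay": "B", "right": "C"},
    "C": {"loop": "C"},
}
rewards = {"A": 0.0, "B": 0.0, "C": 1.0}

history = value_iteration(transitions, rewards, gamma=0.5, num_iters=3)
for k, J in enumerate(history, start=1):
    print(f"J{k} = {J}")
```

In this toy chain the reward at C needs one extra iteration per step to propagate back toward A, which is the same effect parts (c) and (d) ask you to work out by hand for the N-state MDP.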
Hint: If you don't remember the formula for summing up geometric series, you will need the
following one, where 0 <= α < 1:
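For reference, the standard geometric series sums are given here in LaTeX notation. This is an assumption about which identity the hint intends; it may have been stated in a different but equivalent form.

```latex
\sum_{i=0}^{n} \alpha^{i} = \frac{1 - \alpha^{n+1}}{1 - \alpha},
\qquad
\sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1 - \alpha},
\qquad 0 \le \alpha < 1 .
```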

3. Policy iteration. (25 pts)
Consider the following MDP with three states, with rewards -1, -2, 0 respectively. State 3 is the