Reinforcement Learning: Tutorial 5 (week from 3. 3. 2014)
1. How can particle filters be used in the context of robot localization?
2. The "art" of importance sampling: we are sampling from P(x), which may not cover the interesting aspects of the game. It is
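The idea can be sketched as follows; the target and proposal densities and the rare-event example are illustrative assumptions, not part of the tutorial sheet. Samples are drawn from a proposal Q and reweighted by P/Q, so Q can be chosen to cover the interesting region that P alone rarely reaches:

```python
import math
import random

def importance_estimate(f, log_p, log_q, sample_q, n, rng):
    """Estimate E_p[f(x)] while sampling from a proposal q.

    Each sample is reweighted by p(x)/q(x), so q can be chosen to
    cover the 'interesting' region that naive sampling rarely hits.
    """
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        w = math.exp(log_p(x) - log_q(x))   # importance weight p(x)/q(x)
        total += w * f(x)
    return total / n

def log_normal(x, mu):
    # log density of N(mu, 1)
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2 * math.pi)

# Toy example (assumed setup): P(X > 4) for X ~ N(0, 1) is ~3.17e-5,
# far too rare for naive sampling, but easy with a proposal N(4, 1).
rng = random.Random(0)
est = importance_estimate(
    f=lambda x: 1.0 if x > 4.0 else 0.0,
    log_p=lambda x: log_normal(x, 0.0),
    log_q=lambda x: log_normal(x, 4.0),
    sample_q=lambda r: r.gauss(4.0, 1.0),
    n=20000, rng=rng)
print(est)   # close to the true tail probability ~3.17e-05
```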

Reinforcement Learning: Tutorial 5 (week from 3. 3. 2014)
1. How can particle filters be used in the context of robot localization?
Particle filters sample a probability distribution. The dynamics of the particles can be used to represent the change of th
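As a minimal illustration of this idea (assuming simple Gaussian motion and sensor models, which the tutorial does not specify), one predict-weight-resample cycle for 1-D localisation might look like:

```python
import math
import random

def particle_filter_step(particles, control, measurement, rng,
                         motion_noise=0.1, sensor_noise=0.5):
    """One predict-weight-resample cycle for 1-D robot localisation.

    particles: list of hypothesised robot positions (the sampled distribution).
    control: commanded displacement; measurement: noisy observed position.
    The Gaussian motion and sensor models here are assumptions.
    """
    # Predict: move every particle by the control, plus motion noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: how well does each particle explain the measurement?
    weights = [math.exp(-0.5 * ((measurement - p) / sensor_noise) ** 2)
               for p in moved]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(1)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]  # uninformed prior
for true_pos in [1.0, 2.0, 3.0]:           # robot moves +1 each step
    z = true_pos + rng.gauss(0.0, 0.5)     # noisy position sensor
    particles = particle_filter_step(particles, 1.0, z, rng)
estimate = sum(particles) / len(particles)
print(round(estimate, 2))   # particles should cluster near 3.0
```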

Reinforcement Learning 2013/2014: Tutorial 7
1. RL aims at the related tasks of optimising the value function, the policy and the behaviour based on the reward signal. The reward is used only locally (in simple RL algorithms), such that models are u

Reinforcement Learning 2014: Tutorial 6 (week from 10. 3. 2014)
1. Recall the three types of errors in RL (hint: value, policy, exploration). How are they represented in the definition of the global utility measure (or global reward average)? Is it possib

Reinforcement Learning 2014: Tutorial 6 (week from 10. 3. 2014)
These are just a few hints, please do not distribute.
1. Errors in value estimation, errors in policy estimation and "region errors", i.e. has
the agent arrived (sufficiently often/at all) in

Reinforcement Learning: Tutorial 4
(week from 24. 2. 2014)
1. Is the reinforcement learning framework adequate to usefully represent all goal-directed
learning tasks? Can you think of any clear exceptions?
[this and other problems on this sheet are from S

Reinforcement Learning 2013/2014
Tutorial 2 (week 4)
1. Discuss your solution of the 1D walker problem (see homework in lecture RL3,
21/1/2014). How do
initialisation
alternative reward definitions
exploration variants
parameters and parameter decay sch

Reinforcement Learning: Tutorial 4
Problems and hints for solutions for the week from 24. 2. 2014
1. Is the reinforcement learning framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions?
This proble

Reinforcement Learning 2013
Tutorial 3: Hints and solutions
1. Consider the gambler's problem (example 4.3 in S+B). Why does the optimal policy for the gambler's
problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, bu
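A value-iteration sketch of this example (assuming the standard Sutton & Barto setup, p_heads = 0.4 and goal = 100) reproduces the value of the all-in policy at capital 50:

```python
def gamblers_value_iteration(p_heads=0.4, goal=100, theta=1e-9):
    """Value iteration for the gambler's problem (Sutton & Barto, ex. 4.3).

    State = capital 1..goal-1; a stake s moves capital to c + s on heads
    and c - s on tails; reaching `goal` pays reward 1, reaching 0 pays 0.
    """
    V = [0.0] * (goal + 1)
    V[goal] = 1.0          # winning terminal state
    while True:
        delta = 0.0
        for c in range(1, goal):
            stakes = range(1, min(c, goal - c) + 1)
            best = max(p_heads * V[c + s] + (1 - p_heads) * V[c - s]
                       for s in stakes)
            delta = max(delta, abs(best - V[c]))
            V[c] = best
        if delta < theta:
            return V

V = gamblers_value_iteration()
# With capital 50 the agent can reach the goal in one flip, and with
# p_heads < 0.5 repeated small bets only leak probability, so betting
# everything is optimal there and V[50] equals p_heads.
print(round(V[50], 4))
```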

Reinforcement Learning 2013
Tutorial 2: Hints and solutions
1. Discuss your solution of the 1D walker problem (see homework in lecture RL3, 21/1/2014). How do
initialisation
alternative reward definitions
exploration variants
parameters and parameter de

Reinforcement Learning 2013/2014
Tutorial 3
1. Consider the gambler's problem (example 4.3 in S+B). Why does the optimal policy for the gambler's
problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, but for
capital of

Reinforcement Learning
Tutorial 1 (week 3, 27/1/14 and 31/1/14)
Questions
1. Consider the comparison between ε-greedy methods shown in Figure 2.1 in the Sutton and Barto book. Which method will perform best in the long run in terms of cumulative rewards and
Reinforcement Learning
Tutorial 1 (week 4, 4/2/13 and 7/2/13)
Questions and Hints for Solutions
1. Consider the comparison between ε-greedy methods shown in Figure 2.1 in the Sutton
and Barto book. Which method will perform best in the long run in terms of

Reinforcement Learning: Coursework Assignment 2 (Semester 2, 2014)
Instructions
This homework assignment is to be done individually, without help from your classmates or
others. Plagiarism will be dealt with strictly as per University policy.
Solve all pr

Reinforcement Learning: Coursework Assignment 1 (Semester 2, 2013)
Instructions
This homework assignment is to be done individually, without help from your classmates or
others. Plagiarism will be dealt with strictly as per University policy.
Solve all pr