Reinforcement Learning: Tutorial 8 (revision)
(week from 23. 3. 2015)
This sheet contains a selection of exam questions from previous years. Please check also
questions of earlier tutorials.
MABs
Explain the ε-greedy action selection method with respect to
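The question above breaks off, but the ε-greedy selection rule itself can be sketched in a few lines of Python (a minimal sketch; the function name and tie-breaking choice are illustrative, not from the sheet):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random action with probability epsilon,
    otherwise a greedy (highest-estimate) action.
    q_values: list of current action-value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # break ties among equally greedy actions at random
    best = max(q_values)
    greedy = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(greedy)
```

With epsilon = 0 this reduces to pure greedy selection; with epsilon = 1 it is uniform exploration.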
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
INFR11010 REINFORCEMENT LEARNING
Monday 2nd May 2016
14:30 to 16:30
INSTRUCTIONS TO CANDIDATES
Answer QUESTION 1 and ONE other question.
Question 1 is COMPULSORY.
All questi
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
INFR11010 REINFORCEMENT LEARNING
Friday 24th May 2013
09:30 to 11:30
MSc Courses
Convener: B. Franke
External Examiners: T. Attwood, R. Connor, R. Cooper, S. Denham, T. Norman
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
REINFORCEMENT LEARNING
Tuesday 26th April 2011
09:30 to 11:30
MSc Courses
Convener: C. Stirling
External Examiners: T. Attwood, R. Connor, R. Cooper, D. Marshall, M. Richardson
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
REINFORCEMENT LEARNING
Wednesday 9th May 2012
14:30 to 16:30
MSc Courses
Convener: B. Franke
External Examiners: T. Attwood, R. Connor, R. Cooper, D. Marshall, M. Richardson
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
INFR11010 REINFORCEMENT LEARNING
Tuesday 13th May 2014
14:30 to 16:30
MSc Courses
Convener: B. Franke
External Examiners: A. Burns, S. Denham, P. Healey, T. Norman
INSTRUCTIONS TO CANDIDATES
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
REINFORCEMENT LEARNING
Thursday 29th April 2010
09:30 to 11:30
MSc Courses
Convener: C. Stirling
External Examiners: R. Connor, R. Cooper, D. Marshall, T. Attwood
INSTRUCTIONS TO CANDIDATES
UNIVERSITY OF EDINBURGH
COLLEGE OF SCIENCE AND ENGINEERING
SCHOOL OF INFORMATICS
INFR11010 REINFORCEMENT LEARNING
Tuesday 28th April 2015
14:30 to 16:30
INSTRUCTIONS TO CANDIDATES
Answer QUESTION 1 and ONE other question.
Question 1 is COMPULSORY.
All qu
Reinforcement Learning 2015/2016: Tutorial 7
1. [Model-based RL] RL aims at the related tasks of optimising the value function, the policy and
the behaviour based on the reward signal. The reward is used only locally (in simple RL algorithms),
such
Reinforcement Learning: Tutorial 6 (week from 7. 3. 2016)
1. Recall the three types of errors in RL (hint: value, policy, exploration). How are they
represented in the definition of the global utility measure (or global reward average)?
Is it possible to
Reinforcement Learning: Tutorial 5 (week from 3. 3. 2014)
1. How can particle filters be used in the context of robot localization?
2. The "art" of importance sampling: We are sampling P(x), which may not cover the interesting aspect of the game. It is
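The importance-sampling idea can be made concrete with a small sketch: we want an expectation under a target distribution P but draw samples from a proposal Q, correcting each sample by the ratio p(x)/q(x). The discrete example below is purely hypothetical, not from the tutorial:

```python
import random

def importance_estimate(f, p, q, sample_q, n=50000, rng=random):
    """Estimate E_P[f(X)] by drawing X ~ Q and weighting each
    sample by the importance ratio p(x) / q(x)."""
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        total += (p(x) / q(x)) * f(x)
    return total / n

# Hypothetical example: P concentrates on the rare but "interesting"
# outcome 2, while the proposal Q is uniform over {0, 1, 2}.
p_tab = {0: 0.1, 1: 0.1, 2: 0.8}
est = importance_estimate(lambda x: x,
                          lambda x: p_tab[x],
                          lambda x: 1 / 3,
                          lambda rng: rng.randrange(3))
# E_P[X] = 0.1*0 + 0.1*1 + 0.8*2 = 1.7, so est should land close to 1.7
```

If Q puts little mass where P (or the integrand) matters, the ratios blow up and the estimate becomes high-variance, which is exactly the "art" the question alludes to.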
Reinforcement Learning: Tutorial 5 (week from 3. 3. 2014)
1. How can particle filters be used in the context of robot localization?
Particle filters sample a probability distribution. The dynamics of the particles can be used to represent the change of th
Reinforcement Learning 2013/2014: Tutorial 7
1. RL aims at the related tasks of optimising the value function, the policy and the behaviour based
on the reward signal. The reward is used only locally (in simple RL algorithms), such that models are
u
Reinforcement Learning 2014: Tutorial 6 (week from 10. 3. 2014)
1. Recall the three types of errors in RL (hint: value, policy, exploration). How are they
represented in the definition of the global utility measure (or global reward average)?
Is it possib
Reinforcement Learning 2014: Tutorial 6 (week from 10. 3. 2014)
These are just a few hints, please do not distribute.
1. Errors in value estimation, errors in policy estimation and "region errors", i.e. has
the agent arrived (sufficiently often/at all) in
Reinforcement Learning: Tutorial 4
(week from 24. 2. 2014)
1. Is the reinforcement learning framework adequate to usefully represent all goal-directed
learning tasks? Can you think of any clear exceptions?
[this and other problems on this sheet are from S
Reinforcement Learning 2013/2014
Tutorial 2 (week 4)
1. Discuss your solution of the 1D walker problem (see homework in lecture RL3,
21/1/2014). How do
initialisation
alternative reward definitions
exploration variants
parameters and parameter decay sch
Reinforcement Learning: Tutorial 4
Problems and hints for solutions for the week from 24. 2. 2014
1. Is the reinforcement learning framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions?
This proble
Reinforcement Learning 2013
Tutorial 3: Hints and solutions
1. Consider the gambler's problem (example 4.3 in S+B). Why does the optimal policy for the gambler's
problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, bu
Reinforcement Learning 2013
Tutorial 2: Hints and solutions
1. Discuss your solution of the 1D walker problem (see homework in lecture RL3, 21/1/2014). How do
initialisation
alternative reward definitions
exploration variants
parameters and parameter de
Reinforcement Learning 2013/2014
Tutorial 3
1. Consider the gambler's problem (example 4.3 in S+B). Why does the optimal policy for the gambler's
problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, but for
capital of
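The question trails off, but the setup of example 4.3 can be reproduced with a short value-iteration sketch (assuming the book's standard parameters: win probability 0.4, goal 100, reward 1 only on reaching the goal):

```python
def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-8):
    """Value iteration for the gambler's problem (S+B example 4.3).
    Returns state values V[0..goal]; reaching the goal is worth 1."""
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            # a stake cannot exceed the capital or the amount still needed
            best = max(p_h * V[s + a] + (1 - p_h) * V[s - a]
                       for a in range(1, min(s, goal - s) + 1))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

Running this shows why the policy looks curious: at capital 50, betting everything gives value p_h * V(100) = 0.4, and since the coin is unfavourable no cautious sequence of smaller bets does better, so bold play is optimal at the powers-of-two capitals.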
Reinforcement Learning
Tutorial 1 (week 3, 27/1/14 and 31/1/14)
Questions
1. Consider the comparison between greedy methods shown in Figure 2.1 in the Sutton and Barto book. Which method will perform best in the long run in terms of cumulative rewards and
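A single-task sketch of the testbed behind Figure 2.1 helps in reasoning about the question; this is a minimal illustrative version (one bandit task, sample-average estimates), not the book's full 2000-task experiment:

```python
import random

def run_bandit(epsilon, steps=1000, k=10, rng=None):
    """Average reward of sample-average epsilon-greedy on one
    randomly drawn k-armed Gaussian bandit task."""
    rng = rng or random.Random(0)
    q_true = [rng.gauss(0, 1) for _ in range(k)]   # true action values
    Q = [0.0] * k                                  # estimates
    N = [0] * k                                    # pull counts
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit
        r = rng.gauss(q_true[a], 1)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]   # incremental sample-average update
        total += r
    return total / steps
```

Averaged over many tasks and long horizons, a small nonzero epsilon tends to win on cumulative reward because pure greedy can lock onto a suboptimal arm forever.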
Reinforcement Learning
Tutorial 1 (week 4, 4/2/13 and 7/2/13)
Questions and Hints for Solutions
1. Consider the comparison between ε-greedy methods shown in Figure 2.1 in the Sutton
and Barto book. Which method will perform best in the long run in terms of
Reinforcement Learning: Coursework Assignment 2 (Semester 2, 2014)
Instructions
This homework assignment is to be done individually, without help from your classmates or
others. Plagiarism will be dealt with strictly as per University policy.
Solve all pr
Reinforcement Learning: Coursework Assignment 1 (Semester 2, 2013)
Instructions
This homework assignment is to be done individually, without help from your classmates or
others. Plagiarism will be dealt with strictly as per University policy.
Solve all pr
Lecture 8: Dimensionality Reduction
Contents:
Subset Selection & Shrinkage
Ridge regression, Lasso
PCA, PCR, PLS
Comparison of Methods
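Of the listed shrinkage methods, ridge regression has the most compact closed form, w = (XᵀX + λI)⁻¹Xᵀy; a minimal NumPy sketch (function and variable names are illustrative, not from the lecture code):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With lam = 0 this is ordinary least squares; increasing lam shrinks the weight vector toward zero, trading a little bias for lower variance.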
Lecture 8: RLSC - Prof. Sethu Vijayakumar
Data From Human Movement
Measure arm movement and full-body movement of hum
RLSC Homework
You have been given example code (https://db.tt/HUYqjOd1) that you may use to complete
this homework. The code allows you to load a kinematic model of a Baxter robot as well as to
compute the forward kinematics and Jacobian of the end-effectors
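The provided Baxter code is not reproduced here, but the forward kinematics and Jacobian the assignment relies on can be illustrated on a planar 2-link arm (a stand-in model with illustrative link lengths, not the Baxter kinematics):

```python
import math

def fk_2link(q1, q2, l1=1.0, l2=1.0):
    """End-effector position (x, y) of a planar 2-link arm."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y

def jacobian_2link(q1, q2, l1=1.0, l2=1.0):
    """Analytic 2x2 Jacobian d(x, y)/d(q1, q2) of the same arm."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [ l1 * c1 + l2 * c12,  l2 * c12]]
```

The Jacobian maps joint velocities to end-effector velocities, which is the quantity the assignment code exposes for the Baxter end-effectors.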