rl_exercise_sol - CS 6375 Machine Learning Reinforcement...

CS 6375 Machine Learning
Reinforcement Learning Exercise [from Dan Klein, adapted from Sutton & Barto]

A cleaning robot must vacuum a house on battery power. At any time it can clean the house, wait and do nothing, or recharge its battery. Unfortunately, it can only perceive its battery level (its state) as either high or low.

If the robot recharges, the battery returns to high, and the robot receives a reward of 0. If the robot waits, the battery level is unchanged, and the robot receives a reward of R_wait. If the robot cleans, the outcome depends on the battery level. If the battery level is high, the battery drops to low with fixed probability 1/3. If the battery level is low, it runs out with probability 1/2. If the battery does not run out, the robot receives a reward of R_clean. If the battery does run out, a human must collect the robot and recharge it. In this case, the robot ends up with a high battery, but receives a reward of -10.

Note that the robot receives rewards not based on its current state (as in the book), but
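
The dynamics described in the exercise can be written down as a small transition table. The following is a minimal Python sketch, not part of the original handout. The names `build_transitions` and `expected_reward` are hypothetical, and the rewards for waiting and cleaning are left as parameters because the exercise keeps them symbolic. It also assumes that a high battery which does not drop stays high, that a low battery which does not run out stays low, and that cleaning on a high battery never runs the battery out.

```python
from fractions import Fraction

HIGH, LOW = "high", "low"
ACTIONS = ("clean", "wait", "recharge")
R_RESCUE = -10  # reward when the battery runs out (value given in the exercise)


def build_transitions(r_wait, r_clean):
    """Return {(state, action): [(probability, next_state, reward), ...]}.

    r_wait and r_clean stand in for the symbolic rewards of the exercise.
    """
    third, half = Fraction(1, 3), Fraction(1, 2)
    model = {}
    for state in (HIGH, LOW):
        # Recharging always restores a high battery and earns nothing.
        model[(state, "recharge")] = [(Fraction(1), HIGH, 0)]
        # Waiting leaves the battery level unchanged.
        model[(state, "wait")] = [(Fraction(1), state, r_wait)]
    # Cleaning on a high battery: drops to low with probability 1/3.
    # Assumption: otherwise the battery stays high.
    model[(HIGH, "clean")] = [
        (third, LOW, r_clean),
        (1 - third, HIGH, r_clean),
    ]
    # Cleaning on a low battery: runs out with probability 1/2, after which
    # a human recharges the robot (high battery, reward -10).
    # Assumption: otherwise the battery stays low.
    model[(LOW, "clean")] = [
        (half, HIGH, R_RESCUE),
        (1 - half, LOW, r_clean),
    ]
    return model


def expected_reward(model, state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in model[(state, action)])
```

For example, `expected_reward(build_transitions(r_wait=1, r_clean=3), LOW, "clean")` gives the expected immediate reward of cleaning on a low battery. The values 1 and 3 are placeholders chosen only for illustration.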

This note was uploaded on 01/25/2012 for the course CS 6375 taught by Professor Yangliu during the Spring '09 term at University of Texas at Dallas, Richardson.

