# rl_exercise - CS 6375 Machine Learning Reinforcement...

CS 6375 Machine Learning: Reinforcement Learning Exercise [from Dan Klein, adapted from Sutton & Barto]

A cleaning robot must vacuum a house on battery power. At any time it can clean the house, wait and do nothing, or recharge its battery. Unfortunately, it can perceive its battery level (its state) only as either high or low.

If the robot recharges, the battery returns to high, and the robot receives a reward of 0. If the robot waits, the battery level is unchanged, and the robot receives a reward of R_wait.

If the robot cleans, the outcome depends on the battery level. If the battery level is high, the battery drops to low with fixed probability 1/3. If the battery level is low, it runs out with probability 1/2. If the battery does not run out, the robot receives a reward of R_clean. If the battery does run out, a human must collect the robot and recharge it. In this case, the robot ends up with a high battery but receives a reward of -10.
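The problem statement above can be written down as a small transition table, which makes the probabilities and rewards easier to check. The following is a minimal sketch, not part of the original exercise: the names `build_model`, `expected_reward`, `r_wait`, `r_clean`, and `r_rescue` are hypothetical, and the two reward symbols are left as parameters because the exercise does not give them values. It also assumes that a high battery stays high with probability 2/3 when cleaning, and that a low battery stays low when cleaning does not drain it; the text implies both but does not state them.

```python
from fractions import Fraction

HIGH, LOW = "high", "low"
ACTIONS = ("clean", "wait", "recharge")


def build_model(r_wait, r_clean, r_rescue=-10):
    """Return {(state, action): [(probability, next_state, reward), ...]}.

    r_wait and r_clean are the unspecified rewards from the exercise;
    r_rescue is the stated reward of -10 when a human must collect the robot.
    """
    one = Fraction(1)
    third = Fraction(1, 3)
    half = Fraction(1, 2)
    return {
        # Recharging always returns the battery to high, reward 0.
        (HIGH, "recharge"): [(one, HIGH, 0)],
        (LOW, "recharge"): [(one, HIGH, 0)],
        # Waiting leaves the battery level unchanged.
        (HIGH, "wait"): [(one, HIGH, r_wait)],
        (LOW, "wait"): [(one, LOW, r_wait)],
        # Cleaning on high: drops to low with probability 1/3
        # (assumed to stay high otherwise).
        (HIGH, "clean"): [(one - third, HIGH, r_clean), (third, LOW, r_clean)],
        # Cleaning on low: runs out with probability 1/2, ending at high
        # with the rescue penalty (assumed to stay low otherwise).
        (LOW, "clean"): [(half, LOW, r_clean), (half, HIGH, r_rescue)],
    }


def expected_reward(model, state, action):
    """One-step expected reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in model[(state, action)])
```

With placeholder values such as `build_model(r_wait=1, r_clean=3)`, calling `expected_reward(model, LOW, "clean")` averages the cleaning reward and the rescue penalty with equal weight. Those numbers are illustrative only and do not come from the exercise.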