CS 6375 Machine Learning
Reinforcement Learning Exercise
[from Dan Klein, adapted from Sutton & Barto]

A cleaning robot must vacuum a house on battery power. At any time it can clean the house, wait and do nothing, or recharge its battery. Unfortunately, it can only perceive its battery level (its state) as either high or low.

If the robot recharges, the battery returns to high, and the robot receives a reward of 0. If the robot waits, the battery level is unchanged, and the robot receives a reward of Rwait.

If the robot cleans, the outcome depends on the battery level. If the battery level is high, the battery drops to low with fixed probability 1/3. If the battery level is low, it runs out with probability 1/2. If the battery does not run out, the robot receives a reward of Rclean. If the battery does run out, a human must collect the robot and recharge it. In this case, the robot ends up with a high battery, but receives a reward of -10.
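
The dynamics described above can be written down as a small transition table, which may help when setting up the exercise. The following is only a sketch under stated assumptions: the names build_robot_mdp and expected_reward are hypothetical, Rwait and Rclean are left as parameters because the exercise does not fix their values, and it assumes that a battery which neither drops nor runs out stays at its current level.

```python
from fractions import Fraction


def build_robot_mdp(r_wait, r_clean, r_rescue=-10):
    """Return {(state, action): [(probability, next_state, reward), ...]}.

    r_wait and r_clean stand for Rwait and Rclean; the exercise leaves
    their values open, so they are parameters here.
    """
    one, third, half = Fraction(1), Fraction(1, 3), Fraction(1, 2)
    mdp = {}
    for state in ("high", "low"):
        # Recharging always returns the battery to high, with reward 0.
        mdp[(state, "recharge")] = [(one, "high", 0)]
        # Waiting leaves the battery level unchanged.
        mdp[(state, "wait")] = [(one, state, r_wait)]
    # Cleaning on a high battery: drops to low with probability 1/3.
    # Assumption: otherwise the battery stays high.
    mdp[("high", "clean")] = [
        (one - third, "high", r_clean),
        (third, "low", r_clean),
    ]
    # Cleaning on a low battery: runs out with probability 1/2, after which
    # a human recharges the robot (state high, reward r_rescue).
    # Assumption: otherwise the battery stays low.
    mdp[("low", "clean")] = [
        (half, "low", r_clean),
        (half, "high", r_rescue),
    ]
    return mdp


def expected_reward(mdp, state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in mdp[(state, action)])


if __name__ == "__main__":
    # Placeholder reward values chosen only for illustration.
    mdp = build_robot_mdp(r_wait=1, r_clean=3)
    for (state, action), outcomes in sorted(mdp.items()):
        print(state, action, outcomes, expected_reward(mdp, state, action))
```

Exact fractions are used so that the probabilities 1/3 and 1/2 stay exact and each row can be checked to sum to 1.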