CS221 Lecture notes

Reinforcement learning II

In the last lecture, we talked about reinforcement learning using the framework of discrete MDPs. In particular, we assumed a finite state space. Today we discuss ways to handle problems where the state space is continuous. Specifically, we discuss two general methods for solving continuous MDPs. First, we show how to approximate the continuous value function in terms of a value function defined on a discrete set of points drawn from the continuous space. Then, we introduce fitted value iteration, where we compute an approximation to the value function. Fitted value iteration allows us to scale MDP algorithms to much larger numbers of dimensions than would be possible using discretization.

1 Discretization

Suppose you want to apply reinforcement learning to driving a car. Let's say we model the state of the car as the vector $s_t = (x_t, y_t, \theta_t)^\top \in \mathbb{R}^3$, where $x_t$ and $y_t$ give the location of the car at time $t$, and $\theta_t$ gives its orientation. Assume for now that we have a finite set of actions $A$ (e.g. turn left, step on the brakes, etc.). [1] We suppose we have a simulator which takes a state-action pair $(s_t, a_t)$ for time $t$ and returns a state $s_{t+1}$ for time $t+1$. In other words, it will sample $s_{t+1}$ from the distribution $P_{s_t a_t}$. We treat the simulator as a black box which takes a state-action pair and returns a successor state.

[1] In some problems, we can have continuous actions also. For instance, when driving, we can control how much to turn left, how strongly to step on the brakes, and so on. But for most problems, $A$ has only a small number of degrees of freedom, and so it's not hard to discretize. For instance, we might realistically model the state of a car as a vector ...

In some cases, we have a good model of the physics, and therefore we can determine the transition probabilities $P_{s_t a_t}$ through physical simulation.
Often, however, it's hard to construct a good physical model a priori. In these cases, the simulator itself has to be learned. Sometimes, the simulator is deterministic, in that it will always return the same state $s_{t+1}$ for a particular state-action pair $(s_t, a_t)$. Sometimes, the simulator is stochastic, where $s_{t+1}$ is a random function of $s_t$ and $a_t$.

Say we have a continuous state space $S$. One way to apply the techniques from the previous lecture is to discretize $S$ to obtain a discrete set of states $\bar{S} = \{\bar{s}^{(1)}, \bar{s}^{(2)}, \ldots, \bar{s}^{(N)}\}$. For instance, we might choose to break up the continuous state space $S$ into boxes and let $\bar{S}$ be the points which lie at the center of the boxes. We will refer to this method as simple grid discretization. This is not the best approach, but often it is good enough....
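
To make simple grid discretization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the notes' own implementation: the function names (make_grid, simulate), the state bounds, the bin counts, the three actions, and the toy deterministic car dynamics are all hypothetical. It shows how a continuous state $s$ can be mapped to the center $\bar{s}$ of the box containing it, and how a black-box simulator can be wrapped so that it maps box centers to box centers.

```python
import math
from itertools import product


def make_grid(lows, highs, bins):
    """Build a simple grid discretization of a box-shaped state space.

    lows, highs, and bins give, per dimension, the lower bound, the upper
    bound, and the number of boxes (all values are illustrative assumptions).
    """
    widths = [(hi - lo) / n for lo, hi, n in zip(lows, highs, bins)]

    def cell_index(s):
        # Integer index of the box containing s; states outside the
        # bounds are clamped to the nearest boundary box.
        idx = []
        for x, lo, w, n in zip(s, lows, widths, bins):
            i = int((x - lo) // w)
            idx.append(min(max(i, 0), n - 1))
        return tuple(idx)

    def cell_center(idx):
        # Center point of the box with the given integer index.
        return tuple(lo + (i + 0.5) * w for i, lo, w in zip(idx, lows, widths))

    # The discrete state set: one center point per box.
    centers = [cell_center(idx) for idx in product(*(range(n) for n in bins))]
    return cell_index, cell_center, centers


def simulate(s, a, speed=1.0, turn=math.pi / 8):
    """Toy deterministic simulator (hypothetical dynamics): (s_t, a_t) -> s_{t+1}."""
    x, y, theta = s
    theta = (theta + {"left": turn, "straight": 0.0, "right": -turn}[a]) % (2 * math.pi)
    return (x + speed * math.cos(theta), y + speed * math.sin(theta), theta)


# State is (x, y, theta); bounds and resolution are arbitrary choices.
cell_index, cell_center, centers = make_grid(
    lows=(0.0, 0.0, 0.0), highs=(10.0, 10.0, 2 * math.pi), bins=(5, 5, 8)
)


def discrete_step(s_bar, a):
    # Simulate from a box center, then snap the successor to its box center.
    return cell_center(cell_index(simulate(s_bar, a)))


s = (3.2, 7.9, 1.0)
s_bar = cell_center(cell_index(s))
print(len(centers), s_bar, discrete_step(s_bar, "straight"))
```

With a wrapper like discrete_step, the finite-state methods from the previous lecture can be applied to the set of box centers. Note that the number of centers is the product of the per-dimension bin counts, so it grows quickly as dimensions are added.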
This note was uploaded on 11/30/2009 for the course CS 221 taught by Professors Koller and Ng during the Winter '09 term at Stanford.
