CS221 Lecture notes
Reinforcement learning II

In the last lecture, we talked about reinforcement learning using the framework of discrete MDPs. In particular, we assumed a finite state space. Today we discuss ways to handle problems where the state space is continuous. Specifically, we discuss two general methods for solving continuous MDPs. First, we show how to approximate the continuous value function in terms of a value function defined on a discrete set of points drawn from the continuous space. Then, we introduce fitted value iteration, where we compute an approximation to the value function. Fitted value iteration allows us to scale MDP algorithms to much larger numbers of dimensions than would be possible using discretization.

1 Discretization

Suppose you want to apply reinforcement learning to driving a car. Let's say we model the state of the car as the vector

    s_t = (x_t, y_t, θ_t)^T ∈ R^3,

where x_t and y_t give the location of the car at time t, and θ_t gives its orientation. Assume for now that we have a finite set of actions A (e.g. turn left, step on the brakes, etc.).[1] We suppose we have a simulator which takes a state-action pair (s_t, a_t) for time t and returns a state s_{t+1} for time t + 1. In other words, it will sample s_{t+1} from the distribution P_{s_t a_t}. We treat the simulator as a black box which takes a state-action pair and returns a successor state.

[1] In some problems, we can have continuous actions also. For instance, when driving, we can control how much to turn left, how strongly to step on the brakes, and so on. But for most problems, A has only a small number of degrees of freedom, and so it's not hard to discretize. For instance, we might realistically model the state of a car as a vector ...
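To make the black-box picture concrete, here is a minimal sketch of what such a simulator might look like in Python. The notes do not specify the car's dynamics, so the kinematic update, the noise model, and the particular action names below are illustrative assumptions, not part of the course material; the only requirement is that simulate(s, a) return a successor state sampled from P_{s_t a_t}.

    import numpy as np

    # Sketch of a black-box simulator for the car example. Dynamics, noise model,
    # and action set are illustrative assumptions, not from the notes.
    ACTIONS = ["left", "right", "brake", "coast"]  # a small finite action set A

    def simulate(s, a, v=1.0, dt=0.1, rng=None):
        """Given s = (x, y, theta) and an action a, sample s_{t+1} ~ P_{s_t a_t}."""
        rng = np.random.default_rng() if rng is None else rng
        x, y, theta = s
        if a == "left":
            theta += 0.1       # steer left a little
        elif a == "right":
            theta -= 0.1       # steer right a little
        elif a == "brake":
            v *= 0.5           # step on the brakes: cover less ground this step
        # simple kinematic update plus Gaussian noise makes the simulator stochastic
        x += v * np.cos(theta) * dt + rng.normal(scale=0.01)
        y += v * np.sin(theta) * dt + rng.normal(scale=0.01)
        return np.array([x, y, theta])

Dropping the noise terms would give a deterministic simulator that always returns the same s_{t+1} for a given (s_t, a_t).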
In some cases, we have a good model of the physics, and therefore we can determine the transition probabilities P_{s_t a_t} through physical simulation. Often, however, it's hard to construct a good physical model a priori. In these cases, the simulator itself has to be learned. Sometimes the simulator is deterministic, in that it will always return the same state s_{t+1} for a particular state-action pair (s_t, a_t). Sometimes the simulator is stochastic, in which case s_{t+1} is a random function of s_t and a_t.

Say we have a continuous state space S. One way to apply the techniques from the previous lecture is to discretize S to obtain a discrete set of states S̄ = {s̄^(1), s̄^(2), ..., s̄^(N)}. For instance, we might choose to break up the continuous state space S into boxes and let S̄ be the points which lie at the centers of the boxes. We will refer to this method as simple grid discretization. This is not the best approach, but often it is good enough. In this lecture, s will always denote a state in the original continuous state space, and s̄ will denote one of the discrete states.
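As a concrete illustration, here is a minimal sketch of simple grid discretization in Python. The state bounds and grid resolution are made-up values for the car's (x, y, θ) state; the point is only the mapping between a continuous state s and the discrete state s̄ at the center of its box.

    import numpy as np

    # Simple grid discretization sketch: break the state box into a grid and
    # represent each box by its center. Bounds and resolution are illustrative.
    lo = np.array([0.0, 0.0, -np.pi])    # lower corner of the state box (x, y, theta)
    hi = np.array([10.0, 10.0, np.pi])   # upper corner of the state box
    n_bins = np.array([20, 20, 16])      # number of boxes along each dimension
    width = (hi - lo) / n_bins           # side length of a box in each dimension

    def discretize(s):
        """Map a continuous state s to the index of the box containing it."""
        idx = np.floor((s - lo) / width).astype(int)
        return tuple(np.clip(idx, 0, n_bins - 1))  # clamp states outside the box

    def center(idx):
        """Return the discrete state s-bar: the continuous point at the box center."""
        return lo + (np.array(idx) + 0.5) * width

    # Example: a continuous state and the discrete state that stands in for it.
    s = np.array([3.7, 8.2, 0.4])
    print(discretize(s), center(discretize(s)))

With these (made-up) settings, S̄ has 20 · 20 · 16 = 6400 states; the count multiplies with every added state dimension, which is why grid discretization is practical only in low dimensions.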