# CS221 Lecture 10: Reinforcement Learning II


In the last lecture, we talked about reinforcement learning using the framework of discrete MDPs. In particular, we assumed a finite state space. Today we discuss ways to handle problems where the state space is continuous. Specifically, we discuss two general methods for solving continuous MDPs. First, we show how to approximate the continuous value function in terms of a value function defined on a discrete set of points drawn from the continuous space. Then, we introduce fitted value iteration, where we compute an approximation to the value function. Fitted value iteration allows us to scale MDP algorithms to much larger numbers of dimensions than would be possible using discretization.

## 1 Discretization

Suppose you want to apply reinforcement learning to driving a car. Let's say we model the state of the car as the vector

$$s_t = \begin{bmatrix} x_t \\ y_t \\ \theta_t \end{bmatrix} \in \mathbb{R}^3,$$

where $x_t$ and $y_t$ give the location of the car at time $t$, and $\theta_t$ gives its orientation. Assume for now that we have a finite set of actions $A$ (e.g. turn left, step on the brakes, etc.).¹ We suppose we have a simulator which takes a state-action pair $(s_t, a_t)$ for time $t$ and returns a state $s_{t+1}$ for time $t + 1$.

¹ In some problems, we can have continuous actions also. For instance, when driving, we can control how much to turn left, how strongly to step on the brakes, and so on. But for most problems, $A$ has only a small number of degrees of freedom, and so it's not hard to discretize. For instance, we might realistically model the state of a car as a vector…
In other words, it will sample $s_{t+1}$ from the distribution $P_{s_t a_t}$. We treat the simulator as a black box which takes a state-action pair and returns a successor state.

In some cases, we have a good model of the physics, and therefore we can determine the transition probabilities $P_{s_t a_t}$ through physical simulation. Often, however, it's hard to construct a good physical model a priori. In these cases, the simulator itself has to be learned. Sometimes, the simulator is *deterministic*, in that it will always return the same state $s_{t+1}$ for a particular state-action pair $(s_t, a_t)$. Sometimes, the simulator is *stochastic*, in that $s_{t+1}$ is a random function of $s_t$ and $a_t$.

Say we have a continuous state space $S$. One way to apply the techniques from the previous lecture is to discretize $S$ to obtain a discrete set of states $\bar{S} = \{\bar{s}^{(1)}, \bar{s}^{(2)}, \ldots, \bar{s}^{(N)}\}$. For instance, we might choose to break up the continuous state space $S$ into boxes and let $\bar{S}$ be the points which lie at the centers of the boxes. We will refer to this method as *simple grid discretization*. This is not the best approach, but often it is good enough. In this lecture, $s$ will always denote a state in the original continuous state space, and $\bar{s}$ will denote one of the discrete states.
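To make the black-box simulator concrete, here is a minimal sketch for the car example. The action set, the kinematic update rule, and the Gaussian noise model are all illustrative assumptions, not something specified in the notes; the only contract that matters is that `simulate` maps a state-action pair $(s_t, a_t)$ to a sampled successor state $s_{t+1}$.

```python
import math
import random

# Hypothetical black-box simulator for the car example. The state is
# (x, y, theta) and the dynamics below are an illustrative assumption.
ACTIONS = ["left", "right", "brake", "coast"]

def simulate(state, action, dt=0.1, speed=1.0, noise=0.01):
    """Sample s_{t+1} from P_{s_t a_t} for a simple kinematic car."""
    x, y, theta = state
    if action == "left":
        theta += 0.1
    elif action == "right":
        theta -= 0.1
    elif action == "brake":
        speed *= 0.5
    # Move forward along the current heading; the Gaussian noise makes
    # this a stochastic simulator (set noise=0.0 for a deterministic one).
    x += speed * math.cos(theta) * dt + random.gauss(0.0, noise)
    y += speed * math.sin(theta) * dt + random.gauss(0.0, noise)
    return (x, y, theta)
```

Setting `noise=0.0` gives the deterministic case described above, where the same $(s_t, a_t)$ always produces the same $s_{t+1}$.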
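Simple grid discretization can likewise be sketched in a few lines: each coordinate of a continuous state $s$ is mapped to the center of the grid cell that contains it, giving the discrete state $\bar{s}$. The box bounds and resolution below are illustrative assumptions.

```python
# A minimal sketch of simple grid discretization, assuming the state
# space is a box with per-dimension bounds lo/hi split into `bins` cells.
def discretize(s, lo, hi, bins):
    """Map continuous state s to the center of its grid cell (s-bar)."""
    centers = []
    for si, l, h, n in zip(s, lo, hi, bins):
        width = (h - l) / n
        # Index of the cell containing si, clipped so out-of-range
        # states snap to the nearest boundary cell.
        i = min(max(int((si - l) / width), 0), n - 1)
        centers.append(l + (i + 0.5) * width)
    return tuple(centers)
```

The set $\bar{S}$ is then just the (finite) set of all cell centers, so the discrete MDP techniques from the previous lecture apply directly to the states returned by `discretize`.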

