CS 221, Autumn 2007
Problem Set #3 Programming Part — MDPs
Due: 11:59pm, Thursday November 15.
Overview
In this programming assignment, you’ll use the value iteration algorithm to find a policy for
driving a car on a loose-surface road. The car will begin facing down the road in one direction,
travelling at a fixed speed. Your policy will need to learn to spin the car around and then drive
off in the opposite direction as quickly as possible.
You are provided with a simple simulator of the car that, given a (real-valued) state vector, will
simulate forward in time using actions chosen by your policy. Since the simulator operates on
continuous-valued states and actions, we’ve discretized the state space for you. In this discretized
space, you’ll use the simulator to collect data, building up a probabilistic model of the car’s
dynamics. Once you have such a model, you’ll compute the value function (using value iteration)
for the discrete MDP, and finally compute the optimal policy from the value function. You will
then save the policy to a file. These files can be loaded and viewed in the provided graphical
simulator so that you can see how your car performs.
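As a rough illustration of the pipeline described above (estimate a transition model from
simulated data, run value iteration, then read off a greedy policy), here is a minimal C++ sketch.
It is not the provided pa3 code: DiscreteMDP, observe, valueIteration, and greedyPolicy are
hypothetical names, and the state-only reward R(s), the discount factor, and the uniform fallback
for untried state-action pairs are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for a small discrete MDP (not part of the pa3 code).
struct DiscreteMDP {
    int numStates = 0;
    int numActions = 0;
    double gamma = 0.95;         // discount factor (assumed value)
    std::vector<double> reward;  // reward[s]; a state-only reward is an assumption
    std::vector<double> counts;  // transition counts, flattened over (s, a, s2)

    DiscreteMDP(int nS, int nA)
        : numStates(nS), numActions(nA), reward(nS, 0.0),
          counts(static_cast<std::size_t>(nS) * nA * nS, 0.0) {
        assert(nS > 0 && nA > 0);
    }

    std::size_t index(int s, int a, int s2) const {
        return (static_cast<std::size_t>(s) * numActions + a) * numStates + s2;
    }

    // Record one simulated transition (s, a) -> s2.
    void observe(int s, int a, int s2) { counts[index(s, a, s2)] += 1.0; }

    // Empirical P(s2 | s, a); uniform if (s, a) was never tried (an assumption).
    // Recomputes the row total on every call: simple, but not efficient.
    double prob(int s, int a, int s2) const {
        double total = 0.0;
        for (int t = 0; t < numStates; ++t) total += counts[index(s, a, t)];
        if (total == 0.0) return 1.0 / numStates;
        return counts[index(s, a, s2)] / total;
    }
};

// Expected value of the successor state after taking action a in state s.
double expectedNextValue(const DiscreteMDP& m, const std::vector<double>& V,
                         int s, int a) {
    double q = 0.0;
    for (int s2 = 0; s2 < m.numStates; ++s2) q += m.prob(s, a, s2) * V[s2];
    return q;
}

// Value iteration: V(s) <- R(s) + gamma * max_a sum_s2 P(s2 | s, a) V(s2).
std::vector<double> valueIteration(const DiscreteMDP& m, double tol = 1e-9,
                                   int maxIters = 10000) {
    std::vector<double> V(m.numStates, 0.0);
    for (int it = 0; it < maxIters; ++it) {
        std::vector<double> Vnew(m.numStates, 0.0);
        double delta = 0.0;
        for (int s = 0; s < m.numStates; ++s) {
            double best = expectedNextValue(m, V, s, 0);
            for (int a = 1; a < m.numActions; ++a)
                best = std::max(best, expectedNextValue(m, V, s, a));
            Vnew[s] = m.reward[s] + m.gamma * best;
            delta = std::max(delta, std::fabs(Vnew[s] - V[s]));
        }
        V = Vnew;
        if (delta < tol) break;  // stop once the largest update is negligible
    }
    return V;
}

// Greedy policy: in each state, pick the action with the best expected next value.
std::vector<int> greedyPolicy(const DiscreteMDP& m, const std::vector<double>& V) {
    std::vector<int> pi(m.numStates, 0);
    for (int s = 0; s < m.numStates; ++s)
        for (int a = 1; a < m.numActions; ++a)
            if (expectedNextValue(m, V, s, a) > expectedNextValue(m, V, s, pi[s]))
                pi[s] = a;
    return pi;
}
```

To try it, one could build a two-state, two-action model in which action 0 stays put and action 1
switches states, give state 1 a reward of 1, call observe once per transition, and then run
valueIteration followed by greedyPolicy. In the real assignment the states would be the discretized
car states and the transitions would come from the provided simulator.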
You can copy the code from the cs221 directory on AFS:
cp -r /afs/ir/class/cs221/code/pa3 .
Or download the tar-zipped file from the course website. The graphical simulator requires GLUT.
We recommend that you work on the myth workstations, since these are known to have GLUT
correctly installed.
Please keep all of your code confined to main.cpp. The code necessary to complete
this assignment is not very long – you shouldn’t need any other files.
The Continuous State Model
The road on which the car is driving defines the X axis of the coordinate system (which you can
think of as East). The Y axis points to the left side of the road (North) if the car is facing in
the positive X direction. Angles are measured counterclockwise from the X axis, so if the car’s
heading is 0, then it is facing directly down the road, and if its heading is π, then it is facing
down the road in the opposite direction.
The state of the car, s, can be described by an array of 4 numbers: s = [y, v_x, v_y, θ]. y is the
Y position of the car relative to the road. v_x and v_y are the car’s velocity along the X and Y
axes, and θ is the car’s heading.
In each state, one may choose an action for the car to execute. The action is an array of 2
numbers: a = [α, w]. α is the steering angle (i.e., the angle of the tires relative to the car’s
centerline – positive angles cause a left turn, negative angles cause a right turn), and w is the
“velocity” of the car’s wheels. As an example, if a = [0, 10], then the car will drive straight, with
the wheels spinning at a rate such that they propel the car at 10 m/s (i.e., they spin at a rate
of ω = 10/r radians/sec, where r is the radius of the wheel). The car has an infinite amount
of torque, and will immediately drive the wheels at whatever velocity is commanded. Choosing
w ≤ 0 will result in applying the car’s brakes – you cannot drive backward.
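To make the state and action conventions above concrete, here is a small C++ sketch. The type and
function names (CarState, CarAction, Direction, isBraking, wheelAngularRate, headingDirection) are
hypothetical and do not come from the provided simulator, which may represent these quantities
differently (for example, as plain arrays). The units in the comments are inferred from the
10 m/s example and are also an assumption.

```cpp
#include <cassert>
#include <cmath>

// Continuous state s = [y, v_x, v_y, theta].
struct CarState {
    double y;      // Y position relative to the road (assumed meters)
    double vx;     // velocity along the X axis (assumed m/s)
    double vy;     // velocity along the Y axis (assumed m/s)
    double theta;  // heading in radians, counterclockwise from the X axis
};

// Action a = [alpha, w].
struct CarAction {
    double alpha;  // steering angle: positive turns left, negative turns right
    double w;      // commanded wheel "velocity" (m/s); w <= 0 applies the brakes
};

// Unit vector in the (X, Y) plane.
struct Direction {
    double x;
    double y;
};

// Per the text, any w <= 0 means braking; the car cannot drive backward.
bool isBraking(const CarAction& a) { return a.w <= 0.0; }

// Wheel spin rate omega = w / r in radians/sec, meaningful when the car is
// not braking. Callers should check isBraking first.
double wheelAngularRate(const CarAction& a, double wheelRadius) {
    assert(wheelRadius > 0.0);
    return a.w / wheelRadius;
}

// Direction the car faces: heading 0 points along +X, heading pi along -X.
Direction headingDirection(double theta) {
    Direction d;
    d.x = std::cos(theta);
    d.y = std::sin(theta);
    return d;
}
```

For the example action a = [0, 10] and an illustrative wheel radius of 0.5 m, wheelAngularRate
gives 10 / 0.5 = 20 radians/sec, and headingDirection(0) points along the positive X axis.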
