Introduction to Engineering Concepts and Mathematics
ENGR 109

Spring 2006
Lecture 1 Notes: Optimality
1 Markov decision processes
In this class we will study discrete-time stochastic systems. We can describe the evolution
(dynamics) of these systems by the following equation, which we call the system equation:
x_{t+1} = f(x_t, a_t,
Lecture 2 Notes: Decision Processes
1 Summary: Markov Decision Processes
Markov decision processes can be characterized by $(S, A_\cdot, g_\cdot(\cdot), P_\cdot(\cdot, \cdot))$, where
$S$ denotes a finite set of states
$A_x$ denotes a finite set of actions for state $x \in S$
$g_a(x)$ denotes the finite
Lecture 5 Notes: Discounted Functions
In this lecture, we will show that optimal policies for discounted-cost problems with a large enough discount factor
are also optimal for average-cost problems. The analysis will also show that, if the optimal average
Lecture 4 Notes: Average Cost
In average-cost problems, we aim at finding a policy $u$ which minimizes
$$J_u(x) = \limsup_{T \to \infty} \frac{1}{T} E\left[ \sum_{t=0}^{T-1} g_u(x_t) \,\middle|\, x_0 = x \right]. \qquad (1)$$
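As a rough illustration of the average cost in (1), the sketch below evaluates the finite-horizon average $\frac{1}{T} \sum_{t=0}^{T-1} E[g_u(x_t) \mid x_0 = x]$ exactly for a fixed policy by propagating the state distribution. The two-state transition matrix `P`, the cost vector `g`, and the function name `average_cost` are hypothetical and chosen only for this example; they are not taken from the notes.

```python
def average_cost(P, g, x0, T):
    """Return (1/T) * sum_{t=0}^{T-1} E[g(x_t) | x_0 = x0] for a finite Markov chain.

    P[x][y] is the transition probability from x to y under a fixed policy,
    g[x] is the stage cost in state x under that policy.
    """
    n = len(P)
    dist = [0.0] * n          # distribution of x_t, starting at t = 0
    dist[x0] = 1.0
    total = 0.0
    for _ in range(T):
        # expected stage cost at the current time step
        total += sum(dist[x] * g[x] for x in range(n))
        # one-step propagation: dist_{t+1}(y) = sum_x dist_t(x) * P[x][y]
        dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
    return total / T


# Hypothetical two-state chain induced by some fixed policy (illustration only).
P = [[0.9, 0.1],
     [0.5, 0.5]]
g = [1.0, 3.0]

for T in (10, 100, 10000):
    print(T, average_cost(P, g, 0, T), average_cost(P, g, 1, T))
```

For this particular chain the printed values for the two initial states move closer together as `T` grows, which is the behavior one expects when the limit does not depend on the initial state.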
Since the state space is finite, it can be shown that the lim sup can actually be replaced with lim
Lecture 9 Notes: Joint Distribution
In this lecture, we will consider the problem of supervised learning. The setup is as follows. We
have pairs (x, y), distributed according to a joint distribution P (x, y). We would like to describe
the relationship bet
Lecture 6 Notes: Multiclass Networks
In the first part of this lecture, we will discuss the application of dynamic programming to the
queueing network introduced in [1], which illustrates several issues encountered in the
application of dynamic programm
Lecture 10 Notes: Bellman Error
We now consider the problem of computing an appropriate parameter $r$, so that, given an
approximation architecture $\tilde{J}(x, r)$, $\tilde{J}(\cdot, r) \approx J^*(\cdot)$.
A class of iterative methods are the so-called temporal-difference learning algorithms,
Lecture 8 Notes: Contingencies
In this lecture, we want to study the convergence of
$$r_{t+1} = r_t + \gamma_t S(r_t, w_t)$$
to some $r^*$ with $E[S(r^*, w_t)] = 0$. Recall the Lyapunov function analysis in the deterministic case, in which we pick
a function $V(r)$ such that
$V(r) \ge 0, \ \forall r,$
T
V (
Lecture 3 Notes: Stationary Policy
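Using value iteration, starting at an arbitrary $J_0$, we generate a sequence $\{J_k\}$ by
$$J_{k+1} = T J_k, \quad \text{integer } k \ge 0.$$
We have shown that the sequence $J_k \to J^*$ as $k \to \infty$, and derived the error bound
$$\|J_k - J^*\|_\infty \le \alpha^k \|J_0 - J^*\|_\infty.$$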
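The value iteration recursion and its geometric error bound can be checked numerically on a small example. The sketch below assumes a discounted-cost problem with discount factor `alpha`, the same action set in every state, and a max-norm contraction bound of the form (error after k steps) <= alpha^k * (initial error). The data `g`, `P`, and the helper names `bellman` and `max_norm` are hypothetical and only serve the illustration.

```python
def bellman(J, g, P, alpha):
    """Apply the operator (T J)(x) = min_a [ g[a][x] + alpha * sum_y P[a][x][y] * J[y] ]."""
    n = len(J)
    return [
        min(g[a][x] + alpha * sum(P[a][x][y] * J[y] for y in range(n))
            for a in range(len(g)))
        for x in range(n)
    ]


def max_norm(u, v):
    """Maximum absolute componentwise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical two-state, two-action problem (illustration only).
alpha = 0.9
g = [[1.0, 2.0],            # g[a][x]: cost of action a in state x
     [1.5, 0.5]]
P = [[[0.8, 0.2], [0.3, 0.7]],   # P[a][x][y]: transition probabilities
     [[0.1, 0.9], [0.6, 0.4]]]

# Approximate the fixed point of T by iterating many times.
J_star = [0.0, 0.0]
for _ in range(2000):
    J_star = bellman(J_star, g, P, alpha)

# Run value iteration from an arbitrary J_0 and check the bound at every step.
J0 = [10.0, -5.0]
e0 = max_norm(J0, J_star)
J = J0
bound_holds = True
for k in range(1, 51):
    J = bellman(J, g, P, alpha)
    bound_holds = bound_holds and max_norm(J, J_star) <= alpha ** k * e0 + 1e-9

print("bound held for 50 iterations:", bound_holds)
```

The small tolerance `1e-9` only absorbs floating-point error, since `J_star` is itself computed numerically.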
Recall that the gre
Lecture 7 Notes: RTVI Algorithms
Recall the real-time value iteration (RTVI) algorithm
choose $x_{k+1} = f(x_k, u_k, w_k)$
choose $u_k$ in some fashion
$J_{k+1}(x) = (T J_k)(x), \ \forall x$
update $J_{k+1}(x_k) = (T J_k)(x_k),$
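One possible reading of the real-time update can be sketched as follows: only the currently visited state is updated with the Bellman operator, and the next state is sampled along the trajectory. This sketch makes several assumptions that the notes do not fix: a discounted cost with factor `alpha`, a greedy choice of the action (the notes only say "in some fashion"), transitions sampled from a matrix `P` in place of the system function f, and values left unchanged at unvisited states. All names and data are hypothetical.

```python
import random


def q_value(J, g, P, alpha, x, a):
    """Cost of action a in state x followed by cost-to-go J (assumed discounted form)."""
    return g[a][x] + alpha * sum(P[a][x][y] * J[y] for y in range(len(J)))


def rtvi(g, P, alpha, x0, steps, seed=0):
    """Update J only at the visited state x_k, then sample the next state."""
    rng = random.Random(seed)
    n = len(P[0])
    actions = range(len(g))
    J = [0.0] * n
    x = x0
    for _ in range(steps):
        q = [q_value(J, g, P, alpha, x, a) for a in actions]
        u = min(actions, key=lambda a: q[a])   # assumed: greedy action choice
        J[x] = q[u]                            # J_{k+1}(x_k) = (T J_k)(x_k)
        # sample the next state from the transition row of (u, x)
        x = rng.choices(range(n), weights=P[u][x])[0]
    return J


# Hypothetical two-state, two-action problem (illustration only).
g = [[1.0, 2.0],
     [1.5, 0.5]]
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.1, 0.9], [0.6, 0.4]]]

print(rtvi(g, P, 0.9, x0=0, steps=5000))
```

In this example every transition probability is positive, so both states keep being visited; with states that are never visited, the corresponding entries of `J` would simply stay at their initial values.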
We thus have
$(T J_k)(x_k) = \min_a \big[\, g_a(x_k) + \cdots$