Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366 Written 3: Solving MDPs
Due: Tuesday Oct 6 in 366 dropbox, first floor CSC by 2pm (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually
Ther

Chapter 4: Dynamic Programming
Objectives of this chapter:
Overview of a collection of classical solution methods
for MDPs known as dynamic programming (DP)
Show how DP can be used to compute value functions,
and hence, optimal policies
Discuss efficiency and utility of DP
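As a concrete illustration (not from the slides), the DP idea of computing value functions can be sketched as iterative policy evaluation on a hypothetical 4-state chain where the policy always steps right and each step costs 1:

```python
# A minimal sketch of iterative policy evaluation (repeated Bellman expected
# updates) on a hypothetical 4-state chain. The chain, rewards, and names
# are illustrative, not taken from the course materials.
N_STATES = 4      # states 0..3; state 3 is terminal with value 0
GAMMA = 1.0       # undiscounted episodic task
REWARD = -1.0     # cost of -1 per step

def evaluate_policy(theta=1e-10):
    """Sweep the states until the largest value change is below theta."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):              # terminal state stays 0
            v_new = REWARD + GAMMA * V[s + 1]      # deterministic 'right' move
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

# evaluate_policy() converges to [-3.0, -2.0, -1.0, 0.0]
```

With in-place sweeps the values propagate back from the terminal state in a few iterations; combined with policy improvement, this is the route from value functions to optimal policies.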

Unified View
[Figure: the space of backups, organized by width of backup and height (depth) of backup. Corners: temporal-difference learning (shallow, narrow), dynamic programming (shallow, wide), Monte Carlo (deep, narrow), and exhaustive search (deep, wide).]
R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
Chapter 8: Planning and Learning
Objectives of this

Chapter 5: Monte Carlo Methods
The value of a state is the expected value of the return
Monte Carlo methods are learning methods: they learn value functions and policies from experience
Monte Carlo methods can be used in two ways:
! model-free: No model necessary and still attains optimality
!
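To make "the value of a state is the expected return" concrete, here is a first-visit Monte Carlo prediction sketch; the tiny two-state episodic environment is hypothetical, invented purely for illustration:

```python
import random

# First-visit Monte Carlo prediction: the value of a state is estimated by
# averaging the returns observed after first visits to it. The two-state
# episodic environment below is hypothetical, for illustration only.
def generate_episode():
    # 50/50: either A -> B -> terminal with rewards (0, 2),
    # or A -> terminal with reward 0; each element is (state, reward).
    if random.random() < 0.5:
        return [('A', 0.0), ('B', 2.0)]
    return [('A', 0.0)]

def mc_prediction(num_episodes=10_000, gamma=1.0):
    returns = {}                     # state -> list of first-visit returns
    for _ in range(num_episodes):
        episode = generate_episode()
        G = 0.0
        first_G = {}                 # back-to-front overwrite keeps the FIRST visit
        for state, reward in reversed(episode):
            G = reward + gamma * G   # accumulate the return
            first_G[state] = G
        for state, g in first_G.items():
            returns.setdefault(state, []).append(g)
    return {s: sum(g) / len(g) for s, g in returns.items()}

random.seed(0)
V = mc_prediction()   # V['A'] is near 1.0 (= 0.5 * 2); V['B'] is exactly 2.0
```

No model of the transition probabilities is needed anywhere: the averages converge to the expected returns from experience alone.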

Chapter 6: Temporal Difference Learning
Objectives of this chapter:
Introduce Temporal Difference (TD) learning
Focus first on policy evaluation, or prediction, methods
Compare efficiency of TD learning with MC learning
Then extend to control methods
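A minimal sketch of the tabular TD(0) prediction update, on a hypothetical two-state chain invented for illustration (deterministic right-moving policy, reward 1 per step). Unlike Monte Carlo, the estimate is nudged toward a bootstrapped one-step target after every step:

```python
# Tabular TD(0) policy evaluation on a hypothetical chain 0 -> 1 -> terminal.
# After each step the estimate moves toward the bootstrapped target
# r + gamma * V(s'), rather than waiting for the full return as MC does.
def td0(num_episodes=5000, alpha=0.1, gamma=1.0):
    V = {0: 0.0, 1: 0.0, 'T': 0.0}             # 'T' is terminal, value 0
    for _ in range(num_episodes):
        s = 0
        while s != 'T':
            s_next = 1 if s == 0 else 'T'      # policy always moves right
            reward = 1.0
            V[s] += alpha * (reward + gamma * V[s_next] - V[s])
            s = s_next
    return V

# td0() converges to V(0) = 2, V(1) = 1, the true values of this chain
```

Because updates happen every step, TD can learn before an episode ends, which is one source of its efficiency advantage over MC in long episodes.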

We want to find a policy; we will cover both learning with a model and learning from experience.
Unified View
[Figure: unified view extended with planning. Value functions and policies are improved directly from experience (direct reinforcement learning) and indirectly via model learning, where experience builds a model that is then used for planning. The backup-space axes are again width of backup and height (depth) of backup, with multi-step bootstrapping spanning the space between one-step and full-return methods.]

There are only two policies: the A-then-B path or the A-then-C path. Which
one is better? The second one.
When gamma is 0, we compare only the first reward on each side; in that
case the left path is better.
With gamma = 0.99, the right side is clearly optimal.
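The comparison above can be checked numerically; the reward sequences below are hypothetical stand-ins for the two paths (a big immediate reward on the left, a bigger delayed reward on the right):

```python
# A numerical check of the note above; the reward sequences are illustrative
# stand-ins for the two paths, not the actual numbers from the example.
def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

left = [10, 0, 0]      # immediate payoff
right = [0, 20, 0]     # delayed but larger payoff

# gamma = 0: only the first reward counts, so the left path wins
assert discounted_return(left, 0.0) > discounted_return(right, 0.0)
# gamma = 0.99: the delayed reward dominates, so the right path wins
assert discounted_return(right, 0.99) > discounted_return(left, 0.99)
```

The discount rate thus encodes how much we care about delayed rewards, and it can flip which policy is optimal.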

Chapter 9:
On-policy Prediction
with Approximation
3 waves of neural networks
First explored in the 1950-60s: Perceptron, Adaline
Revived in the 1980-90s as Connectionism, Neural Networks
only one learnable layer
exciting multi-layer learning using backpropagation

Probabilities and
Expectations
A. Rupam Mahmood
September 10, 2015
Probabilities
Probability is a measure of uncertainty
Being uncertain is much more than "I don't know"
We can make informed guesses about uncertain events
Intelligent Systems
An intelligent s

The Problem of Temporal Abstraction
How do we connect the high level to the low level?
the human level to the physical level?
the decide level to the action level?
MDPs are great, search is great,
excellent representations of decision-making, choice, outcome
bu

Question 1, part 5: Empirical search for best step size
by Dylan Ashley
What you have learned from bandits
The need to trade off exploitation and exploration, e.g., by an ε-greedy policy
The difference between a sample, an estimate, and a true expected value
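One hedged way to sketch an empirical search for the best step size (the Gaussian reward target and the candidate α values below are assumptions for illustration, not the assignment's exact setup): run a constant-step-size estimator at several α values and compare the final errors, averaged over runs:

```python
import random

# An empirical step-size search sketch: track a noisy stationary target with
# the constant-step-size update Q <- Q + alpha * (R - Q) and compare final
# absolute errors across candidate alphas. Target and alphas are illustrative.
def run(alpha, steps=2000, true_value=1.0, noise=0.5):
    q = 0.0
    for _ in range(steps):
        reward = true_value + random.gauss(0.0, noise)
        q += alpha * (reward - q)            # constant step-size update
    return abs(q - true_value)               # final absolute error

def avg_error(alpha, runs=50):
    return sum(run(alpha) for _ in range(runs)) / runs

random.seed(1)
errors = {a: avg_error(a) for a in (0.01, 0.1, 0.5)}
# On a stationary target, smaller alphas settle closer to the true value
```

On a nonstationary target the ranking can reverse, which is exactly why the best step size has to be found empirically for the problem at hand.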

What we learned last time
1. Intelligence is the computational part of the ability to achieve goals
looking deeper: 1) it's a continuum, 2) it's an appearance, 3) it varies
with observer and purpose
2. We will (probably) figure out how to make intelligent

The optimal action is the action with the highest expected reward.
The greedy action is not the same as the optimal action: the greedy action maximizes our current estimate of the value, which may differ from the true value of the optimal action.
Exploration means occasionally choosing a random action.
3/4, because half the time you choose the greedy action, and the other half you choose uniformly at random between the two actions: 0.5 + 0.5 × 0.5 = 0.75.
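The 3/4 figure can be verified by simulation; this sketch assumes ε = 0.5 and two actions, with exploratory picks drawn uniformly over all actions, including the greedy one:

```python
import random

# A quick check of the 3/4 figure, assuming epsilon = 0.5 and two actions,
# with exploration uniform over ALL actions (including the greedy one):
# P(greedy chosen) = (1 - eps) + eps * (1 / n_actions) = 0.5 + 0.5*0.5 = 0.75
def choose(epsilon=0.5, n_actions=2, greedy=0):
    if random.random() < epsilon:
        return random.randrange(n_actions)   # explore: uniform over actions
    return greedy                            # exploit: take the greedy action

random.seed(0)
trials = 100_000
hits = sum(choose() == 0 for _ in range(trials))
# hits / trials comes out close to 0.75
```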

Chapter 3: The Reinforcement Learning Problem
(Markov Decision Processes, or MDPs)
Objectives of this chapter:
present Markov decision processes, an idealized form of
the AI problem for which we have precise theoretical
results
introduce key components of

CMPUT 366:
Intelligent Systems
Instructor: Rich Sutton
Dept of Computing Science
University of Alberta
richsutton.com
Intelligent Systems
Introduction to Artificial Intelligence (AI)
Introduction to the Science and Technology of
Mind
touches on control

Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366 Written 2: Markov Decision Processes 1
Due: Thursday Sept 24 in 366 dropbox, first floor CSC, by 2pm (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually

Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366 Written 5: Planning, Learning, and Exam Practice
Due: Tuesday Oct 20 in 366 dropbox, first floor CSC, by 2pm (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually

Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366 Written 4: Temporal-Difference Learning
Due: Tuesday Oct 13 in 366 dropbox, first floor CSC, by 2pm, or in class (no slip days)
Policy: Can be solved in groups (acknowledge collaborators) but must be written up individually

Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366 Written 1: Step sizes & Bandits
Due: Tuesday Sept 15 in the 366 dropbox in CSC by 2pm (no slip days)
Policy: Can be discussed in groups (acknowledge collaborators) but must be written up individually

Multi-arm Bandits
Sutton and Barto, Chapter 2
You are the algorithm! (bandit1)
Action 1 Reward is always 8
value of action 1 is
q*(1) = 8
Action 2 88% chance of 0, 12% chance of 100!
value of action 2 is
q*(2) = 0.88 × 0 + 0.12 × 100 = 12
Action 3 Randomly
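The action values above follow directly from the definition of expected value; a quick check using the probabilities and rewards given for the bandit:

```python
# Checking the bandit action values as expected rewards. The probabilities
# and rewards are the ones stated in the exercise.
def expected_value(outcomes):
    """outcomes: list of (probability, reward) pairs, probabilities sum to 1."""
    return sum(p * r for p, r in outcomes)

q1 = expected_value([(1.0, 8)])                  # Action 1: always 8
q2 = expected_value([(0.88, 0), (0.12, 100)])    # Action 2: 88% 0, 12% 100

assert q1 == 8.0
assert abs(q2 - 12.0) < 1e-9
```

So despite its 88% chance of paying nothing, action 2 has the higher value: q*(2) = 12 > q*(1) = 8.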

Deterministic Tree Search
aka Deterministic Tree-based Planning
aka Search
finding the shortest path from start state to goal state
Goals for today
! Learn (or remember) basic ideas of breadth-first and depth-first search
! their strengths and weaknesses
!
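A minimal breadth-first search sketch on a hypothetical unweighted graph. BFS expands the frontier level by level, so the first path to reach the goal has the fewest edges; that is its strength, and the memory needed to hold the whole frontier is its weakness:

```python
from collections import deque

# Breadth-first search for a shortest path in an unweighted graph; the graph
# below is a hypothetical example, not one from the slides.
def bfs_shortest_path(graph, start, goal):
    frontier = deque([[start]])       # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()     # FIFO: shallowest path expanded first
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None                       # goal unreachable

graph = {'S': ['A', 'B'], 'A': ['G'], 'B': ['A', 'G']}
# bfs_shortest_path(graph, 'S', 'G') returns ['S', 'A', 'G']
```

Depth-first search is the same loop with `pop()` instead of `popleft()` (a stack instead of a queue): it uses far less memory but gives up the shortest-path guarantee.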

Eligibility Traces
Unifying Monte Carlo and TD
key algorithms: TD(λ), Sarsa(λ)
[Figure 7.1: the spectrum of backups ranging from the one-step backups of TD methods to the up-until-termination backups of Monte Carlo methods; an n-step backup bootstraps from the discounted estimated value of the nth next state.]


The Entity-Relationship Model
Davood Rafiei
Database Design Process
Real World → Functional Specifications → Database Specifications
Conceptual Schema Design (E-R Modeling)
Mapping to DBMS Data Model (Relational Model)
Schema Refinement (Normal Forms)

Last name:_ First name:_ SID#:_
Collaborators:_
CMPUT 366/499 Written 1: Step Sizes & Bandits
Due: Tuesday Sept 13 in Gradescope by 2pm (no slip days)
Policy: Can be discussed in groups (acknowledge collaborators) but must be written up individually.
Subm