Lect14 - Next: Foundations and KR for ML

Slide 1 - Next: Foundations and KR for ML
- Read Chapters 13 and 14: Uncertainty, Statistics, Probabilistic Reasoning.
- VERY APPROXIMATE grades are posted on Compass; graduate / undergraduate (G / UG) sections are curved separately (as announced); HW2 is not included.
- Your position in the distribution and an honest self-assessment are more informative than the letter grade itself.
- Is your performance so far predictive of the whole course?
Slide 2 - Indirect / Direct RL
- Our TD RL is indirect RL: the policy is constructed from a learned world model.
- Alternative: learn the policy directly and forgo learning the transition function T (see the sketch below).
- These are also known as model-based RL and model-free RL, respectively.
- They reflect general distinctions in learning: full (joint) model vs. conditional model; generative model vs. discriminative model.
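To make the indirect route concrete, here is a minimal Python sketch, not from the lecture, of the model-learning step that direct methods skip: a maximum-likelihood estimate of the transition function T from experience counts. The class and method names are hypothetical, and a discrete (tabular) state/action space is assumed.

from collections import defaultdict

class TransitionModel:
    # Sketch of the "world model" step in indirect (model-based) RL:
    # estimate T(s, a, s') = P(s' | s, a) from observed transitions.
    def __init__(self):
        # counts[(s, a)][s_next] = times the transition s --a--> s_next was seen
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s_next):
        # Record one experienced transition.
        self.counts[(s, a)][s_next] += 1

    def T(self, s, a, s_next):
        # Maximum-likelihood estimate of P(s' | s, a); 0.0 if (s, a) untried.
        total = sum(self.counts[(s, a)].values())
        return self.counts[(s, a)][s_next] / total if total else 0.0

An indirect learner would combine this estimated T with estimated utilities U to choose actions; Q-learning, on the next slide, skips the model entirely.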
Slide 3 - Q-Learning: Direct RL
- Q-function: $Q: A \times S \to \mathbb{R}$; $Q(a,s)$ is the expected utility of performing action $a$ in state $s$.
- The greedy policy is simpler: $\pi^*(s) = \arg\max_a Q^*(a, s)$
- Recall the model-based greedy policy: $\pi^*(s) = \arg\max_a \sum_{s'} T(s, a, s')\, U^*(s')$
- (Recall the need for exploration.)
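A minimal sketch of tabular Q-learning with an epsilon-greedy policy. This is not from the slides: the function names and the default learning rate (alpha) and discount (gamma) values are illustrative assumptions.

import random
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    # One tabular Q-learning backup:
    # Q(a,s) <- Q(a,s) + alpha * [r + gamma * max_a' Q(a', s') - Q(a,s)]
    best_next = max(Q[(a2, s_next)] for a2 in actions)
    Q[(a, s)] += alpha * (reward + gamma * best_next - Q[(a, s)])

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    # Mostly the greedy policy argmax_a Q(a, s); occasionally explore,
    # per the slide's reminder about the need for exploration.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(a, s)])

Q = defaultdict(float)  # Q: A x S -> R, all entries start at 0

Note that the update never consults a transition model T or utilities U: this is what makes Q-learning direct (model-free).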