383-Fall11-Lec22-addendum

1 CMPSCI 383 Nov 29, 2011 Reinforcement Learning for HW 5
2 Today's lecture: active agents; the exploration/exploitation dilemma; Q-Learning
3 Active RL Agents (diagram labels: Experience, Build, Utility Function (U, Q), Select, Policy (π), Predictions, Actions)
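4 Interaction of policy and utility (diagram: the Policy π and the Utility Function U, Q feed each other; policy evaluation, or utility learning, goes from policy to utility function, and policy improvement, or "greedification", goes from utility function back to policy)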
5 What is Q?
Action-value function: $Q(s, a)$ = utility of doing action $a$ in state $s$, i.e., the total amount of reward expected over the future if you do action $a$ in state $s$ and thereafter select optimal actions.
The utility of a state is the utility of doing the best action from that state: $U(s) = \max_a Q(s, a)$
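
As a small illustration of how a state's utility follows from the action values, here is a minimal Python sketch. The Q-table, its state names ("s0", "s1"), its action names, and its numbers are all made up for the example; none of them come from the lecture or from HW 5.

```python
# Hypothetical Q-table: Q[state][action] is the estimated utility of
# taking that action in that state. All values are invented.
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}

def state_utility(q_table, state):
    """U(s) = max over actions a of Q(s, a)."""
    return max(q_table[state].values())

# Utility of every state in the table.
utilities = {s: state_utility(Q, s) for s in Q}
```

Here each state's utility is simply the largest entry in its row of the table.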
6 Learning an action-value function
Q-Learning directly assigns a Q-value, $Q(s, a)$, to each [state, action] pair.
Don't need to learn transition probabilities to decide on the best action: $\pi^*(s) = \arg\max_a Q(s, a)$
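
The point that no transition model is needed to pick the best action can be sketched in a few lines of Python. The Q-table below is hypothetical (invented states, actions, and values), and `greedy_policy` is an illustrative helper name, not something defined in the course materials.

```python
# Hypothetical Q-table: Q[state][action] -> estimated action value.
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 0.5, "right": 0.1},
}

def greedy_policy(q_table):
    """pi*(s) = argmax over a of Q(s, a), computed from the table alone.

    No transition probabilities are consulted. Ties go to the action
    that appears first in the state's dictionary.
    """
    return {s: max(actions, key=actions.get) for s, actions in q_table.items()}

policy = greedy_policy(Q)
```

The function only reads the Q-table, which is why a learner that maintains Q-values directly can act without estimating $P(s' \mid s, a)$.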
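7 Bellman Equation for Q functions
Recall the Bellman equation for U: $U(s) = R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a)\, U(s')$
The corresponding equation for Q: $Q(s, a) = R(s) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a')$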
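
When the transition model is known, the Bellman equation for Q can be read as an update rule and applied repeatedly. The sketch below does this on a made-up two-state MDP; the states "A" and "B", the actions "stay" and "go", the rewards, the probabilities, and the discount 0.9 are all assumptions chosen for illustration. Note that this is a model-based backup, not the Q-Learning algorithm itself, which works from sampled experience instead of $P$.

```python
# Hypothetical MDP (all numbers invented for illustration).
R = {"A": 0.0, "B": 1.0}                      # R(s)
P = {                                         # P[(s, a)][s'] = P(s' | s, a)
    ("A", "stay"): {"A": 1.0},
    ("A", "go"):   {"B": 0.8, "A": 0.2},
    ("B", "stay"): {"B": 1.0},
    ("B", "go"):   {"A": 1.0},
}
GAMMA = 0.9                                   # discount factor

def bellman_backup(Q, R, P, gamma):
    """One synchronous sweep of
    Q(s, a) <- R(s) + gamma * sum_s' P(s'|s,a) * max_a' Q(s', a')."""
    new_Q = {}
    for s in Q:
        new_Q[s] = {}
        for a in Q[s]:
            expected = sum(p * max(Q[s2].values())
                           for s2, p in P[(s, a)].items())
            new_Q[s][a] = R[s] + gamma * expected
    return new_Q

# Start from all zeros and apply the backup twice.
q0 = {s: {"stay": 0.0, "go": 0.0} for s in R}
q1 = bellman_backup(q0, R, P, GAMMA)
q2 = bellman_backup(q1, R, P, GAMMA)
```

After the first sweep every Q-value equals the immediate reward of its state; the second sweep starts propagating the reward in "B" back to the action "go" in "A".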