rl-4up - Machine Learning CS6375 --- Fall 2010 - Reinforcement Learning


1  Machine Learning
CS6375 --- Fall 2010
Reinforcement Learning
Reading: Chapter 21, R&N; Sections 13.1-13.2, 13.5, Mitchell

2  Learning Delayed Rewards
All you can see is a series of states and rewards:
    S1 (r=0) -> S2 (r=0) -> S3 (r=4) -> S2 (r=0) -> S4 (r=0) -> S5 (r=0)
Task: based on this sequence, estimate J*(S1), J*(S2), ..., J*(S6).

3  Idea 1: Supervised Learning
Assume γ = 0.5.
    S1 (r=0) -> S2 (r=0) -> S3 (r=4) -> S2 (r=0) -> S4 (r=0) -> S5 (r=0)
At t=1 we were in state S1 and eventually got a long-term discounted reward of
    0 + γ·0 + γ²·4 + γ³·0 + γ⁴·0 = 1.
At t=2, in state S2, ltdr = 2.      At t=5, in state S4, ltdr = 0.
At t=3, in state S3, ltdr = 4.      At t=6, in state S5, ltdr = 0.
At t=4, in state S2, ltdr = 0.

4  Supervised Learning Algorithm
Watch a trajectory S[0] r[0] S[1] r[1] ... S[T] r[T].
For t = 0, 1, ..., T, compute J[t] = r[t] + γ·r[t+1] + γ²·r[t+2] + ... + γ^(T-t)·r[T].
Compute J_est(S_i) = the mean of J[t] over all t with S[t] = S_i.
You're done!

5  Online Supervised Learning Algorithm
Initialize:
    Count[S_i] = 0        for all S_i
    SumJ[S_i] = 0         for all S_i
    Eligibility[S_i] = 0  for all S_i
Observe: when we experience S_i with reward r, do this:
    for all j:  Elig[S_j] <- γ · Elig[S_j]
    Elig[S_i] <- Elig[S_i] + 1
    for all j:  SumJ[S_j] <- SumJ[S_j] + r · Elig[S_j]
    Count[S_i] <- Count[S_i] + 1
Then at any time, J_est(S_j) = SumJ[S_j] / Count[S_j].

6  Online Supervised Learning: Economics
Given N states S1 ... SN, OSL needs O(N) memory.
Each update needs O(N) work, since we must update all Elig[] array elements.
Idea: be sparse, and only update/process the Elig[] elements with values > ε, for tiny ε.
Easy to prove: ...

7  Online Supervised Learning
Let's grab OSL off the street, bundle it into a black van, take it to a bunker, and interrogate it under 600-watt lights.
    S1 (r=0) -> S2 (r=0) -> S3 (r=4) -> S2 (r=0) -> S4 (r=0) -> S5 (r=0)

8  Certainty-Equivalent (CE) Learning
Idea: do model-based learning, i.e., use your data to estimate the underlying Markov system, instead of trying to estimate J directly.
    S1 (r=0) -> S2 (r=0) -> S3 (r=4) -> S2 (r=0) -> S4 (r=0) -> S5 (r=0)
Estimated Markov system: you draw in the transitions + probabilities.
What are the estimated J values?
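To sanity-check the two estimators described in slides 4 and 5, here is a minimal Python sketch (the variable and function names are mine, not the slides') that runs the batch method and the one-pass eligibility method on the example trajectory and confirms they agree:

```python
# Example trajectory from the slides: (state, reward) pairs, gamma = 0.5.
GAMMA = 0.5
traj = [("S1", 0), ("S2", 0), ("S3", 4), ("S2", 0), ("S4", 0), ("S5", 0)]

def batch_estimate(traj, gamma):
    """Slide 4: compute each visit's long-term discounted reward, then average per state."""
    sum_j, count = {}, {}
    for t, (state, _) in enumerate(traj):
        # J[t] = sum_{i=t..T} gamma^(i-t) * r[i]
        j = sum(gamma ** (i - t) * r for i, (_, r) in enumerate(traj) if i >= t)
        sum_j[state] = sum_j.get(state, 0.0) + j
        count[state] = count.get(state, 0) + 1
    return {s: sum_j[s] / count[s] for s in count}

def online_estimate(traj, gamma):
    """Slide 5: a single pass, crediting each reward via eligibility counts."""
    elig, sum_j, count = {}, {}, {}
    for state, r in traj:
        for s in elig:                       # decay every eligibility (the O(N) step)
            elig[s] *= gamma
        elig[state] = elig.get(state, 0.0) + 1.0
        for s in elig:                       # credit this reward to all eligible states
            sum_j[s] = sum_j.get(s, 0.0) + r * elig[s]
        count[state] = count.get(state, 0) + 1
    return {s: sum_j.get(s, 0.0) / count[s] for s in count}

print(batch_estimate(traj, GAMMA))   # {'S1': 1.0, 'S2': 1.0, 'S3': 4.0, 'S4': 0.0, 'S5': 0.0}
print(online_estimate(traj, GAMMA))  # same values, computed in a single pass
```

Slide 6's sparsity idea corresponds to pruning entries of `elig` once they fall below a tiny ε, so each update touches only recently visited states instead of all N.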
9  C.E. Method for Markov Systems
Initialize:
    Count[S_i] = 0          # times visited S_i
    SumR[S_i] = 0           # sum of rewards from S_i
    Trans[S_i, S_j] = 0     # times transitioned from S...
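The tallies of slide 9 are enough to carry the certainty-equivalent method to completion. A hedged sketch, assuming the natural reading of slides 8-9: maximum-likelihood transition probabilities and mean rewards, then value iteration on J(s) = r̄(s) + γ·Σ P(s'|s)·J(s'). Treating S5, which has no observed outgoing transition, as absorbing is my assumption, not from the slides:

```python
from collections import defaultdict

GAMMA = 0.5
traj = [("S1", 0), ("S2", 0), ("S3", 4), ("S2", 0), ("S4", 0), ("S5", 0)]

# Slide 9's tallies: visit counts, reward sums, transition counts.
count = defaultdict(int)
sum_r = defaultdict(float)
trans = defaultdict(int)
for t, (s, r) in enumerate(traj):
    count[s] += 1
    sum_r[s] += r
    if t + 1 < len(traj):
        trans[(s, traj[t + 1][0])] += 1

states = sorted(count)

def p(s, s2):
    """Maximum-likelihood estimate of P(s2 | s) from the counts."""
    return trans[(s, s2)] / count[s] if count[s] else 0.0

# Certainty equivalence: treat the estimated model as the true Markov system
# and run value iteration on J(s) = mean_reward(s) + gamma * sum_s' P(s'|s) J(s').
# A state with no observed outgoing transitions (S5 here) keeps J(s) = mean_reward(s).
J = {s: 0.0 for s in states}
for _ in range(200):
    J = {s: sum_r[s] / count[s] + GAMMA * sum(p(s, s2) * J[s2] for s2 in states)
         for s in states}

for s in states:
    print(s, round(J[s], 3))
```

On this trajectory the model sees S2 -> S3 and S2 -> S4 once each, so P(S3|S2) = P(S4|S2) = 0.5, and the fixed point gives J(S3) = 32/7 ≈ 4.571, J(S2) = 8/7 ≈ 1.143, J(S1) = 4/7 ≈ 0.571, and J(S4) = J(S5) = 0. Compare this with the supervised estimates (1, 1, 4, 0, 0): the model-based answer differs because it pools the two observed transitions out of S2.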