# hw1m - Consider an 8-Armed Bandit problem as follows...


Machine …, … Tehran, first series

Consider an 8-Armed Bandit problem as follows. There are eight levers to pull. By pulling each lever, a reward is drawn from a defined random distribution. The reward …

- Reward_lever1: N(-2, 1) (Gaussian distribution …)
- Reward_lever2: N(-1, 1)
- Reward_lever3: …
- Reward_lever4: N(2, 0.01)
- Reward_lever5: …
- Reward_lever6: N(0.5, 0.04)
- Reward_lever7: …
- Reward_lever8: N(-0.3, 0.01)

1) Implement the epsilon-greedy action selection method (use the average rewards) to solve the problem with constant epsilons.

   a) ε = 0.1
   b) ε = 0.3
   c) ε = 0.5

   For each of the parameters, run the whole procedure 100 times … rewards of 100 iterations of a…

   (Note: As initial action values … $\{\forall a \mid Q(a) \in U(-1, 1)\}$)

2) Repeat part (1) by use of …

   a) ε₀ = 0.5, $\varepsilon_{t+1} = \dfrac{\ldots}{1 + t}$
   b) ε₀ = 0.5, $\varepsilon_{t+1} = \dfrac{\ldots}{1 + 0.1\ldots}$
   c) ε₀ = 0.5, $\varepsilon_{t+1} = \dfrac{\ldots}{1 + 0.01\ldots}$
   d) ε₀ = 0.5, $\varepsilon_{t+1} = \varepsilon \ldots -0 \ldots$
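As a rough illustration of the epsilon-greedy procedure the problem asks for, here is a minimal Python sketch that uses only the standard library. Several things in it are assumptions rather than part of the problem statement: the second parameter of each N(·, ·) is treated as a variance, the parameters for levers 3, 5 and 7 are placeholders because they are not legible here, and the names `LEVERS`, `run_episode` and `average_curve` are made up for this example. Action values start from U(-1, 1) as the note requires and are then updated as sample averages of the observed rewards.

```python
import random

# (mean, variance) per lever. Levers 1, 2, 4, 6 and 8 follow the values that
# are legible in the problem statement; levers 3, 5 and 7 are HYPOTHETICAL
# placeholders. Treating the second number as a variance is an assumption.
LEVERS = [
    (-2.0, 1.0),   # lever 1
    (-1.0, 1.0),   # lever 2
    (0.0, 1.0),    # lever 3 (placeholder, not from the source)
    (2.0, 0.01),   # lever 4
    (0.0, 1.0),    # lever 5 (placeholder, not from the source)
    (0.5, 0.04),   # lever 6
    (0.0, 1.0),    # lever 7 (placeholder, not from the source)
    (-0.3, 0.01),  # lever 8
]


def run_episode(epsilon, steps=100, rng=None, levers=LEVERS):
    """Run epsilon-greedy once and return the reward received at each step.

    `epsilon` is either a constant float or a callable mapping the step
    index t to the exploration rate used at that step.
    """
    rng = rng or random.Random()
    # Initial action values drawn from U(-1, 1).
    q = [rng.uniform(-1.0, 1.0) for _ in levers]
    counts = [0] * len(levers)
    rewards = []
    for t in range(steps):
        eps = epsilon(t) if callable(epsilon) else epsilon
        if rng.random() < eps:
            action = rng.randrange(len(levers))                    # explore
        else:
            action = max(range(len(levers)), key=q.__getitem__)    # exploit
        mean, variance = levers[action]
        reward = rng.gauss(mean, variance ** 0.5)  # gauss expects a std dev
        counts[action] += 1
        # Incremental sample average; the first pull of a lever overwrites
        # its random initial value with the observed reward.
        q[action] += (reward - q[action]) / counts[action]
        rewards.append(reward)
    return rewards


def average_curve(epsilon, runs=100, steps=100, seed=0):
    """Average the per-step reward over `runs` independent episodes."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, reward in enumerate(run_episode(epsilon, steps, rng)):
            totals[t] += reward
    return [total / runs for total in totals]


if __name__ == "__main__":
    for eps in (0.1, 0.3, 0.5):
        curve = average_curve(eps)
        tail = sum(curve[-10:]) / 10
        print(f"epsilon={eps}: mean reward over the last 10 steps = {tail:.3f}")
```

Because `epsilon` may be a callable, a decaying schedule can be tried by passing a function of the step index, for example `average_curve(lambda t: 0.5 / (1 + t))`. That particular formula is only an illustration of the mechanism; the exact schedules intended in part 2 are not fully legible in the text above.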
