# ps10 - Massachusetts Institute of Technology 16.410...


Massachusetts Institute of Technology

16.410 Principles of Automated Reasoning and Decision Making

Problem Set #10

Due: Session 25

Learning, Control and Adversaries

## Objectives

In this problem set you will develop your understanding of how agents act to maximize their utility while interacting with a changing environment. The methods you will apply include decision-tree learning for classification, Markov decision processes and reinforcement learning, and adversarial game-tree search.

## Readings

The material in this problem set corresponds primarily to Lectures 7, 23 and 24. Please review the corresponding lecture notes and any assigned readings specified in the notes. Note that Lecture 7 covers game-tree search, Lecture 23 covers decision-tree learning, and Lecture 24 covers reinforcement learning and control based on Markov decision processes.

## Problem 1 – MDPs: Tortoise and Hare

The following question is taken from last year's final. We all know, as the story goes, that the Tortoise beat the Hare to the finish line. The Tortoise was slow but extremely focused on the finish line, while the Hare was fast but easily distracted. Although the Tortoise crossed the finish line first, who really gained the greatest reward, the Tortoise or the Hare? It's a matter of perspective. To resolve this age-old question, we frame the race as an MDP, solve for the optimal policy, and use this policy to determine once and for all whose path is best, the Tortoise's or the Hare's.

[Figure: state-transition diagram of the race MDP over states 1, 2, 3, C and F, with each edge labeled action / reward.]

We model the race with the MDP above. The race starts at 1 and finishes at F. States 2 and 3 denote intermediate checkpoints along the race course, while C denotes a cabbage patch, which is very enticing to the Hare. The actions are T and H: T denotes an action focused on reaching the finish line, while H denotes an action that grabs the greatest immediate reward. The transitions and rewards are:

| State | Action | Next State | Reward |
|-------|--------|------------|--------|
| 1 | T | 3 | 10 |
| 1 | H | 2 | 18 |
| 2 | H | C | 50 |
| 2 | T | 3 | 0 |
| 3 | T or H | F | 100 |
| C | T or H | C | 0 |
| F | T or H | F | 0 |

The Tortoise's sequence <T, T> is the shortest path to the finish line. The Hare's sequence <H, H> is the direct path to the cabbage patch, collecting rewards along the way. The sequence <H, T, T> represents a mixed strategy, balancing immediate and long-term reward.

### Part A. Value Function and Policy for Tortoise

Discount ...
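Because the table fully specifies a small deterministic MDP, the quantities the problem asks about can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official solution. The discount factor γ is left as a parameter because its value is cut off in this preview. The "T or H" rows are expanded into one entry per action. Reward R(s, a) is assumed to be earned on taking action a in state s and discounted by γ^t, with t = 0 for the first move. The names `TRANSITIONS`, `discounted_return`, `q_value`, `value_iteration` and `greedy_policy` are hypothetical helpers. `value_iteration` repeats the Bellman update V(s) ← max_a [R(s, a) + γ V(s')] until the values settle.

```python
# Minimal sketch of the Tortoise-and-Hare MDP from the table above.
# Assumptions (not stated in the visible problem text):
#   * the discount factor gamma is a free parameter;
#   * the reward R(s, a) is earned on taking action a in state s,
#     discounted by gamma**t with t = 0 for the first move;
#   * "T or H" rows mean both actions have the same effect.

STATES = ["1", "2", "3", "C", "F"]
ACTIONS = ["T", "H"]

# Deterministic transitions: (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("1", "T"): ("3", 10),
    ("1", "H"): ("2", 18),
    ("2", "H"): ("C", 50),
    ("2", "T"): ("3", 0),
    ("3", "T"): ("F", 100),
    ("3", "H"): ("F", 100),
    ("C", "T"): ("C", 0),  # the cabbage patch is absorbing
    ("C", "H"): ("C", 0),
    ("F", "T"): ("F", 0),  # so is the finish line
    ("F", "H"): ("F", 0),
}


def discounted_return(start, actions, gamma):
    """Sum of gamma**t * r_t along a fixed action sequence starting at `start`."""
    state, total = start, 0.0
    for t, action in enumerate(actions):
        state, reward = TRANSITIONS[(state, action)]
        total += gamma ** t * reward
    return total


def q_value(state, action, values, gamma):
    """One-step lookahead: immediate reward plus discounted value of the successor."""
    next_state, reward = TRANSITIONS[(state, action)]
    return reward + gamma * values[next_state]


def value_iteration(gamma, tol=1e-9):
    """Apply V(s) <- max_a Q(s, a) to every state until no value changes by tol or more.

    gamma = 1.0 still terminates here only because every path ends in a
    zero-reward absorbing state (C or F); that is not true of MDPs in general.
    """
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: max(q_value(s, a, values, gamma) for a in ACTIONS) for s in STATES}
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            return new_values
        values = new_values


def greedy_policy(values, gamma):
    """Action maximizing Q(s, a) in each state; ties go to the first entry of ACTIONS."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, values, gamma)) for s in STATES}


if __name__ == "__main__":
    gamma = 0.9  # illustrative only; the problem's actual discount is not shown here
    for name, seq in [("Tortoise", ["T", "T"]), ("Hare", ["H", "H"]), ("Mixed", ["H", "T", "T"])]:
        print(name, discounted_return("1", seq, gamma))
    v = value_iteration(gamma)
    print(v)
    print(greedy_policy(v, gamma))
```

Under these assumptions, sweeping γ changes which path the greedy policy follows from state 1. With γ = 0.1 it follows the Hare's <H, H>. With γ = 0.9 it follows the Tortoise's <T, T>, worth 10 + 0.9 · 100 = 100. With γ = 1.0 it follows the mixed <H, T, T>, worth 18 + 0 + 100 = 118.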

This note was uploaded on 11/07/2011 for the course AERO 16.410 taught by Professor Brian Williams during the Fall '05 term at MIT.
