# l24_dialog_manag - Dialogue as a Decision Making Process


## Dialogue as a Decision Making Process

Nicholas Roy

## Challenges of Autonomy in the Real World

- Wide range of sensors
- Noisy sensors
- World dynamics
- Adaptability
- Incomplete information
- Robustness under uncertainty

(Example robots: Minerva and Pearl.)

## Predicted Health Care Needs

- By 2008, 450,000 additional nurses will be needed
- Monitoring and walking assistance: 30% of adults 65 years and older have fallen this year (Alexander, 2001); cost of preventable falls: \$32 billion US/year
- Intelligent reminding: cost of medication non-compliance: \$1 billion US/year (Dunbar-Jacobs, 2000)

## Spoken Dialogue Management

We want:

- Natural dialogue
- With untrained (and untrainable) users
- In an uncontrolled environment
- Across many unrelated domains

Cost of errors:

- Medication is not taken, or is taken incorrectly
- The robot behaves inappropriately
- The user becomes frustrated; the robot is ignored and becomes useless

How do we generate such a policy?

## Perception and Control

(Diagram: a control loop linking the world state, probabilistic perception, and control.)

## Markov Decision Processes

A Markov Decision Process is given formally by:

- a set of states S = {s_1, s_2, ..., s_n}
- a set of actions A = {a_1, a_2, ..., a_m}
- a set of transition probabilities T(s_i, a, s_j) = p(s_j | a, s_i)
- a reward function R: S × A → ℝ
- a discount factor γ ∈ [0, 1]
- an initial state s_0 ∈ S

Bellman's equation (Bellman, 1957) computes the expected reward for each state recursively, and determines the policy that maximises the expected, discounted reward:

V(s) = max_a [ R(s, a) + γ Σ_{s'} T(s, a, s') V(s') ]

## The MDP in Dialogue Management

- State: represents the desire of the user, e.g. want_tv, want_meds
- Utterances from the speech recogniser are assumed to give the state directly, e.g. "I want to take my pills now."
- Actions: robot motion, speech acts
- Reward: maximised for satisfying the user's task

## Probabilistic Methods for Dialogue Management

- Markov Decision Processes model action uncertainty (Levin et al., 1998; Goddeau & Pineau, 2000)
- Many techniques exist for learning optimal policies, especially reinforcement learning (Singh et al., 1999; Litman et al., 2000; Walker, 2000)

## The POMDP in Dialogue Management

- State: represents the desire of the user, e.g. want_tv, want_meds; this state is unobservable to the dialogue system
- Observations: utterances from the speech recogniser, e.g. "I want to take my pills now."
- The system must infer the user's state from the possibly noisy or ambiguous observations
- Where do the emission probabilities come from? At planning time, from a prior model; at run time, from the speech recognition engine

## Markov Decision Processes

- Model the world as the different states the system can be in, e.g. the current state of completion of a form
- Each action moves the system to some new state with probability p(i, j)
- An observation from the user determines the posterior state
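Bellman's recursion can be solved by value iteration. Below is a minimal sketch for a toy dialogue MDP in the spirit of the lecture: the states `want_tv` and `want_meds` come from the slides, but all transition probabilities, rewards, and the `satisfied` state are made-up illustrative values, not numbers from the lecture.

```python
# Value iteration for a toy dialogue MDP. All numeric values are
# illustrative assumptions; only want_tv/want_meds echo the slides.

GAMMA = 0.9  # discount factor γ

states = ["want_tv", "want_meds", "satisfied"]
actions = ["turn_on_tv", "bring_meds"]

# T[s][a] -> list of (next_state, probability): the transition model T(s, a, s')
T = {
    "want_tv": {
        "turn_on_tv": [("satisfied", 0.9), ("want_tv", 0.1)],
        "bring_meds": [("want_tv", 1.0)],
    },
    "want_meds": {
        "turn_on_tv": [("want_meds", 1.0)],
        "bring_meds": [("satisfied", 0.9), ("want_meds", 0.1)],
    },
    "satisfied": {
        "turn_on_tv": [("satisfied", 1.0)],
        "bring_meds": [("satisfied", 1.0)],
    },
}

# R[s][a]: reward for satisfying the user's task, penalty for the wrong action
R = {
    "want_tv":   {"turn_on_tv": 10.0, "bring_meds": -5.0},
    "want_meds": {"turn_on_tv": -5.0, "bring_meds": 10.0},
    "satisfied": {"turn_on_tv": 0.0,  "bring_meds": 0.0},
}

def q_value(V, s, a):
    """Q(s, a) = R(s, a) + γ Σ_{s'} T(s, a, s') V(s')."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a])

def value_iteration(tol=1e-6):
    """Iterate the Bellman backup until the value function stops changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(V, s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {s: max(actions, key=lambda a: q_value(V, s, a)) for s in states}
    return V, policy

V, policy = value_iteration()
print(policy)  # the policy picks the action that matches the user's desire
```

With these rewards, the optimal policy selects `turn_on_tv` in `want_tv` and `bring_meds` in `want_meds`, i.e. it maximises the expected discounted reward for satisfying the user's task.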
The optimal policy maximizes the expected future (discounted) reward.
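In the POMDP setting the user's desire is hidden, so the system maintains a belief over states and updates it from each noisy utterance via Bayes' rule. The sketch below shows only this belief update, with made-up emission probabilities (in a real system they would come from a prior model at planning time and from the speech recogniser at run time); it also assumes the user's desire does not change between utterances, so the transition step is omitted.

```python
# Bayes-filter belief update over the hidden user desire.
# All probabilities below are illustrative assumptions, not lecture values.

states = ["want_tv", "want_meds"]

# Emission model p(observation | state): how likely each (simplified)
# utterance is under each hidden desire.
emission = {
    "want_tv":   {"utt_tv": 0.8, "utt_meds": 0.2},
    "want_meds": {"utt_tv": 0.3, "utt_meds": 0.7},
}

def belief_update(belief, observation):
    """Posterior b'(s) ∝ p(o | s) * b(s), normalised over states.

    Assumes a static user desire; a full POMDP filter would also
    apply the transition model before this correction step.
    """
    unnorm = {s: emission[s][observation] * belief[s] for s in states}
    z = sum(unnorm.values())  # normalising constant p(o)
    return {s: v / z for s, v in unnorm.items()}

# Start from a uniform prior over the user's desire.
belief = {"want_tv": 0.5, "want_meds": 0.5}

# Observe an utterance resembling "I want to take my pills now."
belief = belief_update(belief, "utt_meds")
print(belief)  # probability mass shifts toward want_meds
```

After one `utt_meds` observation the belief in `want_meds` rises from 0.5 to about 0.78, illustrating how the system infers the unobservable state from ambiguous evidence before choosing an action.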

