jurafsky&martin_3rdEd_17 (1).pdf

# 29.6 Advanced: Markov Decision Processes


To understand each of these components, we need to look at a tutorial example in which the state space is drastically reduced. Let's look at a trivial pedagogical frame-and-slot example from Levin et al. (2000): a "Day-and-Month" dialog system whose goal is to get correct values of day and month for a two-slot frame through the shortest possible interaction with the user.

In principle, a state of an MDP could include any possible information about the dialog, such as the complete dialog history so far. Using such a rich model of state would make the number of possible states extraordinarily large. So a model of state is usually chosen that encodes a much more limited set of information, such as the values of the slots in the current frame, the most recent question asked to the user, the user's most recent answer, the ASR confidence, and so on. For the Day-and-Month example, let's represent the state of the system as the values of the two slots *day* and *month*. There are 411 states: 366 states with both a day and a month (counting February 29 of a leap year), 12 states with a month but no day ($d = 0$, $m = 1, 2, \ldots, 12$), 31 states with a day but no month ($m = 0$, $d = 1, 2, \ldots, 31$), and a special initial state $s_i$ and final state $s_f$.

Actions of an MDP dialog system might include generating particular speech acts or performing a database query to find out information. For the Day-and-Month example, Levin et al. (2000) propose the following actions:

- $a_d$: a question asking for the day
- $a_m$: a question asking for the month
- $a_{dm}$: a question asking for both the day and the month
- $a_f$: a final action submitting the form and terminating the dialog

Since the goal of the system is to get the correct answer with the shortest interaction, one possible reward function for the system would integrate three terms:

$$R = -(w_i n_i + w_e n_e + w_f n_f) \tag{29.8}$$

The term $n_i$ is the number of interactions with the user, $n_e$ is the number of errors, $n_f$ is the number of slots that are filled (0, 1, or 2), and the $w$s are weights.

Finally, a dialog policy $\pi$ specifies which actions to apply in which state. Consider two possible policies: (1) asking for the day and month separately, and (2) asking for them together. These might generate the two dialogs shown in Fig. 29.8.
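As a check on the count of 411, the Python sketch below enumerates the Day-and-Month state space. Encoding an unfilled slot as 0, $s_i$ as (0, 0), and $s_f$ as (-1, -1) is an assumption borrowed from the state labels in Fig. 29.8; the function and constant names are hypothetical, chosen for illustration.

```python
# Days in each month, counting February 29 so that leap-year dates are included.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def day_month_states():
    """Return every Day-and-Month state as a (day, month) pair.

    A value of 0 marks an unfilled slot. (0, 0) stands for the initial
    state s_i and (-1, -1) for the final state s_f (a hypothetical
    encoding taken from the node labels in Fig. 29.8).
    """
    states = [(0, 0)]                              # s_i: nothing filled yet
    states += [(0, m) for m in range(1, 13)]       # month known, day unknown
    states += [(d, 0) for d in range(1, 32)]       # day known, month unknown
    states += [(d, m)                              # both slots filled
               for m, n_days in enumerate(DAYS_IN_MONTH, start=1)
               for d in range(1, n_days + 1)]
    states.append((-1, -1))                        # s_f: form submitted
    return states


if __name__ == "__main__":
    print(len(day_month_states()))  # 1 + 12 + 31 + 366 + 1 → 411
```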

*[Figure: Policy 1 (directive) asks "Which day?" to move from $d=0, m=0$ to $d=D, m=0$, then "Which month?" to reach $d=D, m=M$, then "Goodbye." to reach the final state $d=-1, m=-1$, with $c_1 = -3w_i - 2p_d w_e$. Policy 2 (open) asks "What date?" to move from $d=0, m=0$ directly to $d=D, m=M$, then "Goodbye." to reach $d=-1, m=-1$, with $c_2 = -2w_i - 2p_o w_e$.]*

Figure 29.8 Two policies for getting a month and a day. After Levin et al. (2000).

In policy 1, the action specified for the no-day/no-month state is to ask for a day, and the action specified for any of the 31 states where we have a day but not a month is to ask for a month. In policy 2, the action specified for the no-day/no-month state is to ask an open-ended question (*What date?*) to get both a day and a month. The two policies have different advantages: an open prompt can lead to shorter dialogs but is likely to cause more errors, whereas a directive prompt is slower but less error-prone. Thus, the optimal policy depends on the values of the weights.
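To make the trade-off concrete, the following Python sketch writes both policies as functions from state to action, rolls each one out from $s_i$, and scores it with Eq. 29.8. It assumes, reading the figure's labels, that $p_d$ and $p_o$ are the per-slot probabilities of a recognition error after a directive prompt and after an open prompt, so $n_e$ becomes an expected error count. The function names, and the open policy's handling of partially filled states, are hypothetical.

```python
# Hypothetical encodings of the two policies in Fig. 29.8 as functions from a
# (day, month) state to an action; 0 marks an unfilled slot.
def directive_policy(state):
    day, month = state
    if day == 0:
        return "a_d"    # "Which day?"
    if month == 0:
        return "a_m"    # "Which month?"
    return "a_f"        # "Goodbye."


def open_policy(state):
    day, month = state
    if day == 0 and month == 0:
        return "a_dm"   # "What date?"
    if day == 0:        # fallbacks for partially filled states (assumed)
        return "a_d"
    if month == 0:
        return "a_m"
    return "a_f"        # "Goodbye."


# Slots each question fills; a_f fills none and ends the dialog.
SLOTS = {"a_d": ("day",), "a_m": ("month",), "a_dm": ("day", "month")}


def expected_reward(policy, w_i, w_e, p_d, p_o):
    """Roll a policy out from s_i and score it with Eq. 29.8.

    Assumes every question is answered and that each slot filled after a
    directive (open) prompt is misrecognized with probability p_d (p_o).
    The w_f * n_f term is dropped because both policies finish with both
    slots filled, so it cannot change which policy wins.
    """
    filled, n_i, n_e = set(), 0, 0.0
    while True:
        # The policies only test for 0, so 1 stands in for any filled value.
        state = (int("day" in filled), int("month" in filled))
        action = policy(state)
        n_i += 1  # every system turn, including "Goodbye.", is an interaction
        if action == "a_f":
            return -(w_i * n_i + w_e * n_e)
        p = p_o if action == "a_dm" else p_d  # open vs. directive prompt
        n_e += p * len(SLOTS[action])         # expected errors on new slots
        filled.update(SLOTS[action])


if __name__ == "__main__":
    for name, policy in [("directive", directive_policy), ("open", open_policy)]:
        print(name, expected_reward(policy, w_i=1.0, w_e=10.0, p_d=0.05, p_o=0.2))
```

Setting the two expected rewards equal gives the break-even point: for $w_e > 0$, the directive policy earns more reward exactly when $p_o - p_d > w_i / (2 w_e)$, that is, when the extra accuracy of directive prompts, weighted by the cost of an error, outweighs the cost of one extra turn.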