jurafsky&martin_3rdEd_17 (1).pdf

A common choice of corpus is to collect databases of human conversations. These can come from microblogging platforms like Twitter or Sina Weibo (微博). Another approach is to use corpora of movie dialog. Once a chatbot has been put into practice, the turns that humans use to respond to the chatbot can be used as additional conversational data for training.

Given the corpus and the user's sentence, IR-based systems can use any retrieval algorithm to choose an appropriate response from the corpus. The two simplest methods are the following:

1. Return the response to the most similar turn: Given user query q and a conversational corpus C, find the turn t in C that is most similar to q (for example, the turn with the highest cosine with q) and return the following turn, i.e., the human response to t in C:

   $$r = \mathrm{response}\left(\operatorname*{argmax}_{t \in C} \frac{q^{T} t}{\|q\|\,\|t\|}\right) \tag{28.1}$$

   The idea is that we should look for a turn that most resembles the user's turn, and return the human response to that turn (Jafarpour et al. 2009, Leuski and Traum 2011).

2. Return the most similar turn: Given user query q and a conversational corpus C, return the turn t in C that is most similar to q (for example, the turn with the highest cosine with q):

   $$r = \operatorname*{argmax}_{t \in C} \frac{q^{T} t}{\|q\|\,\|t\|} \tag{28.2}$$

   The idea here is to directly match the user's query q with turns from C, since a good response will often share words or semantics with the prior turn.
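The two retrieval methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the cited systems: the corpus is assumed to be an ordered list of turns in which each turn is followed by its human response, the vectors are tf-idf weighted with one common smoothed idf variant, and all function names (`build_idf`, `tfidf`, `cosine`, `respond_with_following_turn`, `respond_with_most_similar_turn`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,!?;:'\""

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in words if w]

def build_idf(turns):
    """Smoothed idf (an assumed variant), treating each turn as a document."""
    n = len(turns)
    df = Counter()
    for turn in turns:
        df.update(set(tokenize(turn)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def tfidf(text, idf):
    """Sparse tf-idf vector as a dict; words unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {w: c * idf[w] for w, c in counts.items() if w in idf}

def cosine(u, v):
    """Cosine of two sparse vectors: (u . v) / (||u|| ||v||), 0.0 if either is empty."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar_index(query, turns, idf):
    """Index of the turn t in `turns` with the highest cosine with the query."""
    q = tfidf(query, idf)
    return max(range(len(turns)), key=lambda i: cosine(q, tfidf(turns[i], idf)))

def respond_with_following_turn(query, corpus, idf):
    """Method 1 (Eq. 28.1): return the turn that follows the most similar turn."""
    # The last turn has no recorded response, so it is excluded as a candidate.
    i = most_similar_index(query, corpus[:-1], idf)
    return corpus[i + 1]

def respond_with_most_similar_turn(query, corpus, idf):
    """Method 2 (Eq. 28.2): return the most similar turn itself."""
    return corpus[most_similar_index(query, corpus, idf)]

# Toy corpus (hypothetical data): alternating turns of one conversation.
corpus = [
    "do you like pizza",
    "yes I love pizza with mushrooms",
    "what is your favorite movie",
    "I really enjoy science fiction movies",
]
idf = build_idf(corpus)
query = "do you like movies"
print(respond_with_following_turn(query, corpus, idf))
print(respond_with_most_similar_turn(query, corpus, idf))
```

On this toy corpus the query shares the most weighted words with the first turn, so method 1 returns the turn after it and method 2 returns the first turn itself. A realistic system would replace the linear scan with an index and might swap the tf-idf vectors for embeddings.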
In each case, any similarity function can be used, most commonly cosines computed either over words (using tf-idf) or over embeddings.

Although returning the response to the most similar turn seems like a more intuitive algorithm, returning the most similar turn seems to work better in practice, perhaps because selecting the response adds another layer of indirection that can allow for more noise (Ritter et al. 2011, Wang et al. 2013).

The IR-based approach can be extended by using more features than just the words in q (such as words in prior turns, or information about the user), and by using any full IR ranking approach. Commercial implementations of the IR-based approach include Cleverbot (Carpenter, 2017) and Microsoft's 'XiaoIce' (Little Bing) system (Microsoft, ).

Instead of just using corpora of conversation, the IR-based approach can be used to draw responses from narrative (non-dialog) text. For example, the pioneering COBOT chatbot (Isbell et al., 2000) generated responses by selecting sentences from a corpus that combined the Unabomber Manifesto by Theodore Kaczynski, articles on alien abduction, and the scripts of "The Big Lebowski" and "Planet of the Apes". Chatbots that want to generate informative turns such as answers to user questions can use texts like Wikipedia to draw on sentences that might contain those answers (Yan et al., 2016).

Sequence to sequence chatbots

An alternate way to use a corpus to generate dialog is to think of response generation as a task of transducing from the user's prior turn to the system's turn. This is basically the machine learning version of Eliza: the system learns from a corpus to transduce a question to an answer.