CHATBOT: Architecture, Design, & Development
By Jack Cahn
Thesis Advisor: Dr. Boon Thau Loo
Engineering Advisor: Dr. Jean Gallier
Senior Thesis (EAS499)
University of Pennsylvania
School of Engineering and Applied Science
Department of Computer and Information Science
April 26, 2017

Table of Contents
1. Introduction
2. Chatbot Overview
   2.1. Background & History
      2.1.1. What is a Chatbot?
      2.1.2. Chatbot History
   2.2. Evaluation
      2.2.1. Evaluation Perspectives
      2.2.2. PARADISE Framework
      2.2.3. Other Evaluation Methods
3. Architecture & Design
   3.1. Speech-to-Text Conversion
      3.1.1. Large Vocabulary Speech Recognition
      3.1.2. ASR Process Model
      3.1.3. Restricted Boltzmann Machine (RBM) Implementation
   3.2. Natural Language Processing
      3.2.1. Dialogue Act (DA) Recognition
      3.2.2. Bayesian Approaches to DA Models
      3.2.3. Non-Bayesian Approaches to DA Models
      3.2.4. Intent Identification
      3.2.5. Information Extraction
      3.2.6. Statistical Methods for Information Extraction
   3.3. Response Generation
      3.3.1. Rule-Based Models
      3.3.2. Information Retrieval (IR)-Based Models
      3.3.3. Statistical Machine Translation Generative Models
      3.3.4. Sequence to Sequence (Seq2Seq) Model
      3.3.5. Reinforcement Learning with Seq2Seq Bots
   3.4. Knowledge Base Creation
      3.4.1. Human-Annotated Corpora
      3.4.2. Discussion Forums
      3.4.3. Email Conversations
   3.5. Dialogue Management
      3.5.1. Communication Strategies
      3.5.2. Language Tricks
      3.5.3. Dialogue Design Principles
      3.5.4. Human Imitation Strategies
      3.5.5. Personality Development
   3.6. Text to Speech
      3.6.1. Text Analysis
      3.6.2. Waveform Synthesis
4. Applications & Development
   4.1. IBM Watson Case Study
      4.1.1. Natural Language Processing
      4.1.2. Response Generation
      4.1.3. Knowledge Base
   4.2. Security Considerations
      4.2.1. Security Flaws in Chatbot Platforms
      4.2.2. Malicious Chatbots
   4.3. Applications
      4.3.1. Virtual Personal Assistants (VPAs)
      4.3.2. Consumer Domain-Specific Bots
5. References