cmsc320_f2018_lec18+19.pdf - INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #18 – CMSC320 Mondays & Wednesdays 2:00pm – 3:15pm


INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
Lecture #18 – 10/29/2018
CMSC320
Mondays & Wednesdays 2:00pm – 3:15pm
ANNOUNCEMENTS
  • Mini-Project #2 grades will be out by Thursday night!
  • Mini-Project #3 is out!
      • It is linked to from ELMS; it is also available at:
      • Deliverable is a .ipynb file submitted to ELMS
      • Due November 19th
  • Please label your .ipynb file something like <lastname>_<firstname>_project3.ipynb
MIDTERMS
  • Not graded yet!
  • If you still need to take a midterm exam, please please please please please tell me.
  • I know of exactly four of you who do.
THIS LECTURE
Data collection → Data processing → Exploratory analysis & Data viz → Analysis, hypothesis testing, & ML → Insight & Policy Decision
THIS LECTURE: Words words words!
  • Free text and natural language processing in data science
  • Bag of words and TF-IDF
  • N-Grams and language models
  • Sentiment mining
Thanks to: Zico Kolter (CMU) & Marine Carpuat’s 723 (UMD)
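As a preview of the bag-of-words and TF-IDF topics listed above, here is a minimal sketch in plain Python. The `tokenize`, `bag_of_words`, and `tf_idf` helpers are hypothetical names introduced for illustration, and the weighting used (raw term count times log(N / document frequency)) is only one common TF-IDF variant, not necessarily the one used later in the lecture.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive assumption: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    # One Counter of term counts per document; word order is discarded.
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    bags = bag_of_words(docs)
    n_docs = len(bags)
    # Document frequency: the number of documents that contain each term.
    # Iterating over a Counter yields each distinct term once per document.
    df = Counter(term for bag in bags for term in bag)
    # Assumed variant: weight = raw count * log(N / df).
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


docs = ["the cat sat", "the dog sat", "the cat ran"]
for doc, weights in zip(docs, tf_idf(docs)):
    print(doc, weights)
```

In this toy corpus "the" appears in every document, so its weight is log(3/3) = 0, while a term that appears in only one document, such as "dog", gets the largest weight.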
PRECURSOR TO NATURAL LANGUAGE PROCESSING
PRECURSOR TO NATURAL LANGUAGE PROCESSING
Turing’s Imitation Game [1950]:
  • Person A and Person B go into separate rooms
  • Guests send questions in and read the answers that come out – but they are not told who sent the answers
  • Person A (B) wants to convince the group that she is Person B (A)
We now ask the question, "What will happen when a machine takes the part of [Person] A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between [two humans]? These questions replace our original, "Can machines think?"
PRECURSOR TO NATURAL LANGUAGE PROCESSING
  • Mechanical translation started in the 1930s
      • Largely based on dictionary lookups
  • Georgetown-IBM Experiment:
      • Translated 60 Russian sentences to English
      • Fairly basic system behind the scenes
      • Highly publicized, system ended up spectacularly failing
  • Funding dried up; not much research in “mechanical translation” until the 1980s …
STATISTICAL NATURAL LANGUAGE PROCESSING
  • Pre-1980s: primarily based on sets of hand-tuned rules
  • Post-1980s: introduction of machine learning to NLP
      • Initially, decision trees learned what-if rules automatically
      • Then, hidden Markov models (HMMs) were used for part-of-speech (POS) tagging
      • Explosion of statistical models for language
      • Recent work focuses on purely unsupervised or semi-supervised learning of models
  • We’ll cover some of this in the machine learning lectures!
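To make the HMM-based POS tagging mentioned above concrete, the following is a minimal sketch of Viterbi decoding over a toy HMM. The tag set and all probabilities are made up for illustration (in practice they would be estimated from a tagged corpus), the `viterbi` function name is hypothetical, and unseen words are given a small assumed emission probability of 1e-6.

```python
import math

UNSEEN = 1e-6  # assumed floor probability for words a tag never emitted


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[i][t]: log-probability of the best tag path ending in tag t at word i.
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], UNSEEN))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(emit_p[t].get(words[i], UNSEEN))
            back[i][t] = prev
    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model: every number below is an illustrative assumption.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "walk": 0.2, "park": 0.3},
    "VERB": {"walk": 0.6, "barks": 0.4},
}

print(viterbi(["the", "walk"], tags, start_p, trans_p, emit_p))
print(viterbi(["dog", "walk"], tags, start_p, trans_p, emit_p))
```

Under these toy numbers the ambiguous word "walk" is tagged NOUN after "the" and VERB after "dog", which is the kind of context sensitivity that hand-written rules struggled to capture.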
NLP IN DATA SCIENCE
In Mini-Project #1, you used requests and BeautifulSoup