DSCI4520_TextMining_11

Lecture 11 - 1 DSCI 4520/5240 DATA MINING
DSCI 4520/5240 DBDSS (DATA MINING)
Lecture 11: Text Mining
Some slide material taken from: SAS Education

Lecture 11 - 2
Objectives
Introduction to Text Mining
Definition
Some Text Mining Applications
Fundamentals of Information Retrieval
Singular Value Decomposition
Using SAS Enterprise Miner for Text Mining
Case 1: The Federalist Papers
Case 2: Analysis of Vaccine Adverse Event Reporting System (VAERS) data
Lecture 11 - 3
Text Mining: A Definition
Text Mining: the application of Information Retrieval and Data Mining techniques that accommodate text as an input variable in knowledge discovery or predictive modeling.

Lecture 11 - 4
Another View of Text Mining
[Figure: Text -> "A Miracle Occurs" -> Numbers]
Lecture 11 - 5
Difficulties in Text Quantification
Abstract concepts are difficult to quantify.
Synonymy: multiple synonyms create multiple text representations of the same concept.
Polysemy: one term can relate to multiple concepts, depending on the context.
The curse of dimensionality: text representations result in high dimensionality (tens of thousands of dimensions).
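The synonymy, polysemy, and dimensionality problems above can be illustrated with a small sketch. It builds a term-document count matrix in plain Python. The `tokenize` and `term_document_matrix` helpers and the four sample sentences are illustrative assumptions, not material from the lecture or from SAS Enterprise Miner.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w.strip(PUNCTUATION).lower() for w in text.split() if w.strip(PUNCTUATION)]

def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows

# Hypothetical toy corpus chosen to show synonymy and polysemy.
docs = [
    "The car would not start.",         # "car" ...
    "The automobile would not start.",  # ... and "automobile": same concept, two terms
    "She sat on the river bank.",       # "bank" as a riverside
    "He opened a bank account.",        # "bank" as a financial institution
]

vocab, rows = term_document_matrix(docs)

print("Number of dimensions (distinct terms):", len(vocab))
for term in ("car", "automobile", "bank"):
    j = vocab.index(term)
    print(term, [row[j] for row in rows])
```

In this toy corpus, "car" and "automobile" occupy separate columns even though they name the same concept, the single "bank" column mixes two senses, and four short sentences already need 15 columns. A real collection extends this to the tens of thousands of dimensions mentioned above.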

Lecture 11 - 6
Syntax and Semantic Issues
“I made her duck.” (Jurafsky and Martin 2000, page 4.)
I cooked waterfowl for her.
I cooked waterfowl belonging to her.
I created the (plaster) duck that she owns.
I caused her to quickly lower her head or body.
I waved my magic wand and turned her into undifferentiated waterfowl.
Lecture 11 - 7
The Interest in Text Mining
(DM = Data Mining, IR = Information Retrieval, TM = Text Mining)
Job search results, Nov 4, 2004:
  Monster.com: DM = 1399 jobs, IR = 791 jobs, TM = 35 jobs
  Jobs.com: DM = 1796 jobs, IR = 847 jobs, TM = 47 jobs
Job search results, Apr 22, 2008:
  CareerBuilder.com: DM = 1051 jobs, IR = 601 jobs, TM = 126 jobs
Job search results, Nov 12, 2008:
  CareerBuilder.com: DM = 1373 jobs, IR = 486 jobs, TM = 21 jobs
  Monster.com: DM = 1214 jobs, IR = 569 jobs, TM = 45 jobs
  Jobs.com: DM > 1,000 jobs, IR = 648 jobs, TM = 84 jobs
Job search results, Nov 11, 2009:
  CareerBuilder.com: DM = 649 jobs, IR = 360 jobs, TM = 31 jobs
  Monster.com/Jobs.com: DM = 759 jobs, IR = 414 jobs, TM = 44 jobs

Lecture 11 - 8
The Interest in Text Mining
(DM = Data Mining, IR = Information Retrieval, TM = Text Mining)
Job search results, Nov 3, 2010:
  CareerBuilder.com: DM = 952 jobs, IR = 447 jobs, TM = 30 jobs
  Monster.com/Jobs.com: DM > 1000 jobs, IR > 1000 jobs, TM = 38 jobs
Lecture 11 - 9
Application: Automotive Early Warning System
Wallace and Cermack (2004) describe the use of text mining for warranty analysis related to the TREAD Act.

Lecture 11 - 10
Application: Medical

This note was uploaded on 04/10/2011 for the course DSCI 4520 taught by Professor Staff during the Spring '08 term at North Texas.
