clustering

clustering - Data Mining Originally data mining was a...

Data Mining

Originally, data mining was a statistician’s term for overusing data to draw invalid inferences. A famous example: David Rhine, a parapsychologist at Duke in the 1950s, tested students for extrasensory perception by asking them to guess 10 cards as either red or black. He found that about 1/1000 of them guessed all 10 correctly. That is exactly what random guessing predicts, since the chance of getting 10 two-way guesses right is 1/2^10 = 1/1024, but instead of recognizing this he declared those students to have ESP. When he retested them, he found they did no better than average. His conclusion: telling people they have ESP causes them to lose it!

Our definition will be: “The extraction of implicit, previously unknown, and potentially useful information from data.”

Some famous quotes about data mining

“Drowning in Data yet Starving for Knowledge” - anonymous

“Computers have promised us a fountain of wisdom but delivered a flood of data” - William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus

“Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” - T. S. Eliot

What is NOT data mining?

Data Mining, noun: “Torturing data until it confesses ... and if you torture it enough, it will confess to anything” - Jeff Jonas, IBM

“An Unethical Econometric practice of massaging and manipulating the data to obtain the desired results” - W.S. Brown, “Introducing Econometrics”

Some examples of data mining

• Patterns of traveler behavior are mined to manage the sale of discounted seats on planes, rooms in hotels, etc.
• The connection between diapers and beer. Data mining showed that customers who buy diapers are more likely than average to buy beer. Supermarkets then placed beer and diapers nearby, knowing many customers would walk between them. Placing potato chips between diapers and beer increased sales of all three items.
• Skycat and the Sloan Digital Sky Survey - clustering sky objects by their radiation levels in different bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of celestial objects.
• Comparing the genotypes of people with and without a condition allowed the discovery of a set of genes that together account for many cases of diabetes. This sort of mining will become much more important as the human genome is constructed.

Data mining is an interdisciplinary field, and researchers in many different areas use data mining techniques:

• Statistics
• Mathematics
• Artificial Intelligence, where it is called machine learning
• Researchers in clustering algorithms
• Visualization researchers
• Databases

Stages of Data Mining

1. Data gathering, e.g., data warehousing, web crawling
2. Data cleansing - eliminate errors and/or bogus data, e.g., patient fever = 125
3. Feature extraction - obtaining only the interesting attributes of the data, e.g., date acquired is probably not useful for clustering celestial objects
4. Pattern extraction and discovery - this is the stage that is often thought of...
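
Stages 2 and 3 above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than part of the original notes: the `PatientRecord` fields, the plausible temperature range of 90 to 110 degrees Fahrenheit, and the choice to apply the "date acquired" example to patient records instead of celestial objects.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical raw record, as it might arrive from the gathering stage."""
    patient_id: int
    temperature_f: float   # body temperature in degrees Fahrenheit
    heart_rate: int        # beats per minute
    date_acquired: str     # bookkeeping field, not a property of the patient


def cleanse(records, low=90.0, high=110.0):
    """Stage 2: drop records whose temperature is outside an assumed plausible range."""
    return [r for r in records if low <= r.temperature_f <= high]


def extract_features(records):
    """Stage 3: keep only the attributes assumed to be interesting for mining."""
    return [(r.temperature_f, r.heart_rate) for r in records]


records = [
    PatientRecord(1, 98.6, 72, "2001-01-05"),
    PatientRecord(2, 125.0, 80, "2001-01-06"),  # bogus reading: fever = 125
]
clean = cleanse(records)
features = extract_features(clean)
```

Here the record with a temperature of 125 is discarded during cleansing, and the `date_acquired` field is left out of the feature tuples passed on to the pattern-discovery stage. In practice the thresholds and the set of retained attributes depend on the data and on the question being asked.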