clustering

Data Mining

Originally, "data mining" was a statistician's term for overusing data to draw invalid inferences. A famous example: J. B. Rhine, a parapsychologist at Duke in the 1950s, tested students for extrasensory perception by asking them to guess whether each of 10 cards was red or black. He found that about 1/1000 of them guessed all 10 correctly. Instead of realizing that this is what you'd expect from random guessing (each guess is right with probability 1/2, so all 10 are right with probability (1/2)^10 = 1/1024), he declared them to have ESP. When he retested them, he found they did no better than average. His conclusion: telling people they have ESP causes them to lose it!

Our definition will be: the extraction of implicit, previously unknown, and potentially useful information from data.

Some famous quotes about data mining

"Drowning in Data yet Starving for Knowledge" - anonymous

"Computers have promised us a fountain of wisdom but delivered a flood of data" - William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus

"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" - T. S. Eliot

What is NOT data mining?

"Data Mining, noun: Torturing data until it confesses ... and if you torture it enough, it will confess to anything" - Jeff Jonas, IBM

"An Unethical Econometric practice of massaging and manipulating the data to obtain the desired results" - W.S. Brown, Introducing Econometrics

Some examples of data mining

Patterns of traveler behavior are mined to manage the sale of discounted seats on planes, rooms in hotels, etc.

The connection between diapers and beer. Data mining showed that customers who buy diapers are more likely than average to buy beer. Supermarkets then placed beer and diapers near each other, knowing many customers would walk between them. Placing potato chips between diapers and beer increased sales of all three items.
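The diapers-and-beer observation can be made concrete with a small calculation. The sketch below is only an illustration: the baskets are invented toy data, and the helper names `support` and `lift` are hypothetical, not taken from the original notes. It compares how often beer appears overall with how often it appears among baskets that contain diapers.

```python
from typing import FrozenSet, Iterable, List


def support(baskets: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def lift(baskets: List[FrozenSet[str]], a: str, b: str) -> float:
    """P(a and b) / (P(a) * P(b)).

    A value above 1 means a and b appear together more often than
    they would if they were bought independently.
    """
    return support(baskets, [a, b]) / (support(baskets, [a]) * support(baskets, [b]))


# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "beer", "chips"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
    frozenset({"chips", "bread"}),
]

p_beer = support(baskets, ["beer"])
p_beer_given_diapers = support(baskets, ["diapers", "beer"]) / support(baskets, ["diapers"])
beer_diaper_lift = lift(baskets, "diapers", "beer")

print(f"P(beer) = {p_beer:.3f}")
print(f"P(beer | diapers) = {p_beer_given_diapers:.3f}")
print(f"lift(diapers, beer) = {beer_diaper_lift:.3f}")
```

In this toy data, diaper buyers purchase beer more often than shoppers overall, which is the kind of pattern the supermarket example describes. The helpers assume every item of interest appears in at least one basket; otherwise `lift` would divide by zero.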
Skycat and Sloan Digital Sky Survey. Clustering sky objects by their radiation levels in different bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of celestial objects.

Comparison of the genotypes of people with and without a condition allowed the discovery of a set of genes that together account for many cases of diabetes. This sort of mining will become much more important as the human genome is constructed.

Data mining is an interdisciplinary field, and researchers in many different areas use data mining techniques:

- Statistics
- Mathematics
- Artificial Intelligence, where it is called machine learning
- Researchers in clustering algorithms
- Visualization researchers
- Databases

Stages of Data Mining

1. Data gathering, e.g., data warehousing, web crawling
2. Data cleansing - eliminate errors and/or bogus data, e.g., patient fever = 125
3. Feature extraction - obtaining only the interesting attributes of the data, e.g., date acquired is probably not useful for clustering celestial objects
4. Pattern extraction and discovery - this is the stage that is often thought of ...
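Stages 2 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed method: the records are invented, the field names (`fever_f`, `date_acquired`, and so on) are hypothetical, the fever is assumed to be in degrees Fahrenheit, and the plausibility bounds of 90 to 110 are an arbitrary choice made for the example.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

# Assumed plausibility bounds in degrees Fahrenheit (an illustrative choice).
MIN_PLAUSIBLE_F = 90.0
MAX_PLAUSIBLE_F = 110.0


def cleanse(records: Iterable[Record]) -> List[Record]:
    """Data cleansing: drop records whose fever reading is implausible."""
    return [
        record
        for record in records
        if MIN_PLAUSIBLE_F <= record["fever_f"] <= MAX_PLAUSIBLE_F
    ]


def extract_features(records: Iterable[Record], keep: Iterable[str]) -> List[Record]:
    """Feature extraction: keep only the attributes considered interesting."""
    wanted = list(keep)
    return [{name: record[name] for name in wanted} for record in records]


# Hypothetical patient records, invented purely for illustration.
raw_records = [
    {"patient_id": 1, "fever_f": 98.6, "heart_rate": 72, "date_acquired": "2001-03-04"},
    {"patient_id": 2, "fever_f": 125.0, "heart_rate": 80, "date_acquired": "2001-03-05"},
    {"patient_id": 3, "fever_f": 101.2, "heart_rate": 95, "date_acquired": "2001-03-06"},
]

cleaned = cleanse(raw_records)  # removes the bogus fever = 125 record
features = extract_features(cleaned, ["fever_f", "heart_rate"])

print(features)
```

Here the record with fever = 125 is discarded as bogus, and `date_acquired` is dropped as an attribute unlikely to help later pattern discovery. In practice the bounds and the choice of attributes depend on the data and the question being asked.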