Stats 202 - Lecture 1



Amazon rainforest, Amazon.com, etc.)
- Query a Web search engine for information about “Amazon”

Why Mine Data? Scientific Viewpoint

Data collected and stored at enormous speeds (GB/hour)
● remote sensors on a satellite
● telescopes scanning the skies
● microarrays generating gene expression data
● scientific simulations generating terabytes of data
Traditional techniques infeasible for large data sets
Data mining may help scientists
● in classifying and segmenting data
● in hypothesis formation

Why Mine Data? Commercial Viewpoint

Lots of data is being collected and warehoused
● Web data, e-commerce
● Purchases at department / grocery stores
● Bank/credit card transactions
● Computers have become more powerful
● Competitive pressure is strong
● Provide better, customized services for an edge

In class exercise #1: Give an example of something you did yesterday or today which resulted in data which could potentially be mined to discover useful information.

Origins of Data Mining (page 6)

Draws ideas from machine learning, AI, pattern recognition and statistics
Traditional techniques may be unsuitable due to
- enormity of data
- high dimensionality of data
- heterogeneous, distributed nature of data

[Figure: diagram labeled Statistics, AI/Machine Learning, Data Mining]

2 Types of Data Mining Tasks (page 7)

Predictive Methods: Use some variables to predict unknown or fu...