Topic1-DMIntro

Data Mining - Columbia University
Topic 1: Introduction to Data Mining
Instructor: Chris Volinsky

Intro
  Who am I?
  Who are you?

Class Schedule
  Sept 8 to December 8
  No class Election Day or Thanksgiving
  Syllabus: www.research.att.com/~volinsky/DataMining/Columbia2011/Columbia2011.html
  My email: volinsky@research.att.com
  My phone: 973-360-8644
  My office hours: by appointment before or after class

Class Assessment
  30% HW
    Due every two weeks
    1st HW due next Thursday, September 15
    No late HW accepted
  30% Tests
    Midterm and Final
  40% Data Mining Project
    Proposal due in October
    Project due Tuesday, Dec 13

Course Objectives
  Direct Objectives:
    To learn data mining techniques
    To see their use in real-world/research applications
    To understand limitations of standard statistical techniques in data mining applications
    To get an understanding of the methodological principles behind data mining
    To be able to read about data mining in the popular press with a critical eye
    To implement & use data mining models using statistical software

Data Analysis Project
  The goal of data mining is to find interesting patterns in data.
  You will be required to:
    Define a scientific question of interest
    Collect a data set, n>1000 (probably online)
    Prepare the data set properly
    Analyze the data using appropriate models
    Write a 10-20 page report on your analysis (graphics included)
  Project proposals (1/2-1 page) will be due in early October.
  Volunteers to present projects in class for extra credit.
  Finished reports will be due December 13.

Data Mining Software

Software
  Can use any software you like
  Must know how to input, manipulate, graph, and analyze data
  Preferred: R
  Also: SAS, Weka, SPSS, Systat, Enterprise Miner, JMP, Minitab, Matlab, SQL Server
  Maybe not: Excel, C

What is R?
  Open source statistical software grown out of S/Splus
  www.r-project.org
  Many user-contributed packages at CRAN (cran.r-project.org)
  Active, helpful user community (help lists, bulletin boards, etc)
  R Tutorials available online (see class website and CRAN)
  Great graphics (with a bit of a learning curve)
  Other useful tools: Perl/Python, AWK, Shell scripts

Resources
  Data mining is a new field and as such, does not have authoritative texts (yet)....