Topic1-DMIntro

Data Mining - Columbia University
Topic 1: Introduction to Data Mining
Instructor: Chris Volinsky

Intro
• Who am I?
• Who are you?

Class Schedule
• Sept 8 – December 8
• No class Election Day or Thanksgiving
• Syllabus: www.research.att.com/~volinsky/DataMining/Columbia2011/Columbia2011.html

My email: [email protected]
My phone: 973-360-8644
My office hours: by appointment before or after class

Class Assessment
• 30% HW
  – Due every two weeks
  – 1st HW due next Thursday September 15
  – No late HW accepted
• 30% Tests
  – Midterm and Final
• 40% Data Mining Project
  – Proposal due in October
  – Project due Tuesday Dec 13

Course Objectives
• Direct Objectives:
  – To learn data mining techniques
  – To see their use in real-world/research applications
  – To understand limitations of standard statistical techniques in data mining applications
  – To get an understanding of the methodological principles behind data mining
  – To be able to read about data mining in the popular press with a critical eye
  – To implement & use data mining models using statistical software

Data Analysis Project
• The goal of data mining is to find interesting patterns in data. You will be required to:
  – Define a scientific question of interest
  – Collect a data set n>1000 (probably online)
  – Prepare the data set properly
  – Analyze the data using appropriate models
  – Write a 10-20 page report on your analysis (graphics included)
• Project proposals (1/2-1 page) will be due in early October.
• “Volunteers” to present projects in class for extra credit.
• Finished reports will be due December 13.

Data Mining Software
• Software
  – Can use any software you like – must know how to input, manipulate, graph, and analyze data.
  – Preferred: R
  – Also: SAS, Weka, SPSS, Systat, Enterprise Miner, JMP, Minitab, Matlab, SQL Server
  – Maybe not: Excel, C
• What is R?
  – Open source statistical software grown out of S/Splus
  – www.r-project.org
  – Many user-contributed packages at CRAN (cran.r-project.org)
  – Active, helpful user community (help lists, bulletin boards, etc)
  – R Tutorials available online (see class website and CRAN)
  – Great graphics (with a bit of a learning curve)
• Other useful tools: Perl/Python, AWK, Shell scripts

Resources
• Data mining is a new field and as such, does not have authoritative texts (yet)....

This note was uploaded on 02/28/2012 for the course ELEN E4815 taught by Professor I during the Spring '12 term at Columbia.
