INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
PREM SAGGAR
Lecture #1 – 08/27/2018
CMSC320
Mondays & Wednesdays 2:00pm – 3:15pm
Today and Wednesday!
INTRODUCTION TO ?????????????
Data science is the application of computational and statistical techniques to address or gain [managerial or scientific] insight into some problem in the real world.
Zico Kolter, Machine Learning Prof, CMU
Drew Conway, CEO, Alluvium (analytics company)
MANY DEFINITIONS
• Broad: necessarily larger than a single discipline
• Interdisciplinary: statistics, computer science, operations research, statistical and machine learning, data warehousing, visualization, mathematics, information science, …
• Insight-focused: grounded in the desire to find insights in data and leverage them to inform decision making
Tuomas Carsey, UNC
THE DATA LIFECYCLE
Data collection → Data processing → Exploratory analysis & Data viz → Analysis, hypothesis testing, & ML → Insight & Policy Decision
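A toy walk through the lifecycle stages, as a minimal sketch only. The dataset, column names, and function names below are all hypothetical (made up for illustration, not taken from the course materials), and the "ML" step is just a hand-rolled least-squares line using the Python standard library.

```python
import csv
import io
import statistics

# Hypothetical raw data standing in for whatever the collection stage gathers.
RAW_CSV = """student,hours_studied,score
a,2,65
b,4,70
c,,80
d,8,88
e,10,95
"""


def collect(raw_text):
    """Data collection: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Data processing: drop rows with missing values, convert to floats."""
    pairs = []
    for row in rows:
        if row["hours_studied"] and row["score"]:
            pairs.append((float(row["hours_studied"]), float(row["score"])))
    return pairs


def explore(pairs):
    """Exploratory analysis: a few summary statistics."""
    hours = [h for h, _ in pairs]
    scores = [s for _, s in pairs]
    return {
        "n": len(pairs),
        "mean_hours": statistics.mean(hours),
        "mean_score": statistics.mean(scores),
    }


def fit_line(pairs):
    """Analysis: ordinary least-squares fit of score = slope * hours + intercept."""
    mean_h = statistics.mean(h for h, _ in pairs)
    mean_s = statistics.mean(s for _, s in pairs)
    sxy = sum((h - mean_h) * (s - mean_s) for h, s in pairs)
    sxx = sum((h - mean_h) ** 2 for h, _ in pairs)
    slope = sxy / sxx
    intercept = mean_s - slope * mean_h
    return slope, intercept


def decide(slope):
    """Insight: turn the fitted slope into a plain-language statement."""
    if slope > 0:
        return "More study hours are associated with higher scores."
    return "No positive association between study hours and scores."


if __name__ == "__main__":
    pairs = process(collect(RAW_CSV))
    print(explore(pairs))
    slope, intercept = fit_line(pairs)
    print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")
    print(decide(slope))
```

Each function maps to one stage, so a stage can be swapped out (for example, replacing the hand-rolled fit with a library model) without touching the others. The final statement describes an association in toy data, not a causal claim.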
“The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids.”
Hal Varian, Chief Economist at Google
THIS COURSE
You’ll learn to take data:
• Process it
• Visualize it
• Understand it
• Communicate it
• Extract value from it
Info:
Piazza: piazza.com/umd/fall2018/cmsc320
ELMS: (everyone should be registered automatically)
Hal Varian
PREREQUISITE KNOWLEDGE
Aimed at CMSC undergrads – but likely accessible to others with programming experience and mathematical maturity.
We do not assume:
• Experience with Python, pandas, scikit-learn, matplotlib, etc.
• Deep statistics or any ML knowledge
• Database or distributed systems knowledge
We do assume:
• You want to be here!
WHO AM I?
WHO IS PREM SAGGAR?
(Prem will likely be here on Wednesday.)
WHO ARE YOU?
Register on Piazza: piazza.com/umd/fall2018/cmsc320
2nd-year
3rd-year
4th-year+
STAT400? CMSC422? CMSC424?
(TENTATIVE) COURSE STRUCTURE
First 4 lectures: intro & primers in the Python data science stack
Next 6 lectures: data collection & management
• Best practices, data wrangling, exploratory analysis, ethics, debugging, visualization, etc.
Next 9 lectures: statistical modeling & ML
• Statistical learning, regression, classification, cross-validation, model evaluation, hypothesis testing, etc.
Midterm
Final 8 lectures: advanced topics
• Dimensionality reduction, distributed learning, big data, distributed computation
• Either group presentations or more lectures
Ambitious …
GRADE #1: MINI-PROJECTS
Students will complete four mini-project assignments:
• Case studies meant to mimic what you, a future data scientist, will see in industry. They should be fun :)