cmsc320_f2018_lec01.pdf - INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON PREM SAGGAR Today and Wednesday Lecture #1 – CMSC320 Mondays Wednesdays 2:00pm


INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON, PREM SAGGAR
Lecture #1 – 08/27/2018
CMSC320, Mondays & Wednesdays 2:00pm – 3:15pm
Today and Wednesday!

INTRODUCTION TO ?????????????
"Data science is the application of computational and statistical techniques to address or gain [managerial or scientific] insight into some problem in the real world."
Zico Kolter, Machine Learning Prof, CMU

Drew Conway, CEO, Alluvium (analytics company)

MANY DEFINITIONS
  • Broad: necessarily larger than a single discipline
  • Interdisciplinary: statistics, computer science, operations research, statistical and machine learning, data warehousing, visualization, mathematics, information science, …
  • Insight-focused: grounded in the desire to find insights in data and leverage them to inform decision making
Tuomas Carsey, UNC

THE DATA LIFECYCLE
Data collection → Data processing → Exploratory analysis & data viz → Analysis, hypothesis testing, & ML → Insight & policy decision

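One way to make the lifecycle stages concrete is a toy pipeline with one function per stage. The sketch below is illustrative only and is not taken from the lecture: the data set (hours studied vs. exam score), the function names, and the decision rule are all assumptions, and it deliberately uses only the Python standard library rather than pandas or scikit-learn.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: hours studied vs. exam score, with one missing value.
RAW_CSV = """hours,score
1,52
2,58
3,
4,71
5,75
"""


def collect(raw_text):
    """Data collection: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Data processing: drop rows with missing fields, convert to floats."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if r["hours"] and r["score"]
    ]


def explore(pairs):
    """Exploratory analysis: a few summary statistics of the scores."""
    scores = [s for _, s in pairs]
    return {"n": len(pairs), "mean": mean(scores),
            "min": min(scores), "max": max(scores)}


def fit_line(pairs):
    """Analysis: least-squares fit of score = slope * hours + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def decide(slope):
    """Insight & decision: turn the fitted trend into a toy recommendation."""
    return "more study time helps" if slope > 0 else "no clear benefit"


rows = collect(RAW_CSV)
pairs = process(rows)
summary = explore(pairs)
slope, intercept = fit_line(pairs)
print(summary)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(decide(slope))
```

Each function maps to one stage, so a stage can be swapped out (for example, replacing the inline CSV with a real data source) without touching the others.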
“The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids.”
Hal Varian, Chief Economist at Google

THIS COURSE
You’ll learn to take data:
  • Process it
  • Visualize it
  • Understand it
  • Communicate it
  • Extract value from it
(Hal Varian)
Info:
  • Piazza: piazza.com/umd/fall2018/cmsc320
  • ELMS: (everyone should be registered automatically)

PREREQUISITE KNOWLEDGE
Aimed at CMSC undergrads, but likely accessible to others with programming experience and mathematical maturity.
We do not assume:
  • Experience with Python, pandas, scikit-learn, matplotlib, etc.
  • Deep statistics or any ML knowledge
  • Database or distributed systems knowledge
We do assume:
  • You want to be here!

WHO AM I?

WHO IS PREM SAGGAR?
(Prem will likely be here on Wednesday.)

WHO ARE YOU?
Register on Piazza: piazza.com/umd/fall2018/cmsc320
2nd-year, 3rd-year, 4th-year+
STAT400? CMSC422? CMSC424?

(TENTATIVE) COURSE STRUCTURE
  • First 4 lectures: intro & primers in the Python data science stack
  • Next 6 lectures: data collection & management
    Best practices, data wrangling, exploratory analysis, ethics, debugging, visualization, etc.
  • Next 9 lectures: statistical modeling & ML
    Statistical learning, regression, classification, cross-validation, model evaluation, hypothesis testing, etc.
  • Midterm
  • Final 8 lectures: advanced topics
    Dimensionality reduction, distributed learning, big data, distributed computation
    Either group presentations or more lectures
Ambitious …

GRADE #1: MINI-PROJECTS
Students will complete four mini-project assignments:
  • Case studies meant to mimic what you, a future data scientist, will see in industry.
  • They should be fun :)