cmsc320_s2017_lec26.pdf - INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture#26 CMSC320 Tuesdays Thursdays 3:30pm 4:45pm ANNOUNCEMENTS HW4 is due on

INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
Lecture #26 – 5/2/2017
CMSC320, Tuesdays & Thursdays 3:30pm – 4:45pm
ANNOUNCEMENTS
  • HW4 is due on Thursday (5/4); ~1/3 turned in already!
  • Fill out the group assignment Google Drive document: 0CEc-El9e_L3xDoqn7mECAJHnZA/edit?usp=sharing
  • Tuesday, May 9th will be a midterm & general wrap-up questions lecture day by Denis Peskov (Worked as a data scientist before coming to UMD.)
EXAM GRADES
  • N: 83
  • Min: 15
  • Max: 99
  • Mean: 69.67
  • Median: 73
  • Stdev: 17.8
[Figure: CMSC320 Raw Midterm Scores]
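The summary statistics on this slide can be reproduced for any list of scores with Python's standard `statistics` module. The following is a minimal sketch: the `scores` list and the `summarize` helper are hypothetical and made up for illustration, not the actual CMSC320 midterm data, and the slide does not say whether its Stdev is the sample or the population version (the sample version is assumed here).

```python
import statistics

# Hypothetical scores for illustration only; NOT the actual CMSC320 midterm data.
scores = [15, 42, 58, 66, 73, 73, 81, 88, 94, 99]


def summarize(values):
    """Return the descriptive statistics named on the slide for a list of numbers."""
    return {
        "N": len(values),
        "Min": min(values),
        "Max": max(values),
        "Mean": round(statistics.mean(values), 2),
        "Median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator). This is an assumption:
        # the slide does not state which version was used.
        "Stdev": round(statistics.stdev(values), 2),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population version, which is slightly smaller for the same data.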
FINAL TUTORIAL
  • Deliverable: URL of your own GitHub Pages site hosting an .ipynb/.html export of your final tutorial (make a GitHub account, too!)
  • The project itself:
      • ~1500+ words of Markdown prose
      • ~150+ lines of Python
  • Should be viewable as a static webpage: that is, if I (or anyone else) open the link, everything should render and I shouldn’t have to run any cells to generate output
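One way to sanity-check a notebook against these targets before publishing is to read the .ipynb file directly, since it is plain JSON with a `cells` list. The sketch below is not part of the course materials: the function names, the in-memory `example` notebook, and the exact counting rules (whitespace-separated words, non-blank code lines) are assumptions chosen for illustration. Only the 1500-word and 150-line thresholds come from the slide.

```python
import json

# Targets taken from the slide; how words and lines are counted is an assumption.
MIN_MARKDOWN_WORDS = 1500
MIN_CODE_LINES = 150


def load_notebook(path):
    """Load an .ipynb file, which is stored as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_notebook(nb):
    """Count Markdown words, non-blank code lines, and code cells with no saved output."""
    words = 0
    code_lines = 0
    cells_without_output = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            words += len(text.split())
        elif cell.get("cell_type") == "code":
            code_lines += sum(1 for line in text.splitlines() if line.strip())
            if text.strip() and not cell.get("outputs"):
                cells_without_output += 1
    return {
        "markdown_words": words,
        "code_lines": code_lines,
        "code_cells_without_output": cells_without_output,
    }


def meets_targets(report):
    """True if the counts reach the word and line targets from the slide."""
    return (report["markdown_words"] >= MIN_MARKDOWN_WORDS
            and report["code_lines"] >= MIN_CODE_LINES)


# Tiny hypothetical notebook, built in memory so the example runs without a file.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some motivating prose."]},
        {"cell_type": "code", "source": "import math\nprint(math.pi)\n",
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["3.141592653589793\n"]}]},
        {"cell_type": "code", "source": ["x = 1\n", "\n", "x + 1"], "outputs": []},
    ]
}

report = check_notebook(example)
print(report, "meets targets:", meets_targets(report))
```

For a real file, `check_notebook(load_notebook("tutorial.ipynb"))` would produce the same report, where `tutorial.ipynb` is a placeholder name. A code cell without saved output is only a hint that the notebook may not have been run before export, since some cells (imports, assignments) legitimately print nothing.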
FINAL TUTORIAL RUBRIC
The TAs and I will grade on a scale of 1-10:
  • Motivation: Does the tutorial make the reader believe the topic is important (a) in general and (b) with respect to data science?
  • Understanding: After reading the tutorial, does the reader understand the topic?
  • Further resources: Does the tutorial “call out” to other resources that would help the reader understand basic concepts, deep dive, related work, etc.?
  • Prose: Does the prose in the Markdown portion of the .ipynb add to the reader’s understanding of the tutorial?
  • Code: Does the code help solidify understanding, is it well documented, and does it include helpful examples?
  • Subjective Evaluation: If somebody linked to this tutorial from Hacker News, would people actually read the whole thing?
Thanks to: Zico Kolter
TODAY’S LECTURE
  • Data collection
  • Data processing
  • Exploratory analysis & Data viz
  • Analysis, hypothesis testing, & ML
  • Insight & Policy Decision
DATA SCIENCE IN INDUSTRY
WHAT IS A DATA SCIENTIST?
Many types of “data scientists” in industry …
  • Business analysts, renamed
      “… someone who analyzes an organization or business domain (real or hypothetical) and documents its business or processes or systems, assessing the business model or its integration with technology.” – Wikipedia
  • Statisticians
  • Machine learning engineer
  • Backend tools developer
Thanks to: Zico Kolter
KEY DIFFERENCES
  • Classical statistics vs. machine learning approaches (The two are nearly mixed in most job calls you will see.)
  • Developing data science tools vs. doing data analysis
  • Working on a core business product vs. more nebulous “identification of value” for the firm
FINDING A JOB
  • Make a personal website.
  • Free hosting options: GitHub Pages, Google Sites
  • Pay for your own URL (but not the hosting).