INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
Lecture #26 – 5/2/2017
CMSC320
Tuesdays & Thursdays 3:30pm – 4:45pm
ANNOUNCEMENTS
HW4 is due on Thursday (5/4)
• ~1/3 turned in already!
• Fill out the group assignment Google Drive document: 0CEc-El9e_L3xDoqn7mECAJHnZA/edit?usp=sharing
Tuesday, May 9th will be a midterm & general wrap-up questions lecture day, led by Denis Peskov
• (Worked as a data scientist before coming to UMD.)
EXAM GRADES
N: 83, Min: 15, Max: 99, Mean: 69.67, Median: 73, Stdev: 17.8
[Figure: CMSC320 Raw Midterm Scores]
FINAL TUTORIAL
Deliverable: URL of your own GitHub Pages site hosting an .ipynb/.html export of your final tutorial
• Make a GitHub account, too!
The project itself:
• ~1500+ words of Markdown prose
• ~150+ lines of Python
• Should be viewable as a static webpage – that is, if I (or anyone else) open the link, everything should render and I shouldn’t have to run any cells to generate output
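The length targets and the "no cells left to run" requirement above can be checked roughly before publishing. The following is a minimal sketch, not an official grading script: the function name `notebook_summary` and the toy notebook `example` are hypothetical, and it assumes the notebook is in the standard nbformat 4 JSON layout. The word count splits on whitespace, so Markdown syntax such as `#` is counted too and the number is only approximate.

```python
import json


def notebook_summary(nb):
    """Summarise a parsed .ipynb (nbformat 4) dictionary.

    Returns approximate Markdown word count, non-blank code line count,
    and the indices of code cells that were never executed.
    """
    words = 0
    code_lines = 0
    unexecuted = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores source as either one string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            words += len(text.split())
        elif cell.get("cell_type") == "code":
            lines = [ln for ln in text.splitlines() if ln.strip()]
            code_lines += len(lines)
            # A non-empty code cell with no execution count was never run,
            # so it cannot have saved output in the exported page.
            if lines and cell.get("execution_count") is None:
                unexecuted.append(index)
    return {
        "markdown_words": words,
        "code_lines": code_lines,
        "unexecuted_cells": unexecuted,
    }


# Hypothetical toy notebook, for illustration only.
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some intro prose here."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["import math\n", "\n", "x = math.sqrt(2)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "print(x)"},
    ],
}

summary = notebook_summary(example)
print(summary)

# For a real file (the path is an assumption):
# with open("tutorial.ipynb", encoding="utf-8") as fh:
#     print(notebook_summary(json.load(fh)))
```

An empty `unexecuted_cells` list suggests every code cell was run before export; it does not guarantee the page renders correctly, so opening the published link is still worthwhile.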
FINAL TUTORIAL RUBRIC
The TAs and I will grade on a scale of 1-10:
• Motivation: Does the tutorial make the reader believe the topic is important (a) in general and (b) with respect to data science?
• Understanding: After reading the tutorial, does the reader understand the topic?
• Further resources: Does the tutorial “call out” to other resources that would help the reader understand basic concepts, deep dive, related work, etc.?
• Prose: Does the prose in the Markdown portion of the .ipynb add to the reader’s understanding of the tutorial?
• Code: Does the code help solidify understanding, is it well documented, and does it include helpful examples?
• Subjective Evaluation: If somebody linked to this tutorial from Hacker News, would people actually read the whole thing?
Thanks to: Zico Kolter
TODAY’S LECTURE
• Data collection
• Data processing
• Exploratory analysis & Data viz
• Analysis, hypothesis testing, & ML
• Insight & Policy Decision
DATA SCIENCE IN INDUSTRY
WHAT IS A DATA SCIENTIST?
Many types of “data scientists” in industry …
• Business analysts, renamed
  • “… someone who analyzes an organization or business domain (real or hypothetical) and documents its business or processes or systems, assessing the business model or its integration with technology.” – Wikipedia
• Statisticians
• Machine learning engineers
• Backend tools developers
Thanks to: Zico Kolter
KEY DIFFERENCES
Classical statistics vs. machine learning approaches
• (The two are mixed together in most job postings you will see.)
Developing data science tools vs. doing data analysis
Working on a core business product vs. more nebulous “identification of value” for the firm
FINDING A JOB
Make a personal website.
• Free hosting options: GitHub Pages, Google Sites
• Pay for your own URL (but not the hosting).