INTRODUCTION TO DATA SCIENCE
JOHN P DICKERSON
Lecture #26 – 5/2/2017
CMSC320
Tuesdays & Thursdays
3:30pm – 4:45pm

ANNOUNCEMENTS
HW4 is due on Thursday (5/4)
• ~1/3 turned in already!
• Fill out the group assignment Google Drive document:
0CEc-El9e_L3xDoqn7mECAJHnZA/edit?usp=sharing
Tuesday, May 9th will be a midterm & general wrap-up questions lecture day by Denis Peskov
• (Worked as a data scientist before coming to UMD.)

EXAM GRADES
N: 83
Min: 15
Max: 99
Mean: 69.67
Median: 73
Stdev: 17.8
[Figure: CMSC320 Raw Midterm Scores]

FINAL TUTORIAL
Deliverable: URL of your own GitHub Pages site hosting an
.ipynb/.html export of your final tutorial
• Make a GitHub account, too!
• The project itself:
  • ~1500+ words of Markdown prose
  • ~150+ lines of Python
  • Should be viewable as a static webpage: if I (or anyone else) open the link, everything should render and I shouldn’t have to run any cells to generate output

FINAL TUTORIAL RUBRIC
The TAs and I will grade on a scale of 1-10:
Motivation: Does the tutorial make the reader believe the topic is important (a) in general and (b) with respect to data science?
Understanding: After reading the tutorial, does the reader understand the topic?
Further resources: Does the tutorial “call out” to other resources that would help the reader understand basic concepts, deep dive, related work, etc.?
Prose: Does the prose in the Markdown portion of the .ipynb add to the reader’s understanding of the tutorial?
Code: Does the code help solidify understanding, is it well documented, and does it include helpful examples?
Subjective Evaluation: If somebody linked to this tutorial from Hacker News, would people actually read the whole thing?
Thanks to: Zico Kolter

TODAY’S LECTURE
Data collection → Data processing → Exploratory analysis & Data viz → Analysis, hypothesis testing, & ML → Insight & Policy Decision

DATA SCIENCE IN INDUSTRY

WHAT IS A DATA SCIENTIST?
Many types of “data scientists” in industry …
• Business analysts, renamed
  • “… someone who analyzes an organization or business domain (real or hypothetical) and documents its business or processes or systems, assessing the business model or its integration with technology.” – Wikipedia
• Statisticians
• Machine learning engineers
• Backend tools developers
Thanks to: Zico Kolter

KEY DIFFERENCES
Classical statistics vs. machine learning approaches
• (The two are mixed together in most job calls you will see.)
Developing data science tools vs. doing data analysis
Working on a core business product vs. more nebulous “identification of value” for the firm

FINDING A JOB
Make a personal website.
• Free hosting options: GitHub Pages, Google Sites
• Pay for your own URL (but not the hosting).

