PISA Data Analysis Manual
SPSS SECOND EDITION
The OECD Programme for International Student Assessment (PISA) surveys collected data on students'
performances in reading, mathematics and science, as well as contextual information o

Applied data mining
Lecture 5
Byron C Wallace
Questions on the homework?
which is now due next week
Today's agenda
Wrap up unsupervised learning (evaluation)
Supervised learning wrap-up: a bit of review,
semi-supervised learning, feature selection and
s

Crowd-sourcing
slides derived from our resident expert,
Matt Lease
Crowd-sourcing
Crowd-powered data collection &
applications
Crowdsourcing, Incentives, & Demographics
Mechanical Turk & Other Platforms
Designing for Crowds & Statistical QA
Open Prob

Applied data mining
Lecture 8
Byron C Wallace
Today: human in the loop data mining
Active learning + crowd-sourcing
Reducing labeling effort
For most machine learning

Applied data mining
Humans in the Loop
active learning + crowd sourcing
Byron C Wallace
Today: human in the loop data mining
Active learning + crowd-sourcing
Reducing labeling effort
For most machine learning applications, unlabeled
data is readily avail

HPN Health Prize
How We Did It
Market Makers
Heritage Provider Network Health Prize
Round 1 Milestone Prize
How We Did It Team Market Makers
Phil Brierley
David Vogel
Randy Axelrod
Tiberius Data Mining
[email protected]
Voloridge Investment Management
dv

Applied data mining
Lecture 4
Byron C Wallace
Questions on the homework?
Last time: Supervised learning
labeled
data L
learner
unlabeled
data U
predictive
model
This time: Unsupervised learning
Unsupervised learning
Given a bunch of data, make sense of i

Stuff I wanted to cover !
but didn't!
(some slides derived from: Raphael Hoffman, Predrag Radenkovi, Alan Ritter, Moahammed
Kahalilia, Pengcheng Xi & Luke Zettlemoyer)
(sorry if this is sort of hodge-podge!)
(some sli

Applied data mining
Byron C Wallace
Introductions, backgrounds & interests
Course aims
Provide an overview of data mining
methods, their uses and their limits
A practical introduction to using open-source data mining tools (primarily R,
some Weka)
Defini

PISA 2009 Results:
Executive Summary
This work is published on the responsibility of the Secretary-General of the OECD. The opinions
expressed and arguments employed herein do not necessarily reflect the officia

Applied data mining
Unsupervised Learning
Byron C Wallace
Questions on the homework?
A few words on the quiz
The three components
Feature engineering is clutch!
Many of you put algorithms, which is acceptable. But
really features are more important.
Last

Applied Data Mining: Homework 3
Due Thursday, March 5th, 2015
1. Perceptron (50 points) In lecture4.r there is an implementation of the Perceptron algorithm. You will
work with this a bit for this problem. Be sure to upload your code (.r file) in Canvas.

Applied data mining
Supervised Learning
Byron C Wallace
Qs on HW?
Hurray: a Quiz!
(~15 minutes)
Last time: data exploration &
(univariate) regression
Today: supervised learning!
We'll briefly go over multivariate
regression
But first, regression and Ansc

Applied data mining
Lecture 3
Byron C Wallace
Hurray: a Quiz!
(~15 minutes)
Last Time
We covered a lot of ground: (basic)
probability all the way to Naive Bayes,
our first classifier
We'll pick up where we left off by
discussing other classifiers
Today: S

Applied data mining
Lecture 7
Byron C Wallace
Many of today's slides were derived
from Chris Volinsky's data mining
course @ Columbia
Outline
Visualization
One variable
Two variables
More than two variables
Other types of data
Dimension reduction
EDA

COMM 645 HANDOUT NODEXL BASICS
NodeXL: Network Overview, Discovery and Exploration for Excel. Download from nodexl.codeplex.com
Plugin for social media/Facebook import: socialnetimporter.codeplex.com
Plugin for Microsoft Exchange import: exchangespigot.co

DATA SCIENCE IMMERSIVE SYLLABUS
COURSE
OVERVIEW
By the end of this course, students will be able to:
WEEK 1: MATH, &
PROGRAMMING
FUNDAMENTALS
WEEK 2: EDA,
PANDAS & SCIPY
1. Collect, extract, query, clean, and aggregate data for analysis
Perform visual and

Working with network data
COMM 645: Communication Networks
Katherine Ognyanova (Katya)
Fall semester, 2012
Link types
Binary: link or no link (1 or 0)
Signed: positive or negative (+, - or 0)
Valued
Symmetric
Multiplex
Weig

Applied data mining
Lecture 9
Byron C Wallace
Today: data mining for
the humanities
Some slides from Glenn Roe and the [email protected] program:
http://digital.humanities.ox.ac.uk/dhoxss/2012/
But first
A quiz!
Culturomics
We construct

Data Science
Boot Camp
Course Overview
Course Outline
Cash registers, websites, patients, employees, customers, students,
machines, warehouses and nearly every other aspect of the
modern business world generate enormous amounts of data.
Hiding within the

Applied data mining
Exploratory Data Analysis
* Many of these slides were derived from Chris Volinsky's data mining course @ Columbia
(More on) Naive Bayes and
Generative Models
EDA and Visualization
Exploratory Data Analysis (EDA) and Visualization are
cr

DATA SCIENCE CURRICULUM
Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight
weeks doing iterative, project-centered skill acquisition. Over the course of five data science projects,

Applied data mining
Lecture 2
Byron C Wallace
homework and/or reading
questions
(aside from Google flu)?
Let's talk Google flu and
data science hype
(class discussion)
Today: a very basic overview/
recap of stat & probability
Not a replacement for a prope

Errata for Statistical Inference, Second Edition (Seventh Printing)
Last Update: May 15, 2007
These errata were compiled for the seventh printing. Previous errata have already been incorporated in past printings. We have also been informed of errata
in th

Applied data mining
Lecture 6
Byron C Wallace
Today's agenda
Introduction to Weka
NLP + text mining (mostly entity extraction)
Time for questions on mid-term
What is Weka?
Java-based Machine Learning Tool
Implements numerous classifiers
3 modes of op