CSE291 - Predictive analytics and data mining Charles Elkan...

Predictive analytics and data mining
Charles Elkan
elkan@cs.ucsd.edu
May 31, 2011

Contents

1 Introduction
  1.1 Limitations of predictive analytics
  1.2 Overview
2 Predictive analytics in general
  2.1 Supervised learning
  2.2 Data cleaning and recoding
  2.3 Linear regression
  2.4 Interpreting coefficients of a linear model
  2.5 Evaluating performance
3 Introduction to Rapidminer
  3.1 Standardization of features
  3.2 Example of a Rapidminer process
  3.3 Other notes on Rapidminer
4 Support vector machines
  4.1 Loss functions
  4.2 Regularization
  4.3 Linear soft-margin SVMs
  4.4 Dual formulation
  4.5 Nonlinear kernels
  4.6 Selecting the best SVM settings
5 Doing valid experiments
  5.1 Cross-validation
  5.2 Nested cross-validation
6 Classification with a rare class
  6.1 Thresholds and lift
  6.2 Ranking examples
  6.3 Training to overcome imbalance
  6.4 Conditional probabilities
  6.5 Isotonic regression
  6.6 Univariate logistic regression
  6.7 Pitfalls of link prediction
7 Logistic regression
8 Making optimal decisions
  8.1 Predictions, decisions, and costs
  8.2 Cost matrix properties
  8.3 The logic of costs
  8.4 Making optimal decisions
  8.5 Limitations of cost-based analysis
  8.6 Rules of thumb for evaluating data mining campaigns
  8.7 Evaluating success
9 Learning classifiers despite missing labels
  9.1 The standard scenario for learning a classifier
  9.2 Sample selection bias in general ...
This note was uploaded on 08/31/2011 for the course CSE 291 taught by Professor Staff during the Winter '08 term at UCSD.