MIS 4350
Introduction to BI and Data Mining
Homework 2 k-nearest-neighbor
Purpose
Students will apply knowledge of decision tree classification concepts to address practical exercises.
Assigned: September 10th, 2014
Due: September 16th, 2014
Deliverables
C
Spring, 2014
MIS 4350
Intro to BI and Data Mining
Homework 3 Logistic
Classification
Purpose
Through completing homework 3, students will apply knowledge of classification concepts to address practical exercises.
Assigned: September 17th, 2014 in class
Du
MIS 4350
Data Mining Introduction
MIS 4350 Intro to BI and Data Mining
Definitions
CIO.com: Business intelligence, or BI, is an umbrella term that
refers to a variety of software applications used to analyze an
organization's raw data. BI as a discipline
MIS 4350
Classification: Decision Trees
MIS 4350 Introduction to BI and
Data Mining
The Model: Trees (CART)
Classification Tree
Response/DV is categorical/qualitative
root node (based upon attribute value)
Group Work
Regression Tree
None
branch
All
Resp
MIS 4350
Classification Introduction
MIS 4350 Introduction to BI and
Data Mining
Motivation
MIT Sloan Sports Analytics Conference
Going for Three: Predicting the Likelihood of Field Goal Success
with Logistic Regression
"The field goal is a critical scori
MIS 4350
Classification: Logistic Regression
MIS 4350 Intro to BI and Data Mining
Motivation
Computers predict national basketball champion
(link)
"During the season, [Georgia Tech's Logistic Regression/Markov Chain
(LRMC) college basketball ranking syste
MIS 4350
Data Mining Introduction
MIS 4350 Introduction to BI and
Data Mining
Discovery / Analytics
I want to sell more lemonade. How might I do that?
I have a theory based upon (experience, intuition, education) that the hotter it is, the more lemonade I
MIS 4350
Introduction to R
MIS 4350 Introduction to BI and Data
Mining
R
R: Open Source Statistical Package
Download R at cran.r-project.org/ (versions for Windows, Mac, and Linux)
Resources
UCLA (link)
CRAN (link)
?command
Primary drawback
Handling l
MIS 4350
Classification: k-nearest-neighbor
MIS 4350 Introduction to BI and
Data Mining
Classification Approaches
Eager Learning: Data -> Model -> Classification (decision tree, logistic regression, svm)
Lazy Learning: Data -> Classification (knn)
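The lazy-learning idea can be illustrated with a minimal base R sketch. It is not taken from the slides: `knn_predict` is a hypothetical helper written for this example, the distance is assumed to be Euclidean, and the query values are made up. The point is that nothing is fit ahead of time; the training rows are consulted each time a new point is classified.

```r
# Minimal k-nearest-neighbor classifier in base R (illustrative sketch only).
# knn_predict is a hypothetical helper, not a function from any package.
knn_predict <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query point to every training row
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(query))
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest training rows
  nearest <- train_y[order(dists)[seq_len(k)]]
  # Majority vote; ties go to the first level in table order
  names(which.max(table(nearest)))
}

# No model object exists: the full training data is passed in with each query.
query <- c(5.0, 3.5, 1.4, 0.2)  # made-up flower measurements
knn_predict(iris[, 1:4], iris$Species, query, k = 5)
```

An eager learner such as a decision tree would instead summarize the training data into a model once and could then discard the rows.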
The Model: knn
MIS 4350
Decision Trees & R
MIS 4350 Introduction to BI and
Data Mining
Exploring the Iris data set
> install.packages('tree') # see also the rpart package
> library(tree) # from package tree
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "
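The slide's commands are cut off here, so the following is only a sketch of how a classification tree might be fit to the iris data. It swaps in the `rpart` package, which the slide mentions as an alternative to `tree`, because `rpart` ships with standard R installations. Evaluating on the training data, as done below, gives an optimistic accuracy and is used only to keep the example short.

```r
library(rpart)  # recommended package bundled with standard R installations

# Fit a classification tree: Species is categorical, so method = "class"
fit <- rpart(Species ~ ., data = iris, method = "class")

print(fit)  # text view of the splits, starting from the root node

# Confusion table of predicted vs. actual species on the training data
pred <- predict(fit, newdata = iris, type = "class")
table(predicted = pred, actual = iris$Species)
```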
MIS 4350
knn & R
MIS 4350 Introduction to BI and
Data Mining
Back to the Iris Data Set
> install.packages('tree')
> library(tree) # from package tree
> data(iris)
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> head
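The rest of this transcript is truncated, so the deck's actual kNN commands are unknown. As a hedged sketch, one common way to run k-nearest-neighbor on the iris data uses `knn()` from the `class` package, which is bundled with R. The seed, the 100/50 split, and `k = 3` are arbitrary choices made for illustration.

```r
library(class)  # recommended package bundled with R; provides knn()

set.seed(42)  # arbitrary seed so the illustrative split is repeatable
train_idx <- sample(nrow(iris), 100)  # 100 rows for training, 50 held out

train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Classify each held-out row by majority vote among its 3 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

table(predicted = pred, actual = test_y)
mean(pred == test_y)  # share of held-out rows classified correctly
```

Because `knn()` takes the training rows and the test rows in the same call, there is no separate fitting step to run beforehand.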
MIS 6324
Logistic Regression & R
MIS 6324 BI Software & Techniques
Spring, 2014
Logistic Regression & the Stock Market
Stock Market Data: Smarket (in ISLR library)
> install.packages("ISLR")
--- Please select a CRAN mirror for use in this session ---
trying UR