8/17/2012
Chapter 9 Classification and Regression Trees
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
The output is a set
8/17/2012
Chapter 4 Dimension Reduction
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Exploring the data
Statistical summary of data: common metrics
Average
Median
Minimum
Maximum
Standard deviation
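The common summary metrics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the helper name `summarize` and the sample values are hypothetical, and it assumes the sample standard deviation (n - 1 denominator); `statistics.pstdev` gives the population version.

```python
import statistics

def summarize(values):
    """Return the common summary metrics for a list of numbers (hypothetical helper)."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }

print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```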
1. Which of the following words are synonyms of the word "predictor"? (Please select all correct answers.)
column name in a data table
attribute
feature
observation
input variable
4 points
2. The purpose of dimensionality reduction (input varia
8/17/2012
Chapter 11 Neural Nets
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Basic Idea
Combine input information in a complex & flexible neural net model
Model coefficients are continually tweaked in
Artificial Neural Networks
[Figure: Feedforward Neural Network, showing input nodes (X1, X2), a bias node (1), hidden nodes, and an output node giving the predicted output Y.]
Weights associated with the connections are iteratively adjusted during training to decrease the prediction error.
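A minimal sketch of that training loop for a network shaped like the figure (two inputs, a bias input fixed at 1, one hidden layer, one output). It assumes sigmoid activations, squared-error loss, and plain gradient-descent (backpropagation) updates; the number of hidden nodes, the learning rate, and the two training records are hypothetical choices for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    # Each weight list starts with the bias weight (the bias input is the constant 1).
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    y = sigmoid(w_out[0] + sum(wo * h for wo, h in zip(w_out[1:], hidden)))
    return hidden, y

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """Adjust all weights once to reduce the squared error on one record; return that error."""
    hidden, y = forward(x, w_hidden, w_out)
    delta_out = (y - target) * y * (1 - y)
    # Hidden-node error terms use the output weights before they are updated.
    delta_hidden = [delta_out * w_out[j + 1] * h * (1 - h) for j, h in enumerate(hidden)]
    w_out[0] -= lr * delta_out
    for j, h in enumerate(hidden):
        w_out[j + 1] -= lr * delta_out * h
    for j, w in enumerate(w_hidden):
        w[0] -= lr * delta_hidden[j]
        for i, xi in enumerate(x):
            w[i + 1] -= lr * delta_hidden[j] * xi
    return 0.5 * (y - target) ** 2

random.seed(1)
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]  # 2 hidden nodes
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
data = [((0.1, 0.9), 1), ((0.8, 0.2), 0)]  # hypothetical (X1, X2) -> Y records

epoch_errors = []
for epoch in range(5000):
    epoch_errors.append(sum(train_step(x, t, w_hidden, w_out) for x, t in data))
```

Watching `epoch_errors` fall over the epochs is the simplest check that the weight adjustments are reducing the prediction error.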
Training stops
Decision Tree Inductive Learning
Decision Trees
Given a set of training examples in the form of a set of attribute values as inputs and classes as outputs, a tree is constructed such that:
Each non-terminal node is an attribute.
Each arc from a node co
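The tree structure described above (an attribute tested at each non-terminal node, branches for attribute values, classes at the leaves) can be sketched as nested dictionaries. This sketch assumes categorical attributes and chooses each split by information gain, an ID3-style criterion that is only one possible choice (other algorithms, such as CART, use different split rules). The weather-style records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attributes, target):
    """Recursively build a tree: internal nodes test an attribute, leaves hold a class."""
    labels = [r[target] for r in rows]
    # Leaf: all examples share one class, or no attributes are left (use the majority class).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def information_gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [r[target] for r in rows if r[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=information_gain)
    remaining = [a for a in attributes if a != best]
    # One branch per value of the chosen attribute observed in these rows.
    return {best: {value: build_tree([r for r in rows if r[best] == value], remaining, target)
                   for value in set(r[best] for r in rows)}}

def classify(tree, example):
    """Follow the branches matching the example's attribute values down to a leaf."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]  # KeyError if the value never appeared in training
    return tree

# Hypothetical training examples
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
```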
K Nearest Neighbors
Supervised statistical learning method
A simple classification method used to predict the class of a categorical variable
Assign a new example to the class that is most common among its k nearest neighbors in the training samples.
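A minimal sketch of that rule, assuming numeric predictors, Euclidean distance (`math.dist`, Python 3.8+), and a simple majority vote. The function name and the training records are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, new_point, k):
    """Majority class among the k training records nearest (Euclidean) to new_point."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training samples: (predictor values, class)
training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
```

An odd k avoids tied votes in two-class problems; predictors on very different scales are usually normalized first so that one predictor does not dominate the distance.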
Logistic Regression
Regression model where the dependent (output) variable is categorical.
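For a binary outcome and a single continuous input x, the model estimates P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))). A minimal sketch that fits b0 and b1 by gradient ascent on the mean log-likelihood; the data, learning rate, and epoch count are hypothetical, and statistical software normally uses a faster maximum-likelihood solver.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Estimate b0, b1 in P(Y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical binary outcome (0/1) observed at a continuous input x
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
```

The fitted value sigmoid(b0 + b1*x) is an estimated probability; a cutoff such as 0.5 turns it into a class prediction.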
If a binary variable is a function of a continuous input variable, logistic regression may be used to estimate the conditional distribution u
Wage data Example
Set your working directory in RStudio to the directory that contains the file "wagedata.csv"
Read data from "wagedata.csv" into a data frame called wagedf
wagedf = read.csv("wagedata.csv")
Inspect the structure of the data frame wagedf
s
8/17/2012
Chapter 17 Smoothing Methods
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Smoothing is data driven
Regression methods assume underlying unchanging structure (linear, exponential, polynomial)
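By contrast, smoothing methods do not fit one global formula; each smoothed value is built only from nearby observations. A minimal sketch of two common smoothers, a trailing moving average and simple exponential smoothing. Starting exponential smoothing at the first observation is one common convention and is an assumption here.

```python
def moving_average(series, window):
    """Trailing moving average: mean of the current value and the previous window - 1 values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: S_t = alpha * y_t + (1 - alpha) * S_(t-1), with S_1 = y_1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

print(moving_average([2, 4, 6, 8, 10], 3))       # [4.0, 6.0, 8.0]
print(exponential_smoothing([10, 20, 30], 0.5))  # [10, 15.0, 22.5]
```

A larger window or a smaller alpha gives a smoother series that reacts more slowly to recent changes.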
8/17/2012
Chapter 16 Regression-Based Forecasting
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Main ideas
Fit linear trend, time as predictor
Modify & use also for non-linear trends
Exponential
Polynomial
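A minimal sketch of the first two ideas: a least-squares linear trend with the time index t = 1, 2, ..., n as the only predictor, and an exponential trend obtained by fitting the same linear trend to log(y). The function names and series are hypothetical; the exponential version assumes all values are positive.

```python
import math

def fit_linear_trend(series):
    """Least-squares fit of y_t = b0 + b1 * t, with time index t = 1, 2, ..., n as the only predictor."""
    n = len(series)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(series) / n
    b1 = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series))
          / sum((ti - t_mean) ** 2 for ti in t))
    b0 = y_mean - b1 * t_mean
    return b0, b1

def fit_exponential_trend(series):
    """Fit y_t = c * exp(b1 * t) by fitting a linear trend to log(y_t); requires y_t > 0."""
    a, b1 = fit_linear_trend([math.log(y) for y in series])
    return math.exp(a), b1

def forecast_linear(b0, b1, t):
    return b0 + b1 * t

b0, b1 = fit_linear_trend([3, 5, 7, 9])
print(b0, b1, forecast_linear(b0, b1, 5))  # 1.0 2.0 11.0
```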
8/17/2012
Chapter 15 Handling Time Series
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Main ideas
Forecast future values of a time series
Distinction between forecasting (main focus) and describing/exp
8/17/2012
Chapter 14 Cluster Analysis
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Clustering: The Main Idea
Goal: Form groups (clusters) of similar records
Used for segmenting markets into groups of sim
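One common way to form such groups is k-means clustering. A minimal sketch, assuming numeric records, Euclidean distance, and user-supplied starting centroids; the function name, records, and starting points are hypothetical.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each record to its nearest centroid, then move each centroid to its cluster mean."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[j]
                     for j, members in enumerate(clusters)]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
```

The result depends on the starting centroids, so in practice the algorithm is often run from several starts and the tightest solution kept.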
8/17/2012
Chapter 13 Association Rules
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
What are Association Rules?
Study of what goes with what
Customers who bought X also bought Y
What symptoms go with
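Rules such as "customers who bought X also bought Y" are usually scored by support (how often X and Y occur together) and confidence (how often Y occurs among transactions containing X). A minimal sketch of those two measures; the baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market baskets
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(transactions, {"bread", "milk"}))         # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))    # 0.666...
```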
8/17/2012
Chapter 10 Logistic Regression
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Logistic Regression
Extends idea of linear regression to situation where outcome variable is categorical
Widely use
6/17/2015
Bayesian Classifier
Naïve Bayes Classifier
Naïve Bayes Classifier:
The naïve Bayes classifier is based on Bayes' theorem, named after Thomas Bayes (1702-1761).
Naïve Bayes classifiers are a family of simple probabilistic classifiers base
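A minimal sketch of a naïve Bayes classifier for categorical predictors: Bayes' theorem combines the class prior P(C) with per-predictor probabilities P(x_i | C), under the "naïve" assumption that predictors are independent given the class. The records are hypothetical, and no Laplace smoothing is applied (an assumption of this sketch), so a value never seen with a class gives that class probability 0.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Count class frequencies and, per class, how often each predictor value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, predictor index) -> Counter of values
    for record, cls in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(cls, i)][value] += 1
    return class_counts, value_counts

def predict_proba(model, record):
    """P(class | record), assuming predictors are conditionally independent given the class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for i, value in enumerate(record):
            score *= value_counts[(cls, i)][value] / n_cls  # P(x_i | class)
        scores[cls] = score
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("every class has a zero-count value; consider Laplace smoothing")
    return {cls: s / norm for cls, s in scores.items()}

records = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(records, labels)
print(predict_proba(model, ("sunny", "mild")))  # {'no': 0.75, 'yes': 0.25}, up to rounding
```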
9/28/2015
Chapter 7 K-Nearest-Neighbor
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Characteristics
Data-driven, not model-driven
Makes no assumptions about the data
Basic Idea
For a given re
CLOUD SECURITY
Topic proposal for Term Project
Mohan Krishna Gottiginti
NSU ID N01811363
Nova Southeastern University
College of Engineering and Computing
Professor: Dr. James Smith
ISEC-615 Fundamentals of Security Technologies
ISEC615 - Term Pap
Assignment 3B
Mohan Krishna Gottiginti
NSU ID N01811363
College of Engineering and Computing
Professor: Dr. Junping Sun
MMIS 643 Data Mining
Data Mining-Assignment 3B
1. Record the RMS errors for the training data and the validation data, and obs
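The RMS (root-mean-squared) error is computed the same way on each partition; only the records differ. A minimal sketch with hypothetical values. A validation RMS error much larger than the training RMS error is a common sign of overfitting.

```python
import math

def rms_error(actual, predicted):
    """Root-mean-squared error: square root of the mean squared difference between actual and predicted."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical actual and predicted values for one partition
print(rms_error([3, 5, 2], [2, 5, 4]))  # sqrt(5/3), about 1.291
```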
ISEC 615 Fundamentals of Security Technologies
Assignment-1
1. A ___ is created by using a secure hash function to generate a hash value for a message and then encrypting the hash code with a private key.
a. keystream
b. secret key
c. digital signature
d.
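The hash-then-encrypt-with-private-key construction in question 1 can be sketched with SHA-256 and textbook RSA. This is illustration only: the key numbers are the classic toy example (p = 61, q = 53) and are far too small to be secure, and reducing the digest modulo N is an assumption made so it fits the toy modulus. Real systems use a vetted library with proper padding.

```python
import hashlib

# Toy RSA key pair (textbook numbers, far too small to be secure)
P, Q = 61, 53
N = P * Q  # 3233, public modulus
E = 17     # public exponent
D = 2753   # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1

def hash_to_int(message: bytes) -> int:
    # SHA-256 digest reduced mod N so it fits the toy modulus (illustration only)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # "Encrypt" the hash value with the private key
    return pow(hash_to_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Decrypt with the public key and compare against a fresh hash of the message
    return pow(signature, E, N) == hash_to_int(message)

sig = sign(b"hello")
print(verify(b"hello", sig))  # True
```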
Assignment 2B
Mohan Krishna Gottiginti
NSU ID N01811363
College of Engineering and Computing
Professor: Dr. Junping Sun
MMIS 643 Data Mining
Data Mining-Assignment 2B
1. What is the best k?
The best k is 11, obtained with normalized data.
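Choosing k this way means normalizing the predictors, then trying several k values and keeping the one with the best validation performance. A minimal sketch, assuming z-score normalization with training-set statistics, Euclidean distance, and accuracy as the criterion (ties go to the smaller k). The function names and records are hypothetical; the record (1.1, 105) labeled "B" is deliberate noise that makes k = 1 perform worse.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def zscore_fit(rows):
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]

def zscore_apply(rows, means, sds):
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def knn_predict(train_x, train_y, x, k):
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def best_k(train_x, train_y, valid_x, valid_y, candidate_ks):
    means, sds = zscore_fit(train_x)  # normalize with training statistics only
    tx = zscore_apply(train_x, means, sds)
    vx = zscore_apply(valid_x, means, sds)

    def accuracy(k):
        hits = sum(knn_predict(tx, train_y, x, k) == y for x, y in zip(vx, valid_y))
        return hits / len(valid_y)

    return max(candidate_ks, key=accuracy)

train_x = [(1, 100), (2, 120), (1.5, 90), (9, 900), (8, 880), (8.5, 910), (1.1, 105)]
train_y = ["A", "A", "A", "B", "B", "B", "B"]
valid_x = [(1.2, 110), (8.8, 890)]
valid_y = ["A", "B"]
print(best_k(train_x, train_y, valid_x, valid_y, [1, 3, 5]))  # 3
```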
2. What
QUESTION 1
1. The ___ defines the transport protocol.
destination IP Address
IP protocol field
interface
source IP address
10 points
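In IPv4 the transport protocol is identified by the 8-bit Protocol field at byte offset 9 of the header (IANA numbers include 1 = ICMP, 6 = TCP, 17 = UDP). A minimal sketch that reads it from a raw header; the header bytes, addresses, and the small name table are hypothetical illustration values, and the checksum is simply left as 0.

```python
import struct

# IANA-assigned protocol numbers for a few common cases
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

def transport_protocol(ipv4_header: bytes) -> str:
    """Read the 8-bit Protocol field (byte offset 9) of an IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    proto = ipv4_header[9]
    return PROTOCOL_NAMES.get(proto, f"other ({proto})")

# Hypothetical 20-byte header: version/IHL, TOS, total length, ID, flags/fragment offset,
# TTL, protocol (6 = TCP), checksum (0 here), source address, destination address
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                     bytes([192, 168, 0, 1]), bytes([10, 0, 0, 1]))
print(transport_protocol(header))  # TCP
```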
QUESTION 2
1. ___ is a document that describes the application-level protocol for exchanging data between intrusion detection e
QUESTION 1
1. A large number of insurance records are to be examined to develop a model for predicting fraudulent claims. Of the claims in the historical database, 1% were judged to be fraudulent (class 1).
A sample database is taken to develop a model, a
WEB MINING
Data Mining Application in Business Intelligence and Analysis
MOHAN KRISHNA GOTTIGINTI
NOVA SOUTHEASTERN UNIVERSITY
Class Project Research Paper
Mohan Krishna Gottiginti
NSU ID N01811363
College of Engineering and Computing
MMIS 643 Data Mining
Introduction and Overview
Junping Sun
Data Mining
Data Mining
Data Mining:
Extracting useful information from large data sets. (Hand et al. 2001)
It is a process of non-trivial extraction of implicit, previously unknown, and potentially useful infor
8/17/2012
Overview
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Core Ideas in Data Mining
Classification
Prediction
Association Rules
Data Reduction
Data Exploration
Visualization
Super
9/14/2015
Chapter 3 Data Visualization
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Graphs for Data Exploration
Basic Plots
Line Graphs
Bar Charts
Scatterplots
Distribution Plots
Boxplots
Histograms
8/17/2012
Chapter 5 Evaluating Classification & Predictive Performance
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Why Evaluate?
Multiple methods are available to classify or predict
For each method,
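For a classifier, the usual starting point is a confusion matrix (counts of actual versus predicted classes) and the overall misclassification rate on the validation data. A minimal sketch with hypothetical results for one method; running the same code on each method's predictions allows a like-for-like comparison.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Counts of (actual class, predicted class) pairs."""
    return Counter(zip(actual, predicted))

def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical validation-set results for one method (1 = class of interest)
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm[(1, 1)], cm[(1, 0)], cm[(0, 1)], cm[(0, 0)])  # 2 1 1 2
print(misclassification_rate(actual, predicted))        # 0.333...
```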
4/6/2015
Chapter 6: Multiple Linear Regression
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Topics
Explanatory vs. predictive modeling with regression
Example: prices of Toyota Corollas
Fitting a pred
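Fitting a multiple linear regression means finding the coefficients b0, b1, ..., bp that minimize the sum of squared errors. A minimal sketch that solves the normal equations (X^T X) b = X^T y by Gaussian elimination; the function names and the noise-free records (generated from y = 2 + 3*x1 - x2) are hypothetical. Statistical software usually uses more numerically stable methods such as a QR decomposition.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 gives the intercept
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for i in range(col + 1, p):
            factor = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= factor * A[col][j]
            rhs[i] -= factor * rhs[col]
    # Back substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Hypothetical records generated from y = 2 + 3*x1 - 1*x2 (no noise)
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [3, 7, 7, 11, 12]
coef = fit_mlr(X, y)
```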
QUESTION 1
1. A ___ is code inserted into malware that lies dormant until a predefined condition, which triggers an unauthorized act, is met.
trapdoor
worm
Trojan horse
logic bomb
10 points
QUESTION 2
1. A ___ is defined to be a portion of a row used to unique