Chapter 4 Dimension Reduction
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Exploring the data
Statistical summary of data: common metrics
Average
Median
Minimum
Maximum
Standard deviation
Counts & percen
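The metrics listed above can be computed with Python's standard library. This is a minimal sketch, not the book's own procedure: the function names `summarize` and `counts_and_percentages` are hypothetical, and the sample values are made up for illustration.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return the common summary metrics for a list of numbers."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        # Sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(values),
    }


def counts_and_percentages(labels):
    """Return {label: (count, percentage of all records)} for a categorical column."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Made-up values, for illustration only.
numeric_column = [70, 120, 70, 50, 110]
categorical_column = ["top", "top", "middle", "bottom"]

print(summarize(numeric_column))
print(counts_and_percentages(categorical_column))
```

The numeric summaries apply to quantitative variables, while counts and percentages are the natural summary for categorical ones.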
Social Networking Site
Nikhil Bhatagalikar (NBHATA1304)
Sullivan University
Hardware and software architecture required for a social networking site (such as Facebook or
LinkedIn):
There are various option
Sangeeta Naidu
Assignment 3- Chapter 4
October 16th 2015
Dr Tuan Tran
Data Mining and Distributed Computing
Q 1. Breakfast Cereals. Use the data for the breakfast cereal
example in Section 4.7 to explore and summarize the data as
follows: (Note that a few
1. When a data mining model assigns an observation to one class but in fact it belongs
CSC550Z: Data Mining & Distributed Computing (Summer 2016)
Week 1 Assignment Solution (100 points)
2.1 Assuming that data mining techniques are to be used in the following cases, identify whether
the task required is supervised or unsupervised learning. (
Sangeeta Naidu
Assignment 2- Chapter 3
October 9th, 2015
Dr Tuan Tran
Data Mining and Distributed Computing
Q1. Shipments of Household Appliances: Line Graphs. The file
ApplianceShipments.xls contains the series of quarterly shipments
(in million $) of U.
CSC550Z Fall 2015
Data Mining and Distributed Computing
Chapter 3: Data Visualization
Instructor: Dr. Tuan Tran
Galit Shmueli and Peter Bruce 2010
Graphs for Data Exploration
Basic Plots
Line Graphs
Bar Charts
Scatterplots
Distribution Plots
Boxplots
His
Sangeeta Naidu
Assignment 1-Chapter 2
October 6th, 2015
Dr Tuan Tran
Data Mining and Distributed Computing
Q1. Assuming that data mining techniques are to be used in the
following cases, identify whether the task required is supervised or
unsupervised learning
Sangeeta Naidu
Assignment 4- Chapter 5
October 21st, 2015
Dr. Tuan Tran
Data Mining and Distributed Learning
Problems:
Q1. A data mining routine has been applied to a transaction data
set and has classified 88 records as fraudulent (30 correctly so) and
95
CSC550Z: Data Mining & Distributed Computing (Summer 2016)
Week 2 Assignment Solution (100 points)
3.1 Shipments of household appliances: line graphs. The file ApplianceShipments.xls
contains the series of quarterly shipments (in million $) of US househol
CSC550 Project Proposal
Predicting Airfare: Southwest Airlines
Team Members: Bhushan Barakale,
Mohammad Bhuiyan, Sri Charan
Annamraju, Richard Leister
CSC550Z: Data Mining
Sullivan University
February 12, 2017
Instructor: Dr. Tuan Tran
Airline Industry
Fin 40230/70230
Business Forecasting
Prof. Barry Keating
K-Nearest Neighbor Exercise #1
Purpose: To learn how to build a K-Nearest Neighbors model for prediction purposes.
We will use the validation data set to determine the optimal number of neighbors in
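One way to use a validation set to choose the number of neighbors can be sketched as follows. This is an illustration under stated assumptions, not the exercise's own software: it assumes a classification outcome, Euclidean distance, and majority voting, and the helper names `knn_predict`, `validation_error`, and `best_k` are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training records.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def validation_error(train, valid, k):
    """Fraction of validation records misclassified when using k neighbors."""
    wrong = sum(1 for x, y in valid if knn_predict(train, x, k) != y)
    return wrong / len(valid)


def best_k(train, valid, candidates):
    """Return the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))


# Made-up records, for illustration only.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
valid = [((0.5, 0.5), "a"), ((5.5, 5.5), "b")]

for k in (1, 3, 5):
    print(k, validation_error(train, valid, k))
print("chosen k:", best_k(train, valid, [1, 3, 5]))
```

In practice the predictors would usually be normalized before computing distances, so that no single variable dominates because of its scale.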
CSC550Z/X: DATA MINING
Assignment 6 Instructions
Chapter 7: k-Nearest Neighbors (k-NN)
Problem 7.1. Personal Loan Acceptance
This week we are dealing with a classification problem using k-NN method. To accomplish this assignment,
you need to carefully rea
Assignment 3
4.1a) The following variables are numerical/quantitative
calories
protein
fat
sodium
fiber
carbo
sugars
potass
vitamins
weight
cups
rating
The following are ordinal
shelf
The following are nominal
Mfr
Type
b)
Calories
Mean
Stan
Assignment 5
Sricharan Annamraju
6.1
A)
1. The data should be partitioned into training and validation sets because we need
two sets of data: one to build the model that depicts the relationship between the
predictor variables and the predicted variable,
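A random partition into training and validation sets can be sketched as below. The 60/40 split, the fixed seed, and the function name `partition` are assumptions made for illustration; the original answer does not specify them.

```python
import random


def partition(records, train_fraction=0.6, seed=1):
    """Randomly split records into (training, validation) lists.

    A fixed seed makes the split reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Made-up record IDs, for illustration only.
training, validation = partition(range(10))
print(len(training), len(validation))
```

The model is fitted on the training list only, and the validation list is held back to check how well it performs on records it has not seen.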
Assignment9
13.1) Association rules determine associations among items listed in the columns and do not
associate items between rows. A disadvantage of association rules, however, is that they don't take
into account the sequential information that is avai
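The point about sequence can be illustrated with the two standard rule measures, support and confidence. In this minimal sketch the function names and the transactions are hypothetical; it shows that both measures depend only on which items appear together within a row, so reordering the rows changes nothing.

```python
def support(transactions, itemset):
    """Fraction of transactions (rows) that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Made-up transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))

# Reversing the row order leaves both measures unchanged.
reordered = list(reversed(transactions))
print(support(reordered, {"bread", "milk"}) == support(transactions, {"bread", "milk"}))
```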
Talluri Prasanth
2015JULB02046
PGDM-FINANCE
Assignment-2
Regression Model: Predicting airfares on new routes
Scenario:
Several airports have opened in major cities in the USA, opening the market for new
routes. In order to price flights on these routes a majo
Chapter 2: Do Problems 1-3, 5, 8, 10: Submit to Drop Box 1.1
2.1 Assuming the data mining techniques are to be used in the following cases, identify whether the task
required is supervised or unsupervised learning.
a. Deciding whether to issue a loan to a
Data Mining
Chapter 4/ Week 3 Homework
4.1 Breakfast Cereals. Use the data for the breakfast cereal example in Section 4.8 to explore and summarize the
data as follows. (Note that a few records contain missing values; since there are just a few, a simple
Chapter 5/ Week 4 Homework
5.1 A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30
correctly so) and 952 as non-fraudulent (920 correctly so). Construct the classification matrix and calculate th
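The counts in problem 5.1 are enough to fill in the classification matrix. The sketch below works through the arithmetic, assuming the truncated sentence asks for the overall error rate; the variable names are hypothetical.

```python
# Counts stated in the problem.
predicted_fraud = 88        # records classified as fraudulent
fraud_correct = 30          # of those, actually fraudulent
predicted_nonfraud = 952    # records classified as non-fraudulent
nonfraud_correct = 920      # of those, actually non-fraudulent

# Misclassified counts follow by subtraction.
nonfraud_called_fraud = predicted_fraud - fraud_correct        # 58
fraud_called_nonfraud = predicted_nonfraud - nonfraud_correct  # 32

# Keys are (actual class, predicted class).
matrix = {
    ("fraud", "fraud"): fraud_correct,
    ("fraud", "nonfraud"): fraud_called_nonfraud,
    ("nonfraud", "fraud"): nonfraud_called_fraud,
    ("nonfraud", "nonfraud"): nonfraud_correct,
}

total = predicted_fraud + predicted_nonfraud                   # 1040
error_rate = (nonfraud_called_fraud + fraud_called_nonfraud) / total

for (actual, predicted), n in matrix.items():
    print(f"actual={actual:8s} predicted={predicted:8s} n={n}")
print(f"overall error rate = {error_rate:.4f}")  # 90 / 1040, about 0.0865
```

Of the 62 records that are actually fraudulent (30 + 32), the routine catches 30, which the overall error rate alone does not reveal.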
Course Number:
CSC550x
Course Title:
Data Mining
Prerequisites:
CSC499 and Knowledge of MS Excel. Knowledge of
math and statistics is helpful.
Department/School:
School of Business
Description:
This course introduces the basic ideas and techniques
of dat
CSC550Z Fall 2015
Data Mining and Distributed Computing
Chapter 1-2
Overview of Data Mining
Instructor: Dr. Tuan Tran
Why Mine Data?
Massive data is collected and warehoused
Web data, e-commerce
Purchase at grocery stores
Bank/credit card transactions
Date | Configurati | Customer Post | Store Post | Retail Pric | Screen Size | Battery Lif | RAM (GB) | Processor | Integrated | HD Size (GB) | Bundled Ap | OS X Cust | OS Y Cust | OS X Store | OS Y Store | CustomerS
2008/01/01 00:01:19 | 163 | EC4V 5BH | SE1 2BN | 455 | 15 | 5 | 1 | 2 | Yes | 80 | Yes | 532041 | 18
3.1
A)
B) There is a decline every fourth quarter.
C) Generally, Q2 and Q3 are higher than Q1 and Q4
D)
E)
F) An interactive visualization tool is better, since you can slice and dice and
zoom in and out. The effort it takes to create these in Excel is a
Assignment 4
5.1
Classification Matrix Model:
Actual Class | Predicted c0 | Predicted c1
c0 | n(0,0) = No. of cases correctly classified as c0 | n(0,1) = No. of cases misclassified as c1
c1 | n(1,0) = No. of cases misclassified as c0 | n(1,1) = No. of cases correctly
CSC550Z: Data Mining for Business Intelligence
Week 1: Introduction to Data Mining
Tuan Tran, Ph.D.
Assistant Professor
College of Information and Computer Technology
Sullivan University
September 30, 2016
Tuan Tran, Ph.D. (Sulli
Organizational Culture in Success
Sreekanth Gadiparthy(SGADIP1791)
MGT 545X - Leadership and Team Development
Sullivan University
Introduction
There have been growing concerns over the way