Chapter 4 Dimension Reduction
Data Mining for Business Intelligence
Exploring the data
Statistical summary of data: common metrics
Average
Median
Minimum
Maximum
Standard deviation
Counts & percentages
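Here is a minimal sketch of the metrics listed above, computed with the Python standard library. The input values are hypothetical. The standard deviation shown is the sample version (n - 1 denominator), which is an assumption about which definition is intended.

```python
import statistics
from collections import Counter

def summarize(values):
    """Return common summary metrics for a numerical variable."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        # Sample standard deviation (n - 1 in the denominator)
        "std_dev": statistics.stdev(values),
    }

def counts_and_percentages(labels):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(labels)
    total = len(labels)
    return {k: (c, 100.0 * c / total) for k, c in counts.items()}

# Hypothetical values, for illustration only
calories = [70, 120, 70, 50, 110, 110]
print(summarize(calories))
print(counts_and_percentages(["C", "H", "C", "C"]))
```

Numerical variables get the first five metrics. Categorical variables are summarized by counts and percentages.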
Social Networking Site
Hardware and software architecture required for a social networking site (such as Facebook or LinkedIn):
There are various options
Assignment 3 - Chapter 4
Data Mining and Distributed Computing
Q1. Breakfast Cereals. Use the data for the breakfast cereal example in Section 4.7 to explore and summarize the data as follows: (Note that a few
CSC550Z: Data Mining & Distributed Computing (Summer 2016)
Week 1 Assignment Solution (100 points)
2.1 Assuming that data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning.
Assignment 2 - Chapter 3
Q1. Shipments of Household Appliances: Line Graphs. The file ApplianceShipments.xls contains the series of quarterly shipments (in million $) of U.S.
Chapter 3: Data Visualization
Graphs for Data Exploration
Basic Plots
Line Graphs
Bar Charts
Scatterplots
Distribution Plots
Boxplots
Assignment 1 - Chapter 2
Q1. Assuming that data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning.
Assignment 4 - Chapter 5
Problems:
Q1. A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952
CSC550Z: Data Mining & Distributed Computing (Summer 2016)
Week 2 Assignment Solution (100 points)
3.1 Shipments of household appliances: line graphs. The file ApplianceShipments.xls contains the series of quarterly shipments (in million $) of US household
CSC550 Project Proposal
Predicting Airfare: Southwest Airlines
Airline Industry
K-Nearest Neighbor Exercise #1
Purpose: To learn how to build a K-Nearest Neighbors model for prediction purposes.
We will use the validation data set to determine the optimal number of neighbors in
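Here is a minimal sketch of k-NN for a numerical outcome. Each prediction is the average outcome of the k nearest training records by Euclidean distance. The k with the lowest validation RMSE is kept. The function names and the toy data are hypothetical, and RMSE is assumed here as the validation criterion.

```python
import math

def knn_predict(train_X, train_y, x, k):
    """Predict a numerical outcome as the mean outcome of the k nearest training records."""
    # Euclidean distance from x to every training record (predictors should
    # normally be normalized first so no single variable dominates)
    dists = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    return sum(y for _, y in dists[:k]) / k

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def best_k(train_X, train_y, valid_X, valid_y, k_values):
    """Return the k with the lowest validation RMSE, plus the RMSE for every k tried."""
    scores = {}
    for k in k_values:
        preds = [knn_predict(train_X, train_y, x, k) for x in valid_X]
        scores[k] = rmse(valid_y, preds)
    return min(scores, key=scores.get), scores

# Hypothetical one-predictor data
train_X = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0]
valid_X = [(2.5,), (10.5,)]
valid_y = [2.5, 10.5]
k, scores = best_k(train_X, train_y, valid_X, valid_y, [1, 2, 3])
print(k, scores)
```

Choosing k on the validation set, not the training set, avoids picking k = 1 simply because each training record is its own nearest neighbor.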
CSC550Z/X: DATA MINING
Assignment 6 Instructions
Chapter 7: k-Nearest Neighbors (k-NN)
Problem 7.1. Personal Loan Acceptance
This week we are dealing with a classification problem using the k-NN method. To accomplish this assignment, you need to carefully read
Assignment 3
4.1a) The following variables are numerical/quantitative:
calories
protein
fat
sodium
fiber
carbo
sugars
potass
vitamins
weight
cups
rating
The following variable is ordinal:
shelf
The following variables are nominal:
Mfr
Type
b)
Calories
Mean
Standard deviation
Assignment 5
6.1
A)
1. The data should be partitioned into training and validation sets because we need two sets of data: one to build the model that depicts the relationship between the predictor variables and the predicted variable,
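Here is a minimal sketch of a random training/validation partition. It assumes a 60/40 split and a fixed random seed. Both choices are illustrative and are not taken from the assignment.

```python
import random

def partition(records, train_frac=0.6, seed=1):
    """Randomly split records into (training, validation) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, valid = partition(list(range(10)))
print(len(train), len(valid))
```

The model is fit on `train` only. Performance measured on `valid` then estimates how the model behaves on records it has not seen.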
Assignment 9
13.1) Association rules determine associations among items listed in the columns and do not associate items between rows. A disadvantage of association rules, however, is that they don't take into account the sequential information that is available
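The sketch below computes support and confidence for a rule, treating each row as an independent basket. Reordering the rows leaves both measures unchanged, which illustrates why sequence information is lost. The baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions (rows) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical transactions: one basket per row; row order plays no role
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```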
Assignment 2
Regression Model: Predicting airfares on new routes
Scenario:
Several airports have opened in major cities in the USA, opening the market for new routes. In order to price flights on these routes, a major
Chapter 2: Do Problems 1-3, 5, 8, 10: Submit to Drop Box 1.1
2.1 Assuming the data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning.
a. Deciding whether to issue a loan to a
Data Mining
Chapter 4 / Week 3 Homework
4.1 Breakfast Cereals. Use the data for the breakfast cereal example in Section 4.8 to explore and summarize the data as follows. (Note that a few records contain missing values; since there are just a few, a simple
Chapter 5 / Week 4 Homework
5.1 A data mining routine has been applied to a transaction data set and has classified 88 records as fraudulent (30 correctly so) and 952 as non-fraudulent (920 correctly so). Construct the classification matrix and calculate the
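The counts in the problem determine all four cells of the classification matrix. Of the 88 records predicted fraudulent, 58 are actually non-fraudulent. Of the 952 predicted non-fraudulent, 32 are actually fraudulent.

The Python sketch below derives the cells and, as one example metric, the overall error rate. The preview is cut off before naming the metrics to calculate, and the function name is hypothetical.

```python
def classification_matrix(pred_pos, pred_pos_correct, pred_neg, pred_neg_correct):
    """Build {(actual, predicted): count} from per-prediction totals and correct counts."""
    return {
        ("fraud", "fraud"): pred_pos_correct,                  # correctly flagged
        ("nonfraud", "fraud"): pred_pos - pred_pos_correct,    # false alarms
        ("fraud", "nonfraud"): pred_neg - pred_neg_correct,    # missed frauds
        ("nonfraud", "nonfraud"): pred_neg_correct,            # correctly cleared
    }

m = classification_matrix(88, 30, 952, 920)
total = sum(m.values())
errors = m[("nonfraud", "fraud")] + m[("fraud", "nonfraud")]
error_rate = errors / total
print(m, total, round(error_rate, 4))
```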
Chapter 1-2
Overview of Data Mining
Why Mine Data?
Massive data is collected and warehoused
Web data, e-commerce
Purchases at grocery stores
Bank/credit card transactions
3.1
A)
B) There is a decline every fourth quarter.
C) Generally, Q2 and Q3 are higher than Q1 and Q4.
D)
E)
F) An interactive visualization tool is better, since you can slice and dice and zoom in and out. The effort it takes to create these in Excel is a
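One way to check a quarterly pattern like the one described in B) and C) is to average the series by quarter across years. This is a minimal sketch. The values below are hypothetical and are not the ApplianceShipments.xls data.

```python
from collections import defaultdict

def quarterly_means(series):
    """Average value for each quarter label across all years in (quarter, value) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for quarter, value in series:
        sums[quarter] += value
        counts[quarter] += 1
    return {q: sums[q] / counts[q] for q in sorted(sums)}

# Hypothetical quarterly shipments (in million $), two years
series = [("Q1", 4000), ("Q2", 4300), ("Q3", 4400), ("Q4", 4100),
          ("Q1", 4050), ("Q2", 4350), ("Q3", 4450), ("Q4", 4000)]
means = quarterly_means(series)
print(means)
```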
Assignment 4
5.1
Classification Matrix Model:
                    Predicted c0                 Predicted c1
Actual class c0     n(0,0) = No. of cases        n(0,1) = No. of cases
                    correctly classified as c0   misclassified as c1
Actual class c1     n(1,0) = No. of cases        n(1,1) = No. of cases
                    misclassified as c0          correctly classified as c1
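The sketch below counts n(i,j) from lists of actual and predicted labels, using the same convention as the matrix above: i is the actual class and j is the predicted class, with c0 and c1 coded as 0 and 1. The label lists are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return n where n[i][j] = number of records of actual class i predicted as class j."""
    n = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        n[a][p] += 1
    return n

# Hypothetical labels (0 = c0, 1 = c1)
actual    = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]
n = confusion_counts(actual, predicted)
print(n)
```

The diagonal cells n[0][0] and n[1][1] hold the correct classifications. The off-diagonal cells hold the misclassifications.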
CSC550Z: Data Mining for Business Intelligence
Week 1: Introduction to Data Mining
