Week 1 Answers
Answer 1:
a) This is supervised learning: the loan can be classified as approved or not.
b)
This is unsupervised learning; this is just a prediction that there is no apparent outcome wh
1.
The ability to easily and quickly zoom, pan, and aggregate data is a typical feature of _.
A) network mapping
B) interactive visualization
C) filtering
D) binning
Points Earned:
Correct
Answer(s):
1
1.
Which of the following plots would be used to display the frequency distribution of data values?
A) Histogram
B) Bar chart
C) Line graph
D) Scatterplot
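As a rough illustration of what a frequency distribution is, the sketch below counts how many values fall into equal-width bins and prints a text histogram. The function name `frequency_table`, the bin width, and the sample values are all hypothetical and are not taken from the course material.

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count how many values fall into each bin [lower, lower + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample data, chosen only for illustration.
sample = [3, 7, 12, 15, 18, 21, 22, 29, 35]
width = 10
table = frequency_table(sample, width)

# Print one row per bin; the bar length is the frequency of that bin.
for lower, count in table.items():
    print(f"{lower:>3} to {lower + width - 1:<3} {'#' * count}")
```

Each bar's length is the count of values in that interval, which is what a histogram displays for numerical data.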
2.
In a boxplot, the horizontal line within t
Points Missed: 1.00
Percentage: 97.5%
1.
A drawback of the k-NN method is that the _.
A) method does not support prediction
B) method can be used to classify only two classes
C) relationship between the
The chapter makes a distinction between using multiple linear regression analysis for explanation versus
prediction. Why is this important? If a set of independent variables can explain (account for)
Assignment 2
1.
a.
[Figure: chart with vertical axis "Shipments", ticks from 0 to 5500 in steps of 500]
b.
There seems to be a pattern in the data. For every year, sales increase from Q1 to Q2 and
decrease in Q3 a
Assignment 5 Multiple Linear Regression
Nitin Prakash Majgi
Sullivan University
Data Mining
October 29, 2017
6.1
a. Trai
Assignment 6 K Nearest Neighbor
Nitin Prakash Majgi
Sullivan University
Data Mining
November 5, 2017
6.1
a.
Predicted Class
Actual Cla
Assignment Week 5
Vamsi Krishna Medarametla (VMEDAR9424)
CSC550Z Data Mining and Distributed Computing
6.1
a.
Data mining projects use very large data sets. Before building a model, partitioning of the data is
Assignment Week 4
CSC550Z Data Mining and Distributed Computing
Problem 5.2
Classification Confusion Matrix:
                      Predicted Class
Actual Class          1        0
1 (fraudulent)        a        c
0 (non-fraudulent)    b        d
Error rate for tr
Does anyone have any experience with pivot tables? If so, to what purpose did you use them? How
would you evaluate their value?
Yes, I have used pivot tables in the past. I mainly used them for summariz
This week we cover the k-Nearest Neighbors (k-NN) data mining method. How would you compare this
method with Multiple Linear Regression? Under what circumstances would one be preferable to the
other
Abhinav Bitra
Sullivan University
CSC 550X: Data Mining
Week 5_ Chapter 6
Sunday, October 29, 2017
6.1
a. The training data set used to build the model explains the underlying relationships
betwee
Predicting Missing Items in
Shopping Cart
Market Basket Analysis
Sullivan University
Business Problem
The goal of this project is to predict the missing items in a shopping cart and apply association
Solution 4.1 b
Cereal Name
100% Bran
100% Natural Bran
All-Bran
All-Bran with Extra Fiber
Almond Delight
Apple Cinnamon Cheerios
Apple Jacks
Basic 4
Bran Chex
Bran Flakes
Cap 'n' Crunch
Cheerios
Cinna
I don't have any experience working with pivot tables. I read that the PivotTable tool is one of the most
powerful yet intimidating features in Excel. It allows us to quickly summarize and analyze lar
1. Name: Name of cereal
2. mfr: Manufacturer of cereal where A = American Home Food Products; G = General Mills; K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina
3. type: cold
K-NN finds the nearest example in predictor space and assigns a new record the same class as
that example. Records that are similar, that is, close together in that space, receive the same class. This K-NN
method is for
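The nearest-neighbor idea described in this post can be sketched in a few lines. The following is a minimal illustration only, assuming Euclidean distance and a simple majority vote; the function name `knn_classify` and the toy training records are hypothetical and do not come from the assignment data.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k):
    """Return the majority class among the k training records nearest to new_point.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-predictor training set with two classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

pred_1 = knn_classify(train, (2.0, 2.0), k=1)  # single nearest neighbor
pred_3 = knn_classify(train, (6.0, 5.0), k=3)  # majority of three neighbors
print(pred_1, pred_3)
```

In practice the predictors are usually normalized first so that no single variable dominates the distance; that step is left out here to keep the sketch short.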
4Q-CSC550X-A1-07-Data Mining-Fall 2017
Name: Abhinav Bitra
Assignment 1
2.1 Assuming that data mining techniques are to be used in the following cases, identify whether the task required
is supervised
I have recently used PIVOT SQL functions extensively in my project for live hourly
transaction reports and also for data comparison between two sheets. You can
use the PIVOT and UNPIVOT relational op
CSC550Z/X: DATA MINING
Week 2 Assignment Solution (100 points)
Instructor: Dr. Tuan Tran
3.1 Shipments of household appliances: line graphs. The file
ApplianceShipments.xls contains the series of quar
Model answers for Chapter 5: Evaluating Classification and Predictive
Performance
Problem 5.1 (4.1 in 1st edition):
Answer to 5.1:
Explanation:
Therefore in our problem the confusion matrix is
Classif
Model Answers for Chapter 5: Evaluating Classification and Predictive
Performance
Problem 5.2 (4.2 in 1st edition)
Answer to 5.2:
Let us assume a given cutoff, say, 0.5.
Classification Confusion Matri
CSC550Z/X: DATA MINING
Week 3 Assignment Solution (100 points)
Instructor: Dr. Tuan Tran
Problem 4.1. (70 points)
a.
The variables are classified into different types as follows: (5 points)
- Quantita
Abhinav Bitra
Sullivan University
CSC 550X: Data Mining
Week 7_ Chapter 8
Saturday, November 11, 2017
8.1 Personal Loan Acceptance. The file UniversalBank.xls contains data on 5000 customers of
Un
Model Answers for Chapter 5: Evaluating Classification and Predictive
Performance
Problem 5.5
Answer to 5.5.a:
Classification Confusion Matrix
                Predicted Class
Actual Class    1        0
1               310      90
0               130      270
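The usual summary measures can be computed directly from these four counts. The sketch below assumes the matrix is read with rows as the actual class and columns as the predicted class, in the order 1 then 0 (an assumption about the layout), so that 310 are true positives, 90 false negatives, 130 false positives, and 270 true negatives. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute basic performance measures from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fn + fp) / total,   # share of misclassified records
        "accuracy": (tp + tn) / total,     # share of correctly classified records
        "sensitivity": tp / (tp + fn),     # actual 1s classified as 1
        "specificity": tn / (tn + fp),     # actual 0s classified as 0
    }

# Counts taken from the matrix above, under the assumed row/column reading.
metrics = classification_metrics(tp=310, fn=90, fp=130, tn=270)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that reading, 220 of the 800 records are misclassified, giving an error rate of 0.275 and an accuracy of 0.725.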
Answe
Abhinav Bitra
Sullivan University
CSC 550X: Data Mining
Week 6_ Chapter 7
Saturday, November 4, 2017
7.2 Consider the following customer: Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg