Final Exam, Spring 2013, Decision Support Systems
Your Name:
Part 1: Dimension Reduction of Digital Music Data (15 points)
Recommended time: 1 hour
Recommended space: 1 page or less (any format)
A digital music distribution company hires you. You are aske
Notes From Book
BADM 453
Chapter 3: Data Visualization
In this chapter you will learn about:
Basic Plots
o Bar charts, line graphs, scatterplots
Distribution Plots
o Boxplots, histograms
Specialized Plots
o Hierarchical, network, geographical
Enhancem
Project 2 Proposal: HootSuite
Our project will be based on the SaaS HootSuite and its capabilities. HootSuite is a social media
management system for businesses and organizations to collaboratively execute campaigns
across multiple social networks from on
Schedule BADM 453-Decision Support Systems
Spring 2014
Revised: March 02, 2014
Week  Date
1     Jan 21
2     Jan 28
3     Feb 4
4     Feb 11
5     Feb 18
6     Feb 25
7
Module
Introduction to BI, importance of BI, job opportunities
Chapter 2 Overview of Data mining process
(TP Gr
BADM 453
Assignment 1
Question 1: Chapter 4, question 4.3
Part (a): In the Excel sheet
Part (b): The PC sheet in Excel. Answer the two questions in the document file.
Answer: The principal compone
Problem 7.2
a) The best k chosen by the k-NN process after normalizing the data was 2.
b) The predicted MEDV was 18.428376
c) The training data shows zero errors, since the model is used for prediction. There
is an overlap when using the training data
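One possible reading of the overlap remark in (c): when a training record is scored against the training set, the record itself is among its own nearest neighbors. The Python sketch below illustrates this with k = 1, where the effect is exact. The data values and the helper names `zscore_columns` and `knn_predict` are made up for illustration and are not taken from the assignment or its software. With a larger k, the record is still one of its own neighbors, but other neighbors are averaged in as well.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Normalize each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the outcomes of the k nearest training records."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return mean(y for _, y in ranked[:k])

# Made-up predictors and outcomes (hypothetical, for illustration only).
raw_x = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0], [4.0, 40.0]]
train_y = [18.0, 21.5, 24.0, 30.0]
train_x = zscore_columns(raw_x)

# Score the training records against themselves with k = 1:
# each record's nearest neighbor is itself (distance 0).
errors_k1 = [knn_predict(train_x, train_y, x, k=1) - y
             for x, y in zip(train_x, train_y)]
```

Here `errors_k1` contains only zeros, which is why training error is not a useful measure of predictive accuracy for this method; validation data gives a more honest estimate.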
BADM 453
Assignment 1
Due Date: March 4
Question 1: Chapter 4, question 4.3 all parts.
5 points
Part (a): In the Excel sheet
Part (b): The PC sheet in Excel. Answer the two questions in the document file.
(typo correction in the book: Metallic should be c
Source:
"Predicting Corporate Bankruptcy"
Darden Business Publishing
Case authors Mark E. Haskins (HASKINSM@Darden.virginia.edu) and Phillip E. Pfeifer (PFEIFERP@Darden.virginia.edu)
NO  D  YR  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10  R11  R12  R13  R14  R15  R16  R17  R18  R
Notes From Book
BADM 453
Chapter 2: Overview of the Data Mining Process
2.1 Introduction
OLAP and SQL are not covered in this book
They do not involve statistical modeling
2.2 Core Ideas in Data Mining
Classification
Most basic form of data analysis
o E
Notes From Book
BADM 453
Chapter 1: Introduction
1.1 What Is Data Mining?
Data Mining
Extracting useful information from large data sets
Longer version: Data mining is the process of exploration and analysis of large quantities
of data in order to disco
Chapter 5: Judging Classification Performance
Classification Confusion Matrix
2x2 matrix that summarizes the correct and incorrect classifications (all possibilities) of a dataset
Actual Class 1, Predicted Class 1
Actual Class 0, Predicted Class 1 (Inc
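As a rough illustration of the 2x2 confusion matrix described above, the following Python sketch tallies each actual/predicted combination. The labels in `actual` and `predicted` are made-up example data, and `confusion_matrix` is a hypothetical helper name, not something taken from the book.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return a 2x2 table as a dict keyed by (actual class, predicted class)."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts[(a, p)] for a in (1, 0) for p in (1, 0)}

# Made-up labels for illustration only.
actual    = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 1]

cm = confusion_matrix(actual, predicted)

# Correct classifications sit on the diagonal: (1, 1) and (0, 0).
correct = cm[(1, 1)] + cm[(0, 0)]
accuracy = correct / len(actual)
```

The off-diagonal cells, `cm[(1, 0)]` and `cm[(0, 1)]`, hold the two kinds of misclassification.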
Notes From Book
Chapter 6: Multiple Linear Regression
Linear regression models are good for both explanatory modeling and prediction:
Explanatory Modeling
Goal: Explain relationship between predictors and target variable
Model Goal: Fit the data well and unders
Notes From Book
BADM 453
Chapter 8: Naive Bayes
Data-driven, not model-driven
Classification on validation data (or new records) relies solely on the data found from the
training data
o If validation prediction is guilty find the proportion of guilty that
Notes From Book
Chapter 9: Classification and Regression Trees (CART)
Easily understandable and transparent method for predicting and classifying new records
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
o Tree is a gr
Notes From Book
BADM 453
Chapter 15: Time Series
Explanatory vs Predictive Modeling
Main focus is on forecasting (predicting)
o Seek to predict future values based on past trends
Describing/explaining (explanatory) is the goal of Time Series Analysis
o
Case 2: Takyo Software Cataloger
Stephen Martinek
1)
Gross Profit From Random Selection = (# of Names)*(Response Rate)*(Avg Spending per
Purchaser) - (Cost of Mailing per Catalog)*(# of Catalogs Mailed)
# of Names = 180,000
Response Rate = 1065/20,000 = 0.0
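The gross profit formula above can be sketched in Python as follows. Only the number of names (180,000) and the response counts (1065 out of 20,000) come from the write-up. The average spending, the mailing cost per catalog, and the assumption that one catalog is mailed to every name are hypothetical placeholders chosen for illustration, so the resulting figure is not the case's answer.

```python
def gross_profit(num_names, response_rate, avg_spending,
                 cost_per_catalog, catalogs_mailed):
    """Revenue from responders minus the total cost of mailing."""
    revenue = num_names * response_rate * avg_spending
    mailing_cost = cost_per_catalog * catalogs_mailed
    return revenue - mailing_cost

# Values stated in the write-up.
num_names = 180_000
response_rate = 1065 / 20_000

# Hypothetical placeholders, NOT from the case.
avg_spending = 100.0       # assumed average spending per purchaser
cost_per_catalog = 2.0     # assumed mailing cost per catalog

profit = gross_profit(
    num_names,
    response_rate,
    avg_spending,
    cost_per_catalog,
    catalogs_mailed=num_names,  # assumption: one catalog per name
)
```

Swapping in the case's actual spending and cost figures gives the gross profit under random selection, which serves as the baseline for any model-based targeting.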
Stephen Martinek
Case 1 Write Up
Principal component analysis (PCA) can be used to identify a group of highly correlated
variables and understand their underlying importance to the target variable. In this case, our
target variable is D, which indicates w