1.
The problem of finding hidden structure in unlabeled data is called:
A.
Supervised learning
B.
Unsupervised learning
C.
Reinforcement learning
2.
The task of inferring a model from labeled training data is called:
A.
Unsupervised learning
B.
Supervised learning

APA: SQL Data Cleansing: Su16 MIS 3300
Team Members: Nathan Alldredge, Tyson Mears
Instructor: Polly Conrad
1. In your own words, explain what a primary key and foreign key are and their purpose in a
database.
A primary key is one or more columns and it
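To make the two kinds of keys concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the helper functions are hypothetical examples, not part of the assignment: `CustomerID` is the primary key of `customers`, and `orders.CustomerID` is a foreign key that must point at an existing customer.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customers/orders schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is turned on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers ("
        " CustomerID INTEGER PRIMARY KEY,"  # primary key: uniquely identifies each row
        " Name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " OrderID INTEGER PRIMARY KEY,"
        " CustomerID INTEGER NOT NULL REFERENCES customers(CustomerID),"  # foreign key
        " Total REAL)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(100, 1, 25.0), (101, 1, 40.0)],  # one customer, many orders
    )
    return conn

def insert_order(conn, order_id, customer_id, total) -> bool:
    """Try to insert an order; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, total))
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    conn = build_demo_db()
    print(insert_order(conn, 102, 1, 10.0))   # True: customer 1 exists
    print(insert_order(conn, 103, 99, 10.0))  # False: no customer 99
```

The foreign key is what stops an order from referring to a customer who does not exist, while the primary key stops two rows from sharing the same ID.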

Question 1
In linear regression, the sum of squares regression (SSR) is the:
Sum of squared deviations of each observed value from its variance
Sum of squared deviations of each predicted value from the mean
Sum of squared deviations of each observed value

In linear regression, which of the following is the equation to calculate R²? SSR / SST
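The ratio can be checked numerically. This is a minimal sketch, assuming the fitted values come from a least-squares line with an intercept (only then does SST = SSR + SSE hold). The four data points are made up for illustration; ŷ = 1.9x is the least-squares line for them.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total: observed vs. mean
    ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # regression: predicted vs. mean
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # error: observed vs. predicted
    return sst, ssr, sse

def r_squared(y, y_hat):
    """Share of the total variation explained by the regression: SSR / SST."""
    sst, ssr, _ = sums_of_squares(y, y_hat)
    return ssr / sst

x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
y_hat = [1.9 * xi for xi in x]  # least-squares line for this data (intercept 0)
print(round(r_squared(y, y_hat), 4))  # 0.9627
```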
With two levels of the target variable, an entropy of 0 indicates: a pure node
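Entropy is 0 when every record in a node has the same class and reaches its maximum (1 bit for two classes) at a 50/50 split. A minimal sketch of the calculation; the class labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one tree node."""
    counts = Counter(labels)
    n = len(labels)
    # Classes with zero count never appear in the Counter, so log2(0) is never evaluated.
    return sum(-(c / n) * log2(c / n) for c in counts.values())

print(entropy(["yes"] * 8))               # 0.0 (pure node)
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0 (evenly split node)
```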
To convert probability to odds, calculate: Probability / (1 - Probability)
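A minimal sketch of the conversion in both directions; the function names are illustrative. The inverse, probability = odds / (1 + odds), follows from solving odds = p / (1 - p) for p.

```python
def probability_to_odds(p: float) -> float:
    """Odds in favor of an event with probability p (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)

def odds_to_probability(odds: float) -> float:
    """Inverse conversion: probability = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

print(probability_to_odds(0.75))  # 3.0, i.e. 3-to-1 in favor
print(odds_to_probability(3.0))   # 0.75
```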
In an artific

In cross-validation, a percentage of the dataset is held out for the purpose of _ the model.
training
analyzing
segmenting
testing
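The held-out portion is used for testing the model. Below is a minimal k-fold sketch in plain Python, assuming rows are identified by index and shuffled once with a fixed seed; the function name is hypothetical, and modeling tools usually perform this split for you.

```python
import random

def k_fold_splits(n_rows: int, k: int, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random but reproducible
    for fold in range(k):
        test = indices[fold::k]           # every k-th shuffled index is held out for testing
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

for train, test in k_fold_splits(10, 5):
    print(len(train), len(test))  # each fold: 8 training rows, 2 test rows
```

Every row is tested exactly once across the k folds, so the averaged test score uses all of the data without ever scoring a model on rows it was trained on.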
Suppose I have a dataset with a binary outcome variable (true/false). Sixty percent of the cases

In linear regression, finding values of β0 and β1 that minimize the sum of the squared residuals
is known as what?
least squares criterion
regression intercept
random error
population regression
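For simple linear regression the least squares criterion has a closed-form solution: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1·x̄. A minimal sketch with made-up data:

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity the least squares criterion minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares_fit(xs, ys)
print(b0, b1)
```

Nudging either coefficient away from the fitted values can only increase `sum_squared_residuals`, which is what "minimize" means here.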
When we overfit a model, we achieve greater accuracy

In a neural network, the speed at which the network learns can be adjusted by setting the _ .
number of hidden layers
activation threshold
learning rate
feed forward index
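The learning rate scales every weight update. A full neural network is too large for a short example, so this deliberately simplified sketch applies the same gradient-descent update, w ← w − learning_rate × gradient, to a single weight minimizing the hypothetical loss (w − 3)².

```python
def train_single_weight(learning_rate: float, steps: int = 100, w: float = 0.0) -> float:
    """Gradient descent on loss(w) = (w - 3)**2; returns the final weight."""
    for _ in range(steps):
        gradient = 2 * (w - 3)         # derivative of the loss
        w -= learning_rate * gradient  # step size is controlled by the learning rate
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, train_single_weight(lr))
```

With 0.01 the weight is still noticeably short of 3 after 100 steps, 0.1 converges, and 1.1 overshoots further on every step and diverges.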
In a decision tree, pruning is generally done to avoid _ .
generalization

Analytics Portfolio Assignment
Cluster Analysis
a. Provide screenshots of the following cluster analysis output tabs (use the Windows
snipping tool or go to File > Print/Export Image in RapidMiner):
i. Description
ii. Centroid Table
iii. Plot
b. Using the C
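In k-means output, a centroid table reports the center (attribute means) of each cluster. The sketch below is a minimal plain-Python version of Lloyd's k-means on 2-D points, assuming the first k points as starting centroids (a simplification; practical implementations usually seed randomly or with k-means++). The points are hypothetical.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Return (centroids, labels) for 2-D points using Lloyd's k-means algorithm."""
    centroids = [tuple(p) for p in points[:k]]  # simplification: first k points as seeds
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print(labels)     # [0, 0, 1, 1, 0, 1]
print(centroids)  # one row per cluster, like a centroid table
```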

MIS 3300
Analytics Portfolio Assignment
Descriptive Statistics and Correlation
a. The descriptive statistics calculated for the first eight columns (attributes) in the dataset
(format for clarity of presentation as needed).
Population
Mean
Income
4246.42
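Summary statistics like these can be cross-checked outside RapidMiner. A minimal sketch using Python's `statistics` module; the income values are hypothetical rather than the assignment's dataset, and `stdev` here is the sample standard deviation, which may not match the convention a given tool reports.

```python
import statistics

def describe(values):
    """Summary statistics for one numeric attribute (column)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

income = [3800, 4100, 4250, 4300, 4700]  # hypothetical attribute values
for name, value in describe(income).items():
    print(f"{name}: {value}")
```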

Ethics is the set of moral codes, above and beyond the legally required minimums.
True
False
It is important to make organizational decisions about how to use and protect personal
information about people because data represent people's lives. Data about what

MIS 3300
Analytics Portfolio Assignment
Data Cleansing
Introduction
The purpose of this exercise is to gain practice recognizing forms of dirty data and using tools such as Power BI for data cleansing.
This exercise will be completed in the same teams as
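The exercise uses Power BI; as a tool-neutral illustration, here is a minimal Python sketch of three common cleanups: trimming stray whitespace, standardizing case, and dropping rows that become duplicates once formatting is fixed. The record layout and field names are hypothetical.

```python
def clean_records(records):
    """Trim whitespace, standardize text case, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in records:
        name = " ".join(row["name"].split()).title()   # collapse and trim spaces, Title Case
        state = row["state"].strip().upper()            # e.g. " ut" -> "UT"
        email = row["email"].strip().lower() or None    # empty string -> missing value
        key = (name, state, email)
        if key in seen:
            continue  # duplicate once formatting differences are removed
        seen.add(key)
        cleaned.append({"name": name, "state": state, "email": email})
    return cleaned

raw = [
    {"name": "  jane   doe ", "state": " ut", "email": "JANE@EXAMPLE.COM "},
    {"name": "Jane Doe", "state": "UT", "email": "jane@example.com"},
    {"name": "john smith", "state": "id", "email": ""},
]
for row in clean_records(raw):
    print(row)
```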

Which type of data analytics usually results in rules and recommendations for the next
step? Prescriptive
Which type of data analytics is considered really valuable but is largely NOT used?
Prescriptive
What analysis uses past performance to determine why something happened

Analytics Portfolio Assignment
Association Analysis
a. Using the statistics from your numeric to binomial analysis (Example Set (Numerical to
Binomial)), calculate the level of support as a percent to two decimal places (ex. 9.35%)
for the following items: beer
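Support for an item is the share of all transactions that contain it. A minimal sketch, assuming each transaction is a dict of item quantities; the true/false conversion stands in for a numerical-to-binomial step (an item counts as present when its quantity is above 0). The transactions and function names are hypothetical.

```python
def to_binomial(transactions):
    """Convert item quantities to presence flags (True if quantity > 0)."""
    return [{item: qty > 0 for item, qty in t.items()} for t in transactions]

def support_percent(flags, item):
    """Percent of transactions containing the item, formatted to two decimals."""
    hits = sum(1 for t in flags if t.get(item, False))
    return f"{100 * hits / len(flags):.2f}%"

transactions = [
    {"beer": 2, "chips": 1},
    {"beer": 0, "chips": 3},
    {"milk": 1},
    {"beer": 1, "milk": 2},
    {"chips": 1},
    {"bread": 1},
    {"beer": 6},
]
flags = to_binomial(transactions)
print(support_percent(flags, "beer"))  # 42.86% (3 of 7 transactions)
```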

SQL has two main types of keys: primary keys and foreign keys.
Which table do CustomerID and OrderID need to be in if customers are going to place multiple
orders? (N:M) Primary Key: CustomerID Foreign Key: OrderID
When importing raw unfiltered data that c

Chapter 2 Review Questions:
1) What is data mining in general terms?
Data mining is an interdisciplinary subfield of computer science. It is the computational
process of discovering patterns in large data sets involving methods at the intersection of
arti

MIS 3300
Analytics Portfolio Assignment
Linear Regression
a. Provide two screenshots of the RapidMiner process in design view as follows:
i. First screenshot: the top-level process
ii. Second screenshot: the nested process inside the Validation operator

Which of the following are steps of the Cross-Industry Standard Process for
Data Mining (CRISP-DM)?
a. Business understanding and data understanding
b. Data preparation
c. Modeling, evaluation, and deployment
d. All of the above
Answer: d. All of the above
6. Wha

Question 1
The measure of the strength of the relationship between each possible set of attributes in the data
set is called a:
correlation coefficient
correlation matrix
linear relationship
frequency pattern
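A correlation matrix holds the Pearson correlation coefficient for every pair of attributes, with 1 on the diagonal because each attribute correlates perfectly with itself. A minimal plain-Python sketch; the attribute names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Map each (attribute, attribute) pair to its correlation coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

data = {
    "income": [30, 45, 50, 62, 80],
    "spending": [20, 28, 33, 40, 51],
    "age": [50, 23, 41, 35, 29],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    print(f"{a:>8} vs {b:<8} r = {r:+.2f}")
```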
As one

Question 1
The process of breaking tables apart and thereby reducing data redundancy is called:
a join
a query
denormalization
normalization
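Normalization moves facts that repeat on many rows into their own table, linked back by a key. A minimal plain-Python sketch, assuming a hypothetical denormalized order list in which customer details repeat on every row and each CustomerID always carries the same details.

```python
def normalize(order_rows):
    """Split denormalized order rows into a customers table and an orders table."""
    customers = {}  # CustomerID -> customer details, stored once
    orders = []     # each order keeps only the CustomerID as a reference
    for row in order_rows:
        customers.setdefault(row["customer_id"], {
            "name": row["customer_name"],
            "city": row["customer_city"],
        })
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "total": row["total"],
        })
    return customers, orders

flat = [
    {"order_id": 100, "customer_id": 1, "customer_name": "Ada", "customer_city": "Logan", "total": 25.0},
    {"order_id": 101, "customer_id": 1, "customer_name": "Ada", "customer_city": "Logan", "total": 40.0},
    {"order_id": 102, "customer_id": 2, "customer_name": "Bo", "customer_city": "Ogden", "total": 15.0},
]
customers, orders = normalize(flat)
print(len(customers), "customers,", len(orders), "orders")  # 2 customers, 3 orders
```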
Recording everyday activities such as buying gasoline or making an online purchase

Question 1
To effectively introduce data mining models into the organizational flow, you should do the
following:
Clearly communicate the model's function and utility to stakeholders.
Thoroughly test and prove the model.
All of these steps are correct

Question 1
This course is blended. This means:
All instruction and interaction is synchronous.
Instruction and interaction are both synchronous and asynchronous.
None of these answers are correct.
All instruction and interaction is asynchronous.