1. The problem of finding hidden structure in unlabeled data is called:
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
2. The task of inferring a model from labeled training data is called:
A. Unsupervised learning
B. Supervised learnin

APA: SQL Data Cleansing: Su16 MIS 3300
Team Members: Nathan Alldredge, Tyson Mears
Instructor: Polly Conrad
1. In your own words, explain what a primary key and foreign key are and their purpose in a
database.
A primary key is one or more columns and it
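To make the two key types concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are hypothetical names chosen for illustration; they are not taken from the assignment. The primary key uniquely identifies each row, and the foreign key only accepts values that already exist as a primary key in the referenced table.

```python
import sqlite3

# Hypothetical two-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

# A foreign key pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# A duplicate primary key value is rejected as well.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Bob')")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True
```

Both rejected inserts raise `sqlite3.IntegrityError`, which is how the database protects uniqueness and referential integrity.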

Question 1
1 / 1 pts
In linear regression, the sum of squares regression (SSR) is the:
Sum of squared deviations of each observed value from its variance
Sum of squared deviations of each predicted value from the mean
Sum of squared deviations of each obs
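For reference, SSR sums the squared deviations of each predicted value from the mean of the observed values. The sketch below computes it alongside the error and total sums of squares. The data are made up for illustration, and the coefficients 2.2 and 0.6 are the least-squares fit for that particular data.

```python
# Made-up illustration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares intercept and slope for the data above.
b0, b1 = 2.2, 0.6
y_hat = [b0 + b1 * xi for xi in x]   # predicted values
y_bar = sum(y) / len(y)              # mean of the observed values

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error (residual) sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
```

For a least-squares fit with an intercept, `ssr + sse` equals `sst` up to floating-point rounding.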

In linear regression, which of the following is the equation to calculate R²? SSR / SST
With two levels of the target variable, an entropy of 0 indicates: a pure node
To convert probability to odds, calculate: Probability / (1 - Probability)
In an artific
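The three formulas in these notes (R² as SSR / SST, entropy of a node, and probability to odds) can be sketched as small helper functions. The function names are hypothetical, and the entropy helper assumes base-2 logarithms, which gives a maximum of 1 for a two-level target.

```python
import math

def r_squared(ssr, sst):
    """R-squared as the share of total variation explained by the regression."""
    return ssr / sst

def entropy(class_counts):
    """Base-2 entropy of a node, given the count of cases in each class."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]  # skip empty classes
    return sum(-p * math.log2(p) for p in probs)

def odds(probability):
    """Convert a probability (strictly less than 1) to odds."""
    return probability / (1 - probability)
```

A node holding only one class, such as `entropy([10, 0])`, has entropy 0 (a pure node), while an even split such as `entropy([5, 5])` has entropy 1.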

In cross-validation, a percentage of the dataset is held out for the purpose of _ the model.
You Answered
training
analyzing
segmenting
Correct Answer
testing
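As an illustration of holding data out for testing, the sketch below splits row indices into k folds. Each fold is held out once as the test set while the remaining rows are used for training. The function name is hypothetical, and the rows are not shuffled, which a real workflow would usually do first.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                 # held-out rows
        train = indices[:start] + indices[start + size:]   # everything else
        yield train, test
        start += size

splits = list(k_fold_splits(10, 5))
```

With 10 rows and 5 folds, each split holds out 2 rows (20% of the data) for testing.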
Suppose I have a dataset with a binary outcome variable (true/false). Sixty percent of the cases

In linear regression, finding values of B0 and B1 that minimize the sum of the squared residuals
is known as what?
Correct!
least squares criterion
regression intercept
random error
population regression
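A minimal sketch of the least squares criterion for a single predictor follows. It uses the standard closed-form estimates for the intercept and slope; the function names and the data are made up for illustration.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form intercept and slope that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0]   # made-up illustration data
ys = [3.0, 5.0, 7.0, 10.0]
b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)
```

Nudging either coefficient away from the fitted values, for example `sse(b0 + 0.1, b1, xs, ys)`, gives a larger sum of squared residuals than `best`.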
When we overfit a model, we achieve greater accura

In a neural network, the speed at which the network learns can be adjusted by setting the _ .
number of hidden layers
activation threshold
learning rate
feed forward index
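To illustrate what a learning rate does, the sketch below runs gradient descent on a one-parameter toy function rather than an actual neural network. The function `(w - 3) ** 2`, the step count, and the three rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize f(w) = (w - 3) ** 2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)           # derivative of (w - 3) ** 2
        w -= learning_rate * gradient    # step size is scaled by the learning rate
    return w

slow = gradient_descent(0.01)     # small rate: moves toward 3 slowly
fast = gradient_descent(0.1)      # larger rate: gets much closer in the same steps
too_big = gradient_descent(1.1)   # rate too large: overshoots and diverges
```

After 20 steps the larger rate ends closer to 3 than the small one, while the oversized rate ends farther from 3 than where it started.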
In a decision tree, pruning is generally done to avoid _ .
You Answered
generalizat

Analytics Portfolio Assignment
Cluster Analysis
a. Provide screenshots of the following cluster analysis output tabs (use the Windows
snipping tool or go to File > Print/Export Image in RapidMiner):
i. Description
ii. Centroid Table
iii. Plot
b. Using the C

MIS 3300
Analytics Portfolio Assignment
Descriptive Statistics and Correlation
a. The descriptive statistics calculated for the first eight columns (attributes) in the dataset
(format for clarity of presentation as needed).
Population
Mean
Income
4246.42

Ethics is the set of moral codes, above and beyond the legally required minimums.
True
False
It is important to make organizational decisions about how to use and protect personal
information about people because data represent people's lives. Data about wh

MIS 3300
Analytics Portfolio Assignment
Data Cleansing
Introduction
The purpose of this exercise is to gain practice recognizing forms of dirty data and using tools such as
Power BI
for data cleansing.
This exercise will be completed in the same teams as

Which type of data analytics usually results in rules and recommendations for the next
step? Prescriptive
Which type of data analytics is considered really valuable but is largely NOT used?
Prescriptive
What analysis uses past performance to determine why something ha

Analytics Portfolio Assignment
Association Analysis
a. Using the statistics from your numeric to binomial analysis (ExampleSet (Numerical to
Binomial)), calculate the level of support as a percent to two decimal places (ex. 9.35%)
for the following items: b
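As a rough sketch of the support calculation, the helper below counts the share of transactions that contain an item and formats it as a percent to two decimal places. The function name and the basket data are made up for illustration and are not from the assignment's dataset.

```python
def support_percent(transactions, item):
    """Support of an item: share of transactions containing it, as a percent string."""
    containing = sum(1 for basket in transactions if item in basket)
    return f"{100 * containing / len(transactions):.2f}%"

# Made-up baskets for illustration.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
bread_support = support_percent(transactions, "bread")
```

Here bread appears in 3 of 4 baskets, so `bread_support` is `"75.00%"`.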

SQL has two types of keys: Primary Key and Foreign Key.
The CustomerID and the OrderID need to be in what table if customers are going to place multiple
orders? (N:M) Primary Key: CustomerID Foreign Key: OrderID
When importing raw unfiltered data that c

Chapter 2 Review Questions:
1) What is data mining in general terms?
Data mining is an interdisciplinary subfield of computer science. It is the computational
process of discovering patterns in large data sets involving methods at the intersection of
arti

MIS 3300
Analytics Portfolio Assignment
Linear Regression
a. Provide two screenshots of the RapidMiner process in design view as follows:
i. First screenshot: the top-level process
ii. Second screenshot: the nested process inside the Validation operator

Which of the following are steps of the Cross-Industry Standard Process for
Data Mining (CRISP-DM)?
a. Business understanding and data understanding
b. Data preparation
c. Modeling, evaluation, and deployment
d. All of the above
d. All of the Above
6. Wha

Question 1
1 / 1 pts
The measure of the strength of the relationship between each possible set of attributes in the data
set is called a:
correlation coefficient
correlation matrix
linear relationship
frequency pattern
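For context, a correlation matrix holds one correlation coefficient for every pair of attributes in a dataset. The sketch below builds one with the Pearson coefficient; the function names and the small dataset are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of attributes."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up attributes for illustration.
data = {"age": [20, 30, 40, 50], "income": [25, 35, 45, 55], "debt": [9, 7, 5, 3]}
matrix = correlation_matrix(data)
```

In this toy data, `matrix["age"]["income"]` is close to 1 (a perfect positive relationship) and `matrix["age"]["debt"]` is close to -1 (a perfect negative one).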
Question 2 (Incorrect)
0 / 1 pts
As one

Question 1
1 / 1 pts
The process of breaking tables apart and thereby reducing data redundancy is called.
a join
a query
denormalization
normalization
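As a small illustration of normalization, the sketch below breaks one flat table, in which customer details repeat on every order row, into a customers table and an orders table linked by `customer_id`. All table and column names and values are hypothetical.

```python
# Flat table: customer name and city are repeated for every order.
flat_rows = [
    {"order_id": 100, "customer_id": 1, "customer_name": "Ada", "city": "Springfield"},
    {"order_id": 101, "customer_id": 1, "customer_name": "Ada", "city": "Springfield"},
    {"order_id": 102, "customer_id": 2, "customer_name": "Bob", "city": "Riverton"},
]

customers = {}   # one entry per customer, keyed by customer_id
orders = []      # one entry per order, pointing back to the customer
for row in flat_rows:
    customers[row["customer_id"]] = {"name": row["customer_name"], "city": row["city"]}
    orders.append({"order_id": row["order_id"], "customer_id": row["customer_id"]})

# Joining the two tables on customer_id reproduces the original rows.
rejoined = [
    {
        "order_id": o["order_id"],
        "customer_id": o["customer_id"],
        "customer_name": customers[o["customer_id"]]["name"],
        "city": customers[o["customer_id"]]["city"],
    }
    for o in orders
]
```

Each customer's details are now stored once instead of once per order, and no information is lost because the join recovers the flat rows.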
Question 2 (Incorrect)
0 / 1 pts
Recording everyday activities such as buying gasoline or making an online

Question 1
1 / 1 pts
To effectively introduce data mining models into the organizational flow, you should do the
following:
Clearly communicate the model's function and utility to stakeholders.
Thoroughly test and prove the model.
All of these steps are

Question 1
1 / 1 pts
This course is blended. This means
All instruction and interaction is synchronous.
Instruction and interaction are both synchronous and asynchronous.
None of these answers are correct.
All instruction and interaction is asynchronous.