1. i) Normalization from [-25, 20] to [0, 25], basically taking the absolute value of each
value.
ii) I do not see another transformation.
2. min = 6, max = 82, new_min = 0.0, new_max = 5.0
v = (v - min) / (max - min) * (new_max - new_min) + new_min
when v = 52
v = (5
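
The min-max formula in answer 2 can be checked with a short Python sketch. The function name min_max_scale and its parameter names are illustrative choices, not part of the original answer; the numbers are the ones stated in the answer (min = 6, max = 82, new range 0.0 to 5.0, v = 52).

```python
def min_max_scale(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_max must differ from old_min")
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Values taken from the answer above: min = 6, max = 82, new range [0.0, 5.0].
scaled = min_max_scale(52, 6, 82, 0.0, 5.0)
print(round(scaled, 4))
```

The endpoints map as expected under this sketch: the old minimum goes to new_min and the old maximum goes to new_max.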
CSc 404 Data Mining
Exam #1
October 1, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on ans
CSc 401 Data Mining
Exam #1
September 25, 2001
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on
CSc 401 Data Mining
Exam #2
November 4, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on an
CSc 401 Data Mining
Exam #2
October 30, 2001
Name:_
Time exam started: _
Time exam completed: _
NOTE: Exam time should NOT exceed 2 hours and 30 minutes.
Fax to: D. St. Clair
573-341-4501
St. Clair's phone number for questions: 573-465-5963 (available 3:4
1.
a) Skip; this was on the first test (see jpg test)
b) Pos (1 3/7 / 1)
Neg(2 / 0)
2.
a) x*v + v0 = [1.2 6 11.6] + [-1 0 3]= [0.2 6 -8.6]=z_in
z=sigmoid(z_in)=[0.45 0.002 0.999]
z*w + w0 = 1.9058 + (1)=2.9058=y_in
y=3*(y_in) + 1=9.7174
b) W = alpha*delta*z,
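
The forward pass in part a) follows the usual pattern for a network with one hidden layer: hidden net input, sigmoid activation, output net input, then the output activation. The sketch below shows that pattern in Python. The input, weights, and biases are hypothetical values chosen for illustration and are not the ones from the exam; only the linear output activation y = 3*y_in + 1 is taken from the answer above.

```python
import math


def sigmoid(t):
    """Logistic sigmoid 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def linear_output(y_in):
    """Output activation used in the answer above: y = 3*y_in + 1."""
    return 3.0 * y_in + 1.0


def forward(x, v, v0, w, w0, out_act):
    """One forward pass: x -> hidden layer (sigmoid) -> single output unit."""
    # z_in[j] = sum_i x[i] * v[i][j] + v0[j]
    z_in = [sum(x[i] * v[i][j] for i in range(len(x))) + v0[j]
            for j in range(len(v0))]
    z = [sigmoid(t) for t in z_in]
    # y_in = sum_j z[j] * w[j] + w0
    y_in = sum(z[j] * w[j] for j in range(len(z))) + w0
    return z_in, z, y_in, out_act(y_in)


# Hypothetical network: 2 inputs, 3 hidden units, 1 output.
x = [1.0, 2.0]
v = [[0.5, -1.0, 0.25],
     [0.1, 0.3, -0.2]]
v0 = [0.0, 0.5, -0.5]
w = [1.0, -1.0, 2.0]
w0 = 1.0

z_in, z, y_in, y = forward(x, v, v0, w, w0, linear_output)
print(z_in, z, y_in, y)
```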
Interestingness Measures
Simplicity for human comprehension
Rule length may be calculated by # conjuncts in rule (Note: each conjunct adds further
restriction to the rule.)
Certainty: Each discovered pattern should have a measure of certainty associated with
Association Rules
Why use Association Rules?
Data Understanding
Which attributes are strongly related?
People who shop at Home Depot also shop at?
Model Understanding
Compare / contrast highest-scoring decile with lowest-scoring decile
Can help to e
1. The StudentNumber would have more gain. This is because it will be able to split on
each number for each student.
I would use the height instead. This would still have a decent gain, but it would not split
on every student, so the tree would still be m
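
The point about StudentNumber can be illustrated with a small Python sketch. The eight-student dataset below is made up for illustration and is not from the original problem. The sketch also computes the gain ratio that C4.5 uses to penalize many-valued attributes, which the answer above does not mention; it is included only as one common way to make the same argument numerically.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (bits) of the value distribution in a list."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(attribute, labels):
    """Information gain from splitting labels by the attribute's values."""
    groups = defaultdict(list)
    for a, y in zip(attribute, labels):
        groups[a].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    """Gain divided by split information (entropy of the attribute itself)."""
    return info_gain(attribute, labels) / entropy(attribute)


# Hypothetical data: a unique ID per student and a two-valued height attribute.
student_number = [1, 2, 3, 4, 5, 6, 7, 8]
height = ["tall", "tall", "tall", "tall", "short", "short", "short", "short"]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]

print("gain(StudentNumber) =", info_gain(student_number, labels))
print("gain(Height)        =", info_gain(height, labels))
print("ratio(StudentNumber) =", gain_ratio(student_number, labels))
print("ratio(Height)        =", gain_ratio(height, labels))
```

On this toy data the unique identifier achieves the maximum possible gain because every branch is pure, yet it generalizes to no new student, while the gain ratio favors the height attribute.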
CSc 401 Data Mining
Final Exam
December 13, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators but you may NOT use computers. [You need maintain only 2
decima
CSc 401 Data Mining
Final Exam
December 18, 2003
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note exam.
You may use calculators but you may NOT use computers. You are NOT to get help from oth
CSc/Emgt 404 - Data Mining
Final Exam
December 11, 2001
Name:_
Time exam started: _
Time exam completed: _
NOTE: Exam time should NOT exceed 2 hours and 30 minutes.
Fax to: D. St. Clair
573-341-4501
St. Clair's phone number for questions: 573-465-5963 (ava
Problem L12-1:
Build a logistic regression model:
Using L12_logistic_regression.arff build a Weka Knowledge Flow diagram
use TrainTestSplitMaker w/ default parameters
connect Logistic model node to Prediction Appender
connect Prediction Appender to CSV
Problem L11-1: Work problem #5 (Ch. 8 Burden & Faires)
1. a-c By hand. Clearly show all work.
2. a-f using regression program in Excel.
Solution:
x      y
4      102.56
4.2    113.18
4.5    130.11
4.7    142.05
5.1    167.53
5.5    195.14
5.9    224.87
6.3    256.73
6.8    299.5
7.1    326.7
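
As a cross-check for the hand and Excel calculations, the following Python sketch fits a degree-one least-squares line to the x and y data listed in the solution. It assumes a straight-line fit is among the requested parts; the function name linear_least_squares is an illustrative choice, and the normal-equation formulas are the standard ones for simple linear regression.

```python
xs = [4.0, 4.2, 4.5, 4.7, 5.1, 5.5, 5.9, 6.3, 6.8, 7.1]
ys = [102.56, 113.18, 130.11, 142.05, 167.53,
      195.14, 224.87, 256.73, 299.5, 326.7]


def linear_least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


slope, intercept = linear_least_squares(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)  # total squared error of the fit
print(slope, intercept, sse)
```

Higher-degree or exponential fits, if the problem asks for them, would need a different model; this sketch covers only the straight-line case.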
1. In the Stadium Heartburn Data,
a) derive an association rule with two items in the antecedent with the
consequent ice_cream? Compute the support and confidence.
b) What are 5 possible rules for the item set {beer, hotdogs, nachos}?
Compute their supp
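
Support and confidence for rules such as those in parts a) and b) can be computed as in the Python sketch below. The Stadium Heartburn data is not reproduced in this document, so the five transactions here are hypothetical stand-ins; the helper names support, confidence, and rules_from_itemset are likewise illustrative.

```python
from itertools import combinations

# Hypothetical transactions; NOT the Stadium Heartburn data.
transactions = [
    {"beer", "hotdogs", "nachos"},
    {"beer", "hotdogs"},
    {"hotdogs", "nachos", "ice_cream"},
    {"beer", "nachos", "ice_cream"},
    {"beer", "hotdogs", "nachos", "ice_cream"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def rules_from_itemset(itemset):
    """All rules A -> B where A and B are non-empty and partition the itemset."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


for a, c in rules_from_itemset({"beer", "hotdogs", "nachos"}):
    print(a, "->", c,
          "support =", support(set(a) | set(c), transactions),
          "confidence =", round(confidence(a, c, transactions), 3))
```

A three-item set yields six such rules, all with the same support, since every rule covers the same full item set; only the confidence differs with the choice of antecedent.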
Problem L9-1:
1. Use Weka to construct a backpropagation neural network model that uses the
inputs you suggested in problem L6-2 to predict the value of symboling. You
must use at least two of the following nominal attributes 3, 4, 5, 7, 8, 9, & 18.
Use 3
Problem L8-1: Using the golf data from Figure 2.1 of Ch. 2 of Quinlan's book,
assume that the outlook values of the last two tuples in the dataset are unknown, viz.
? 68 80 false Play
? 70 96 false Play
a.) Compute gain (outlook) [See Ch. 3]
Solution:
Outl
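
A hedged Python sketch of the computation in part a) follows. It uses the C4.5 treatment of unknown values, in which the gain is computed over the cases with known outlook and then scaled by the fraction of cases whose outlook is known. The class counts below are an assumption based on the golf data as it is commonly reproduced (14 cases, with the two unknown tuples removed from the rain group) and should be checked against Figure 2.1; the function names are illustrative.

```python
import math


def info(counts):
    """Entropy (bits) of a class distribution given as a sequence of counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def gain_with_unknowns(known_counts, total_cases):
    """Gain over known cases, scaled by the fraction of cases that are known."""
    known = sum(sum(c) for c in known_counts.values())
    class_totals = [sum(col) for col in zip(*known_counts.values())]
    info_t = info(class_totals)
    info_x = sum(sum(c) / known * info(c) for c in known_counts.values())
    return known / total_cases * (info_t - info_x)


# ASSUMED counts as (Play, Don't Play) per known outlook value; verify
# against Figure 2.1 before relying on the result.
known_counts = {
    "sunny": (2, 3),
    "overcast": (4, 0),
    "rain": (1, 2),
}

gain = gain_with_unknowns(known_counts, total_cases=14)
print(round(gain, 3))
```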
Problem L7-1:
1) Construct a C4.5 model that uses the inputs you suggested in problem L6-2 to
predict the value of symboling. Use 3-fold cross validation. [Note, you can use
Quinlan's C4.5 code or one of the decision tree classifiers in Weka.]
Solution:
Att
Problem L5-1: Use the C4.5 software to produce a tree and rule set for Quinlan's golf
data.
Solution:
Copy all the executables of C4.5 and the data files to a directory.
Go to the Command Prompt in Windows or UNIX and enter the working directory.
Use th