CSc/Emgt 404 - Data Mining
Final Exam
December 11, 2001
Name:_
Time exam started: _
Time exam completed: _
NOTE: Exam time should NOT exceed 2 hours and 30 minutes.
Fax to: D. St. Clair
573-341-4501
St. Clair's phone number for questions: 573-465-5963 (ava

Problem L7-1:
1) Construct a C4.5 model that uses the inputs you suggested in problem L6-2 to
predict the value of symboling. Use 3-fold cross validation. [Note: you can use
Quinlan's C4.5 code or one of the decision tree classifiers in Weka.]
Solution:
Att

Problem L5-1: Use the C4.5 software to produce a tree and rule set for Quinlan's golf
data.
Solution:
Copy all the executables of C4.5 and the data files to a directory.
Go to the Command Prompt in Windows or UNIX and enter the working directory.
Use th

Problem 3.3: The age values for the data tuples are (in increasing order): 13, 15, 16,
16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
Solution (a):
With a bin depth of 3 as given, we divide the data into th
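The partitioning step can be sketched in Python. This is a minimal illustration, not the original solution: the helper names equal_depth_bins and smooth_by_bin_means are hypothetical, and smoothing by bin means is assumed to be the technique part (a) asks for, which the truncated text does not confirm.

```python
# Age values from Problem 3.3, already sorted in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def equal_depth_bins(sorted_values, depth):
    """Split sorted values into consecutive bins holding `depth` values each."""
    return [sorted_values[i:i + depth]
            for i in range(0, len(sorted_values), depth)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin by that bin's mean (assumed smoothing method)."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


bins = equal_depth_bins(ages, depth=3)
smoothed = smooth_by_bin_means(bins)

for original, smooth in zip(bins, smoothed):
    print(original, "->", [round(v, 2) for v in smooth])
```

With 27 values and a depth of 3, the data falls into nine bins of three values each.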

Problem 4.5(a): Consider Association Rule:
major (X, science) => status (X, undergrad).
(4.8)
The number of task-relevant data tuples is 5000.
56% of undergraduates major in science
64% of the students are registered in undergraduate degree programs
70% of the students are
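As a hedged sketch of how support and confidence relate for rule (4.8): the function names below are hypothetical, and only the two complete figures above (56% and 64%) are used as inputs. The third figure is cut off in this excerpt, so the probability of the rule's antecedent is left as a parameter rather than filled in.

```python
def rule_support(p_a_given_b, p_b):
    """P(A and B) computed as P(A | B) * P(B)."""
    return p_a_given_b * p_b


def rule_confidence(support_ab, p_a):
    """Confidence of A => B, i.e. P(A and B) / P(A)."""
    return support_ab / p_a


# A = "majors in science", B = "is an undergraduate".
total_tuples = 5000
p_science_given_undergrad = 0.56
p_undergrad = 0.64

support = rule_support(p_science_given_undergrad, p_undergrad)
supporting_tuples = round(support * total_tuples)

print(f"support = {support:.4f} ({supporting_tuples} of {total_tuples} tuples)")
# Confidence needs P(science), which is not recoverable from the text here:
# confidence = rule_confidence(support, p_science)
```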

1. i) Normalization from [-25,20] to [0,25]. Basically taking the absolute value of each
value.
ii) I do not see another transformation.
2. min = 6, max = 82, new_min = 0.0, new_max = 5.0
v' = (v - min) / (max - min) * (new_max - new_min) + new_min
when v = 52
v' = (5
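The min-max formula can be checked with a short Python sketch. The function name min_max_normalize is hypothetical; the numbers are the ones given in item 2.

```python
def min_max_normalize(v, old_min, old_max, new_min, new_max):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Values from item 2: min = 6, max = 82, new range [0.0, 5.0], v = 52.
result = min_max_normalize(52, 6, 82, 0.0, 5.0)
print(round(result, 4))
```

The endpoints are a quick sanity check: v = 6 should map to 0.0 and v = 82 to 5.0.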

CSc 404 Data Mining
Exam #1
October 1, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on ans

CSc 401 Data Mining
Exam #1
September 25, 2001
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on

CSc 401 Data Mining
Exam #2
November 4, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators. You are NOT to get help from others. Points will be assigned
on an

CSc 401 Data Mining
Exam #2
October 30, 2001
Name:_
Time exam started: _
Time exam completed: _
NOTE: Exam time should NOT exceed 2 hours and 30 minutes.
Fax to: D. St. Clair
573-341-4501
St. Clair's phone number for questions: 573-465-5963 (available 3:4

1.
a) Skip; this was on the first test (see the JPG of the test).
b) Pos (1 3/7 / 1)
Neg(2 / 0)
2.
a) x*v + v0 = [1.2 6 11.6] + [-1 0 3]= [0.2 6 -8.6]=z_in
z=sigmoid(z_in)=[0.45 0.002 0.999]
z*w + w0 = 1.9058 + (1)=2.9058=y_in
y=3*(y_in) + 1=9.7174
b) W = alpha*delta*z,

Interesting Measures
Simplicity for human comprehension
Rule length may be calculated by # conjuncts in rule (Note: each conjunct adds further
restriction to the rule.)
Certainty: Each discovered pattern should have a measure of certainty associated with

Association Rules
Why use Association Rules?
Data Understanding
Which attributes are strongly related?
People who shop at Home Depot also shop at?
Model Understanding
Compare / contrast highest-scoring decile with lowest-scoring decile
Can help to e

1. The StudentNumber would have more gain. This is because it will be able to split on
each number for each student.
I would use the height instead. This would still have a decent gain, but it would not split
on every student, so the tree would still be m
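The claim about StudentNumber can be illustrated with a small Python sketch. The data below is hypothetical toy data, not taken from the exam, and the helper names are illustrative only. An attribute with a unique value per record splits the data into pure single-record groups, so its information gain equals the full entropy of the class labels.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: a unique ID, a coarse height band, and a class label.
student_number = [101, 102, 103, 104, 105, 106]
height_band = ["short", "short", "tall", "tall", "tall", "short"]
label = ["yes", "yes", "no", "no", "yes", "no"]

gain_id = information_gain(student_number, label)
gain_height = information_gain(height_band, label)
print(f"gain(StudentNumber) = {gain_id:.3f}, gain(height) = {gain_height:.3f}")
```

The ID attribute wins on raw gain even though it cannot generalize to new students, which is why a measure that penalizes many-valued splits (such as C4.5's gain ratio) is often preferred.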

CSc 401 Data Mining
Final Exam
December 13, 2002
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note
exam. You may use calculators but you may NOT use computers. [You need to maintain only 2
decima

CSc 401 Data Mining
Final Exam
December 18, 2003
Name:_
Score:_/100
Directions: Carefully answer each of the following questions. This is an open-book, open-note exam.
You may use calculators but you may NOT use computers. You are NOT to get help from oth

Problem L12-1:
Build a logistic regression model:
Using L12_logistic_regression.arff build a Weka Knowledge Flow diagram
use TrainTestSplitMaker w/ default parameters
connect Logistic model node to Prediction Appender
connect Prediction Appender to CSV

Problem L11-1: Work problem #5 (Ch. 8 Burden & Faires)
1. a-c By hand. Clearly show all work.
2. a-f using regression program in Excel.
Solution:
x       y
4       102.56
4.2     113.18
4.5     130.11
4.7     142.05
5.1     167.53
5.5     195.14
5.9     224.87
6.3     256.73
6.8     299.5
7.1     326.7
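As a cross-check on the hand and Excel work, a straight-line least squares fit to the x and y values listed above can be sketched in Python. This assumes that one part of the problem asks for a degree-one least squares polynomial, which this excerpt does not confirm; the function name least_squares_line is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared error,
    using the closed-form solution of the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


xs = [4, 4.2, 4.5, 4.7, 5.1, 5.5, 5.9, 6.3, 6.8, 7.1]
ys = [102.56, 113.18, 130.11, 142.05, 167.53,
      195.14, 224.87, 256.73, 299.5, 326.7]

slope, intercept = least_squares_line(xs, ys)
# Total squared error of the fitted line over the ten data points.
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
print(f"y = {slope:.4f} x + {intercept:.4f}, squared error = {sse:.2f}")
```

The same sums (of x, y, x squared, and x times y) are the quantities needed when working the fit by hand.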