CSE 5800 Mining/Learning and the Internet
HW1
Due 6:30pm, Sep 14, Wed
Submit Server: course=cse5800, project=hw1

Implement and evaluate the RIPPER algorithm:

1. FOIL gain: http://jmvidal.cse.sc.edu/talks/learningrules/foilgain.xml
2. Allow continuous-valued attributes
3. Allow multiple classes
4. Allow the option of no pruning (default is with pruning)
5. Allow a parameter k for the number of "optimizations"
6. Three data sets:
   (a) Restaurant in the handout and on the course web site
   (b) Intrusion detection on the course web site
   (c) your own data set with more than two classes [or from Resources on the course web site]
7. Separate the data set into a training set and a test set, report the accuracy on the two disjoint sets (with and without pruning).
8. A report (in pdf) that discusses the following:
   (a) For the second data set:
       i. corrupt the class labels of training examples from 0% to 50% (5% increment), by changing from the correct class to another class.
       ii. calculate accuracy on the (corrupted) training and (non-corrupted) test sets
   (b) plot accuracy vs. noise percentage in the training and test sets.
   (c) compare the training and test accuracy of the rules with and without pruning
9. Implementation:
   (a) input files: attributes description, training data, test data
   (b) You would have two (maybe three) executables:
       i. Miner/Learner: input training examples/instances, output ruleset
       ii. Classifier/predictor: input ruleset and labeled instances, output the classifications/predictions and how accurate the ruleset is with respect to the correct labels (% of correct classifications).
       iii. ruleset printer: if the output from the learner is human-readable, no need for a ruleset printer; otherwise, build a ruleset printer so that we can see the learned ruleset.
10. Submission:
   (a) source code
   (b) your data set
   (c) report in pdf
   (d) README.txt (how to compile and run your program)...
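
The FOIL gain in item 1, the label corruption in item 8(a)i, and the accuracy figure in item 9(b)ii can be sketched roughly as below. This is a minimal illustration in Python, not the required implementation. The function names, the argument layout, and the "normal"/"attack" labels are assumptions made for the example, and the gain uses the standard propositional form p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0, n0 are the positives and negatives covered before adding a condition and p1, n1 after.

```python
import math
import random


def foil_gain(p0, n0, p1, n1):
    """FOIL gain of adding one condition to a rule (propositional case).

    p0, n0: positive/negative examples covered before adding the condition.
    p1, n1: positive/negative examples covered after adding it.
    """
    if p0 == 0 or p1 == 0:
        # Convention assumed here: a rule covering no positives gains nothing.
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


def corrupt_labels(labels, classes, fraction, rng):
    """Return a copy of labels with `fraction` of them changed to another class."""
    noisy = list(labels)
    how_many = round(fraction * len(noisy))
    # Sample distinct positions so exactly `how_many` labels are changed.
    for i in rng.sample(range(len(noisy)), how_many):
        others = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(others)
    return noisy


def accuracy(predicted, actual):
    """Percentage of predictions that match the correct labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the runs are repeatable
    classes = ["normal", "attack"]  # hypothetical class names
    train_labels = classes * 50
    for pct in range(0, 55, 5):  # 0% to 50% in 5% increments
        noisy = corrupt_labels(train_labels, classes, pct / 100, rng)
        # Agreement between noisy and clean labels, as a sanity check only.
        print(pct, accuracy(noisy, train_labels))
```

In an actual run, the learner would be trained on the noisy labels at each noise level, and the accuracy would be measured against the corrupted training labels and the uncorrupted test labels.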

This note was uploaded on 02/10/2012 for the course CSE 5800 taught by Professor Staff during the Fall '09 term at FIT.
