Michael Henry
DSC 441
Assignment #3

Problem 1

1.

Validation                                   Accuracy Result
Cross-Validation d=50, np=25, nc=12          d=3, rules=3, 92.3%
Cross-Validation d=10, np=20, nc=10          d=4, rules=5, 95.0%
Holdout Partitioning d=10, np=20, nc=10      d=3, rules=3, 91.3% training, 86.5% testing
Holdout Partitioning d=10, np=25, nc=12      d=4, rules=4, 92.0% training, 93.1% testing
Holdout Partitioning d=10, np=30, nc=15      d=3, rules=3, 91.2% training, 89.6% testing
Holdout Partitioning d=5, np=10, nc=5        d=4, rules=6, 95.6% training, 95.9% testing
Holdout Partitioning d=5, np=16, nc=8        d=3, rules=3, 91.8% training, 88.5% testing
Holdout Partitioning d=8, np=20, nc=10       d=3, rules=3, 89.2% training, 93.3% testing
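One rough way to compare the holdout rows in the table above is the gap between training and testing accuracy. The sketch below is only an illustration in Python, not part of the original analysis. It assumes that d, np and nc in the first column stand for maximum tree depth, minimum parent cases and minimum child cases. The names holdout_results and accuracy_gap are made up for this example.

```python
# Holdout results copied from the table above.
# Key: (d, np, nc), assumed to mean (max depth, min parent cases, min child cases).
# Value: (training accuracy %, testing accuracy %).
holdout_results = {
    (10, 20, 10): (91.3, 86.5),
    (10, 25, 12): (92.0, 93.1),
    (10, 30, 15): (91.2, 89.6),
    (5, 10, 5): (95.6, 95.9),
    (5, 16, 8): (91.8, 88.5),
    (8, 20, 10): (89.2, 93.3),
}


def accuracy_gap(train_pct, test_pct):
    """Return training minus testing accuracy, in percentage points."""
    return round(train_pct - test_pct, 1)


# Gap for every holdout configuration.
gaps = {settings: accuracy_gap(*acc) for settings, acc in holdout_results.items()}

# Print the configurations from the largest gap to the smallest.
for (d, n_parent, n_child), gap in sorted(
    gaps.items(), key=lambda item: item[1], reverse=True
):
    print(f"d={d}, np={n_parent}, nc={n_child}: gap = {gap:+.1f} points")
```

A positive gap means the tree scored higher on the training partition than on the testing partition, which can be a sign of overfitting. A negative gap means the testing partition scored higher. With a single 66%/34% split, small gaps in either direction may simply reflect which cases landed in each partition.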
Growing Method: CRT
Validation: Holdout Partitioning with 66% training and 34% testing split
Maximum tree depth: 10
Minimum cases for parent node: 25
Minimum cases for child node: 12
Impurity measure: Gini Index

2. The tree has 5 nodes, 3 of which are terminal nodes.

3. The three most important Lupus data features are Elements 10, 11 and 1, which have the highest importance values among all the variables.

4. After increasing the minimum number of cases for the parent node from 25 to 30 and