1) To build the best decision tree, we first need to create a subset of the dataset so that the risks can be compared on an equal basis. To do so, the filter variable filter_$ was created, selecting approximately 70% of the cases; the cases it marks are used as the training set. Using the criteria above, the overall correct classification percentage of the model reaches
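As an illustration only, the sketch below reproduces a ~70/30 split of the kind filter_$ defines, using Python and scikit-learn rather than SPSS; the arrays X and y are synthetic stand-ins for the dataset's predictors (V1, V2, ...) and risk label, and all names are placeholders.

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real data; 12 placeholder predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))      # stand-in for predictors V1..V12
y = rng.integers(0, 2, size=1000)    # stand-in for a binary risk label

# train_size=0.70 keeps roughly 70% of the cases for training,
# mirroring the role of the filter_$ variable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, random_state=42
)
print(len(X_train), len(X_test))     # 700 training cases, 300 held out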
2) The final tree has 13 nodes in total, 7 of which are terminal nodes (see the classification tree table).
3) The three most important features in building the tree are, in decreasing order of importance, V10, V11, and V9.
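For reference, the quantities in 2) and 3), the node counts and the feature-importance ranking, can be read off a fitted scikit-learn tree. The sketch below continues from the split above and is illustrative only; it does not reproduce the SPSS tree from the report.

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)

print("total nodes:   ", clf.tree_.node_count)  # internal + terminal
print("terminal nodes:", clf.get_n_leaves())    # leaves only

# Rank predictors by their importance in building the tree; mapping
# column index i to "V{i+1}" assumes predictors are named V1..Vn.
order = clf.feature_importances_.argsort()[::-1]
for i in order[:3]:
    print(f"V{i + 1}: importance {clf.feature_importances_[i]:.3f}")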
4) When we increase the minimum number of cases allowed in parent and child nodes, the complexity of the tree, measured by its number of nodes, decreases. By raising these minimums, we forbid splits that would leave too few cases in a node, so the tree stops growing earlier and loses complexity.
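This effect can be checked directly: in scikit-learn, min_samples_split and min_samples_leaf play the role of the minimum cases in parent and child nodes, and raising them shrinks the tree. A sketch, continuing from the data above, with arbitrary illustrative thresholds:

from sklearn.tree import DecisionTreeClassifier

for parent_min, child_min in [(2, 1), (20, 10), (100, 50)]:
    t = DecisionTreeClassifier(
        min_samples_split=parent_min,  # minimum cases a node needs before it may split
        min_samples_leaf=child_min,    # minimum cases each child node must keep
        random_state=42,
    ).fit(X_train, y_train)
    print(f"parent >= {parent_min:3d}, child >= {child_min:2d}: "
          f"{t.tree_.node_count} nodes, {t.get_n_leaves()} terminal nodes")

As the minimums grow, fewer candidate splits satisfy them, so both the total node count and the number of terminal nodes fall.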