can contain any of: the vector of prior probabilities (component prior), the
loss matrix (component loss) or the splitting index (component split). The
priors must be positive and sum to 1. The loss matrix must have zeros on the
diagonal and positive off-diagonal elements. The splitting index can be `gini'
na.action: the action for missing values. The default action for rpart is
na.rpart; this default is not overridden by the options(na.action) global
option. The default action removes only those rows for which either the response
y or all of the predictors are missing. This ability to retain partially missing
observations is perhaps the single most useful feature of rpart models.
control: a list of control parameters, usually the result of the rpart.control
function. The list must contain
minsplit: The minimum number of observations in a node for which
the routine will even try to compute a split. The default is 20. This
parameter can save computation time, since smaller nodes are almost
always pruned away by cross-validation.
minbucket: The minimum number of observations in a terminal node.
This defaults to minsplit/3.
maxcompete: It is often useful in the printout to see not only the variable
that gave the best split at a node, but also the second, third, etc. best.
This parameter controls the number that will be printed. It has no effect
on computational time, and a small effect on the amount of memory used.
The default is 5.
xval: The number of cross-validations to be done. Usually set to zero
during exploratory phases of the analysis. A value of 10, for instance,
increases the compute time to 11-fold over a value of 0.
maxsurrogate: The maximum number of surrogate variables to retain at
each node. No surrogate that does worse than "go with the majority"
is printed or used. Setting this to zero will cut the computation time
in half, and set usesurrogate to zero. The default is 5. Surrogates give
different information than competitor splits. The competitor list asks
"which other splits would have as many correct classifications";
surrogates ask "which other splits would classify the same subjects in the
same way", which is a harsher criterion.
usesurrogate: A value of usesurrogate=2, the default, splits subjects in
the way described previously. This is similar to CART. If the value is 0,
then a subject who is missing the primary split variable does not progress
further down the tree. A value of 1 is intermediate: all surrogate variables
except "go with the majority" are used to send a case further down the
tree.
cp: The threshold complexity parameter.
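The two missing-value rules described above (the default na.action row filter and
the usesurrogate settings) can be sketched in a few lines. This is an
illustration in Python rather than rpart's own code: the names keep_row and
route_case, the use of None for a missing value, and the "left"/"right" labels
are all assumptions made for the example.

```python
from typing import Optional, Sequence

MISSING = None  # stand-in for a missing value (R's NA) in this sketch


def keep_row(response, predictors: Sequence) -> bool:
    """Row filter as described for the default na.action: a row is dropped
    only if the response is missing or every predictor is missing."""
    if response is MISSING:
        return False
    return any(value is not MISSING for value in predictors)


def route_case(primary, surrogates: Sequence, majority: str,
               usesurrogate: int) -> Optional[str]:
    """Pick a direction ("left" or "right") for one case at one split.

    `primary` and each entry of `surrogates` hold the direction that
    variable would send the case, or MISSING if the variable is missing.
    `majority` is the direction taken by most observations at the node.
    Returns None when the case does not progress further down the tree.
    """
    if primary is not MISSING:
        return primary          # primary split variable is observed
    if usesurrogate == 0:
        return None             # surrogates are not used to route cases
    for direction in surrogates:
        if direction is not MISSING:
            return direction    # first usable surrogate decides
    # Every surrogate is missing as well.
    if usesurrogate == 2:
        return majority         # fall back to "go with the majority"
    return None                 # usesurrogate == 1: the case stops here
```

A row with one observed predictor is retained by keep_row, and whether a case
with no usable surrogate continues down the tree depends only on whether
usesurrogate is 1 or 2.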
The complexity parameter cp is, like minsplit, an advisory parameter, but is
considerably more useful. It is specified according to the formula
\[ R_{cp}(T) \equiv R(T) + cp \cdot |T| \cdot R(T_0) \]
where T_0 is the tree with no splits. This scaled version is much more user-friendly
than the original CART formula (4.1) since it is unitless. A value of cp=1 will always
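
To see how the scaled penalty behaves, the formula for R_cp(T) can be evaluated
over a few candidate subtrees. The sketch below assumes that |T| counts the
splits in a tree and uses made-up risk values; the names penalized_risk,
candidates and best_size are hypothetical and not part of rpart.

```python
def penalized_risk(risk, n_splits, cp, root_risk):
    """R_cp(T) = R(T) + cp * |T| * R(T_0), with |T| taken as the split count."""
    return risk + cp * n_splits * root_risk


# Hypothetical nested subtrees as (number of splits, risk R(T)).
# The first entry is T_0, the tree with no splits.
candidates = [(0, 100.0), (1, 70.0), (3, 55.0), (6, 50.0)]
root_risk = candidates[0][1]


def best_size(cp):
    """Number of splits in the candidate with the smallest penalized risk."""
    best = min(
        candidates,
        key=lambda tree: penalized_risk(tree[1], tree[0], cp, root_risk),
    )
    return best[0]


if __name__ == "__main__":
    for cp in (0.01, 0.05, 0.2, 1.0):
        print(f"cp={cp}: {best_size(cp)} splits")
```

Because the penalty is expressed as a fraction of R(T_0), the same cp values
apply regardless of the units of the risk. With these illustrative numbers,
larger values of cp select progressively smaller trees.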