Of splitting rule to use options at this point are


... can contain any of: the vector of prior probabilities (component prior), the loss matrix (component loss), or the splitting index (component split). The priors must be positive and sum to 1. The loss matrix must have zeros on the diagonal and positive off-diagonal elements. The splitting index can be `gini' or `information'.

na.action: the action for missing values. The default action for rpart is na.rpart; this default is not overridden by the options(na.action) global option. The default action removes only those rows for which either the response y or all of the predictors are missing. This ability to retain partially missing observations is perhaps the single most useful feature of rpart models.

control: a list of control parameters, usually the result of the rpart.control function. The list must contain:

minsplit: The minimum number of observations in a node for which the routine will even try to compute a split. The default is 20. This parameter can save computation time, since smaller nodes are almost always pruned away by cross-validation.

minbucket: The minimum number of observations in a terminal node. This defaults to minsplit/3.

maxcompete: It is often useful in the printout to see not only the variable that gave the best split at a node, but also the second, third, etc. best. This parameter controls the number that will be printed. It has no effect on computational time, and a small effect on the amount of memory used. The default is 5.

xval: The number of cross-validations to be done. Usually set to zero during exploratory phases of the analysis. A value of 10, for instance, increases the compute time to 11-fold over a value of 0.

maxsurrogate: The maximum number of surrogate variables to retain at each node. No surrogate that does worse than "go with the majority" is printed or used. Setting this to zero will cut the computation time in half, and set usesurrogate to zero. The default is 5.
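The constraints on the priors, the loss matrix, and the splitting index stated above can be illustrated with a small check. This is a minimal Python sketch, not rpart's own validation code: the function name check_parms, its arguments, and the tolerance are assumptions made for illustration.

```python
def check_parms(prior, loss, split="gini", tol=1e-8):
    """Return a list of violated constraints (empty if everything is valid).

    prior: sequence of k prior probabilities
    loss:  k x k nested list, loss[i][j] = cost of calling class i class j
    split: name of the splitting index
    (Hypothetical helper for illustration; not part of rpart.)
    """
    problems = []
    k = len(prior)
    # Priors must be positive and sum to 1.
    if any(p <= 0 for p in prior):
        problems.append("priors must be positive")
    if abs(sum(prior) - 1.0) > tol:
        problems.append("priors must sum to 1")
    # Loss matrix: zeros on the diagonal, positive off-diagonal elements.
    if len(loss) != k or any(len(row) != k for row in loss):
        problems.append("loss matrix must be k x k")
    else:
        if any(loss[i][i] != 0 for i in range(k)):
            problems.append("loss diagonal must be zero")
        if any(loss[i][j] <= 0
               for i in range(k) for j in range(k) if i != j):
            problems.append("off-diagonal losses must be positive")
    # Only two splitting indices are allowed.
    if split not in ("gini", "information"):
        problems.append("split must be 'gini' or 'information'")
    return problems
```

For example, check_parms([0.65, 0.35], [[0, 1], [2, 0]]) returns an empty list, while priors of [0.5, 0.6] are reported as not summing to 1.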
Surrogates give different information than competitor splits. The competitor list asks "which other splits would have as many correct classifications", surrogates ask "which other splits would classify the same subjects in the same way", which is a harsher criterion.

usesurrogate: A value of usesurrogate=2, the default, splits subjects in the way described previously. This is similar to CART. If the value is 0, then a subject who is missing the primary split variable does not progress further down the tree. A value of 1 is intermediate: all surrogate variables except "go with the majority" are used to send a case further down the tree.

cp: The threshold complexity parameter. The complexity parameter cp is, like minsplit, an advisory parameter, but is considerably more useful. It is specified according to the formula

$$R_{cp}(T) \equiv R(T) + cp \cdot |T| \cdot R(T_0)$$

where $T_0$ is the tree with no splits. This scaled version is much more user friendly than the original CART formula (4.1) since it is unitless. A value of cp=1 will always result...
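To make the scaled formula concrete, here is a minimal Python sketch that evaluates R_cp(T) = R(T) + cp * |T| * R(T0) for two candidate trees and picks the cheaper one. It is an illustration under stated assumptions, not rpart's pruning code: |T| is taken to be the number of terminal nodes (the excerpt does not define it), and the risks and tree sizes are made-up numbers.

```python
def scaled_cost(risk, size, cp, root_risk):
    """Scaled cost-complexity: R(T) + cp * |T| * R(T0).

    risk:      R(T), the risk of the candidate tree
    size:      |T|, assumed here to be the number of terminal nodes
    cp:        the complexity parameter
    root_risk: R(T0), the risk of the tree with no splits
    """
    return risk + cp * size * root_risk


# Hypothetical candidates: name -> (risk, number of terminal nodes).
ROOT_RISK = 0.40
CANDIDATES = {
    "no splits": (0.40, 1),
    "two splits": (0.25, 3),
}


def best_tree(cp):
    """Name of the candidate with the smallest scaled cost at this cp."""
    return min(
        CANDIDATES,
        key=lambda name: scaled_cost(*CANDIDATES[name], cp, ROOT_RISK),
    )
```

With these made-up numbers, a small value such as cp=0.01 favors the tree with two splits, while cp=1 favors the tree with no splits, because each extra terminal node then costs as much as the entire root risk.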

This document was uploaded on 09/26/2013.
