# Each node uses the data from the case to choose the


... leaf node. By navigating the decision tree you can assign a value or class to a case by deciding which branch to take, starting at the root node and moving to each subsequent node until a leaf node is reached. Each node uses the data from the case to choose the appropriate branch.

Decision tree models are commonly used in data mining to examine the data and induce a tree and its rules that will be used to make predictions. A number of different algorithms may be used for building decision trees, including CHAID (Chi-squared Automatic Interaction Detection), CART (Classification And Regression Trees), Quest, and C5.0.

## Neural networks

Neural networks are of particular interest because they offer a means of efficiently modeling large and complex problems in which there may be hundreds of predictor variables that have many interactions. (Actual biological neural networks are incomparably more complex.) Neural nets are most commonly used for regressions but may also be used in classification problems.

A neural network (see figure) starts with an input layer, where each node corresponds to a predictor variable. These input nodes are connected to a number of nodes in a hidden layer. Each input node is connected to every node in the hidden layer. The nodes in the hidden layer may be connected to nodes in another hidden layer, or to an output layer. The output layer consists of one or more response variables.

[Figure: A neural network with one hidden layer, showing the input layer, hidden layer, and output layer.]

After the input layer, each node takes in a set of inputs, multiplies each of them by a connection weight, adds the products together, applies a function (called the activation or squashing function) to the sum, and passes the output to the node(s) in the next layer.
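The layer-by-layer computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic function stands in for the activation (the text does not name one), bias terms are omitted, and the weights, layer sizes, and function names are hypothetical.

```python
import math


def logistic(z):
    """Logistic squashing function (assumed here; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, activation=logistic):
    """Multiply each input by its connection weight, sum, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total)


def layer_output(inputs, weight_rows, activation=logistic):
    """Fully connected layer: every node receives every input (one weight row per node)."""
    return [node_output(inputs, row, activation) for row in weight_rows]


def forward(inputs, hidden_weights, output_weights):
    """Pass the predictor values through one hidden layer and then the output layer."""
    hidden = layer_output(inputs, hidden_weights)
    return layer_output(hidden, output_weights)


# Hypothetical network: 3 predictor variables, 2 hidden nodes, 1 response variable.
hidden_weights = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.6]]
output_weights = [[0.7, -0.3]]
prediction = forward([1.0, 0.0, -1.0], hidden_weights, output_weights)
```

Each row of a weight matrix belongs to one node in the receiving layer, which mirrors the statement that every input node is connected to every hidden node.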
[Figure: a single node with inputs x1 = +1 (weight 0.3), x2 = -1 (weight 0.7), x3 = +1 (weight -0.2), x4 = +1 (weight 0.4), and x0 = -1 (weight -0.5), producing the summed input I and the output y.]

For example, the node above has five inputs (x0 through x4), each of which is multiplied by a weight and then added together, resulting in a sum I:

$$I = 0.3x_1 + 0.7x_2 - 0.2x_3 + 0.4x_4 - 0.5x_0 = 0.3 - 0.7 - 0.2 + 0.4 + 0.5 = 0.3$$

The output y is this sum after it has been transformed by the non-linear activation function, in this case to a value of 0.57.

The goal of training the neural net is to estimate the connection weights so that the output of the neural net accurately predicts the test value for a given input set of values. The most common training method is backpropagation. Each training method has a set of parameters that control various aspects of training, such as avoiding local optima or adjusting the speed of convergence.

Neural networks differ in philosophy from many statistical methods in several ways. First, a neural network usually has more parameters than does a typical statistical model. For example, a neural network with 100 inputs and 50 hidden nodes will have over 5,000 parameters. Because they are so numerous, and because so many combinations of parameters result in similar predictions, the parameters become uninterpretable and the network serves as a “black box” predictor. However, this is acceptable in CRM applications. A bank may assign the probability of bankruptcy t...
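The single-node arithmetic and the parameter count quoted above can be checked with a short sketch. Two things here are assumptions rather than statements from the text: the activation is taken to be the logistic function (it happens to map 0.3 to roughly 0.57), and the parameter count assumes a single output node plus one bias term per hidden and output node. The helper name `count_parameters` is hypothetical.

```python
import math

# Weights and input values from the worked example; x0 carries the weight -0.5.
weights = {"x1": 0.3, "x2": 0.7, "x3": -0.2, "x4": 0.4, "x0": -0.5}
inputs = {"x1": 1, "x2": -1, "x3": 1, "x4": 1, "x0": -1}

# Weighted sum I, then the (assumed) logistic activation to get the output y.
weighted_sum = sum(weights[name] * inputs[name] for name in weights)
output = 1.0 / (1.0 + math.exp(-weighted_sum))

print(round(weighted_sum, 2), round(output, 2))  # 0.3 0.57


def count_parameters(n_inputs, n_hidden, n_outputs=1, use_bias=True):
    """Count connection weights (and optional biases) for one hidden layer."""
    n_weights = n_inputs * n_hidden + n_hidden * n_outputs
    n_biases = (n_hidden + n_outputs) if use_bias else 0
    return n_weights + n_biases


print(count_parameters(100, 50))  # 5101
```

Even without bias terms the count is 5,050, so the "over 5,000 parameters" figure holds under either assumption.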