There are thirteen parameters (nine weights and four bias or constant terms) in the neural network shown in Figure 4. Because they are so numerous, and because so many combinations of parameters result in similar predictions, the parameters become uninterpretable and the network serves as a “black box” predictor. In fact, a given result can be associated with several different sets of weights. Consequently, the network weights in general do not aid in understanding the underlying process generating the prediction. This is acceptable in many applications, however. A bank may want to automatically recognize handwritten applications, but does not care about the form of the functional relationship between the pixels and the characters they represent.

Applications where hundreds of variables may be input into models with thousands of parameters (node weights) include the modeling of chemical plants, robots and financial markets, and pattern recognition problems such as speech, vision and handwritten character recognition. One advantage of neural network models is that they can easily be implemented to run on massively parallel computers, with each node simultaneously doing its own calculations.

Users must be conscious of several facts about neural networks. First, neural networks are not easily interpreted: there is no explicit rationale given for the decisions or predictions a neural network makes. Second, they tend to overfit the training data unless very stringent measures, such as weight decay and/or cross validation, are used judiciously. This is due to the very large number of parameters in the neural network, which, if the network is of sufficient size, allow it to fit any data set arbitrarily well when trained to convergence. Third, neural networks require an extensive amount of training time unless the problem is very small. Once trained, however, they can provide predictions very quickly.
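The parameter count and the weight-decay remedy above can be made concrete with a minimal sketch. A 2-input, 3-hidden-unit, 1-output network has 2×3 + 3×1 = 9 weights and 3 + 1 = 4 bias terms, matching the thirteen parameters quoted for Figure 4; the 2-3-1 shape, the toy XOR data, and the hyperparameters here are illustrative assumptions, not details from the text.

```python
import numpy as np

# Illustrative 2-3-1 network: 9 weights + 4 biases = 13 parameters.
# The architecture is assumed only to match the quoted parameter count.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # 6 input-to-hidden weights
b1 = np.zeros(3)                         # 3 hidden biases
w2 = rng.normal(scale=0.5, size=3)       # 3 hidden-to-output weights
b2 = 0.0                                 # 1 output bias

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])       # toy XOR-style target

lr, decay = 0.5, 1e-3                    # decay = weight-decay strength
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)             # hidden activations
    p = sigmoid(h @ w2 + b2)             # output predictions
    g = (p - y) * p * (1 - p)            # squared-error gradient at output
    # Weight decay adds decay*w to each weight gradient, shrinking weights
    # toward zero -- the anti-overfitting measure mentioned in the text.
    gw2 = h.T @ g + decay * w2
    gb2 = g.sum()
    gh = np.outer(g, w2) * h * (1 - h)   # back-propagated hidden gradient
    gW1 = X.T @ gh + decay * W1
    gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    w2 -= lr * gw2; b2 -= lr * gb2

n_params = W1.size + b1.size + w2.size + 1
print(n_params)  # 13
```

Note that many different final weight settings would give near-identical predictions on these four cases, which is exactly why the trained weights themselves are hard to interpret.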
Fourth, they require no less data preparation than any other method, which is to say they require a lot of data preparation. One myth of neural networks is that data of any quality can be used to provide reasonable predictions. The most successful implementations of neural networks (or decision trees, or logistic regression, or any other method) involve very careful data cleansing, selection, preparation and pre-processing. For instance, neural nets require that all variables be numeric. Therefore categorical data such as “state” is usually broken up into multiple dichotomous variables (e.g., “California,” “New York”), each with a “1” (yes) or “0” (no) value. The resulting increase in the number of variables is called the categorical explosion.

Finally, neural networks tend to work best when the data set is sufficiently large and the signal-to-noise ratio is reasonably high. Because they are so flexible, they will find many false patterns in a low signal-to-noise situation.

Decision trees

Decision trees are a way of representing a series of rules that lead to a class or value. For example, you may wish to classify loan applicants as good or bad credit risks. Figure 7 shows a simple decision tree that solves this problem while illustrating all the basic components of a decision tree: the decision node, branches and leaves.

© 1999 Two Crows Corporation

    Income > $40,000?
    ├─ No:  Job > 5 Years?
    │       ├─ No:  Bad Risk
    │       └─ Yes: Good Risk
    └─ Yes: High Debt?
            ├─ Yes: Bad Risk
            └─ No:  Good Risk

Figure 7. A simple classification tree.

The first component is the top decision node, or root node, which specifies a test to be carried out. The root node in this example is “Income > $40,000.” The results of this test cause the tree to split into branches, each representing one of the possible answers. In this case, the test “Income > $40,000” can be answered either “yes” or “no,” so we get two branches. Depending on the algorithm, each node may have two or more branches.
For example, CART generates trees with only two branches at each node. Such a tree is called a binary tree. When more than two branches are allowed, it is called a multiway tree. Each branch will lead either to another decision node or to the bottom of the tree, called a leaf node.

By navigating the decision tree you can assign a value or class to a case: start at the root node, decide which branch to take at each node, and move to each subsequent node until a leaf node is reached. Each node uses the data from the case to choose the appropriate branch. Armed with this sample tree and a loan application, a loan officer could determine whether the applicant was a good or bad credit risk. An individual with “Income > $40,000” and “High Debt” would be classified a “Bad Risk,” whereas an individual with “Income < $40,000” and “Job > 5 Years” would be classified a “Good Risk.”

Decision tree models are commonly used in data mining to examine the data and induce the tree and its rules that will be used to make predictions. A number of different algorithms may be used f...
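The root-to-leaf navigation just described can be sketched in a few lines. This is an illustrative encoding of the Figure 7 tree, not code from the original document: internal nodes hold a test plus a “yes” and a “no” branch, leaves hold a class label, and the field names (`income`, `years_on_job`, `high_debt`) are assumed for the example.

```python
# Figure 7 as a nested structure: a leaf is a class-label string;
# an internal node is (test_function, yes_branch, no_branch).
tree = (
    lambda a: a["income"] > 40_000,                              # root node
    (lambda a: a["high_debt"], "Bad Risk", "Good Risk"),         # yes branch
    (lambda a: a["years_on_job"] > 5, "Good Risk", "Bad Risk"),  # no branch
)

def classify(applicant, node=tree):
    """Start at the root and follow branches until a leaf is reached."""
    while not isinstance(node, str):      # descend while at a decision node
        test, yes_branch, no_branch = node
        node = yes_branch if test(applicant) else no_branch
    return node                           # leaf: the assigned class

# The two worked examples from the text:
print(classify({"income": 50_000, "high_debt": True,  "years_on_job": 2}))  # Bad Risk
print(classify({"income": 30_000, "high_debt": False, "years_on_job": 8}))  # Good Risk
```

Because each case follows exactly one root-to-leaf path, every prediction comes with an explicit rule, which is the interpretability advantage trees hold over the neural networks discussed earlier.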
This note was uploaded on 01/19/2014 for the course STATS 315B taught by Professor Friedman during the Winter '08 term at Stanford.

Ask a homework question - tutors are online