o an account and may not care what are the causes.
One myth about neural networks is that data of any quality can be used to produce reasonable
predictions, and that the network will sift through it to find the truth. However, neural networks
require as much data preparation as any other method, which is to say a lot.
The most successful implementations of neural networks (or decision trees, or logistic regression,
or any other method) involve very careful data cleansing, selection, preparation, and preprocessing.
For instance, neural nets require that all variables be numeric. Therefore a categorical
variable such as “state” is usually broken up into multiple dichotomous variables (e.g., “California,”
“New York”), each with a “1” (yes) or “0” (no) value. The resulting increase in variables is
called the categorical explosion.
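The encoding described above can be sketched in a few lines of Python. This is only an illustration: the function name `one_hot_encode`, the `customers` records, and the `state=...` column naming are assumptions made for the example, not part of the original text.

```python
def one_hot_encode(records, column):
    """Replace one categorical column with a 0/1 indicator column per category."""
    # Collect the distinct categories; sorted so the column order is stable.
    categories = sorted({rec[column] for rec in records})
    encoded = []
    for rec in records:
        # Copy every field except the categorical one.
        row = {key: value for key, value in rec.items() if key != column}
        # Add one dichotomous variable per category: 1 (yes) or 0 (no).
        for cat in categories:
            row[f"{column}={cat}"] = 1 if rec[column] == cat else 0
        encoded.append(row)
    return encoded, categories


# Hypothetical sample data for illustration only.
customers = [
    {"age": 34, "state": "California"},
    {"age": 51, "state": "New York"},
    {"age": 27, "state": "California"},
    {"age": 45, "state": "Texas"},
]

encoded, categories = one_hot_encode(customers, "state")
for row in encoded:
    print(row)
```

Here a single “state” column with three distinct values becomes three numeric columns. With all fifty states it would become fifty, which is the categorical explosion in miniature.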
Clustering divides a database into different groups. The goal of clustering is to find groups that
are very different from each other, and whose members are very similar to each other. Unlike
classification, you don’t know what the clusters will be when you start, or by which attributes the
data will be clustered. Consequently, someone who is knowledgeable in the business must
interpret the clusters. After you have found clusters that reasonably segment your database, these
clusters may then be used to classify new data. Some of the common algorithms used to perform
clustering include Kohonen feature maps and K-means.
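As a rough illustration of the K-means idea, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members, then reuses the final centroids to classify new data. The function names, the sample points, and the hand-picked starting centroids are all assumptions made for the example; practical implementations usually choose starting centroids randomly and run several restarts.

```python
import math


def assign(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, max_iter=100):
    """Basic K-means: repeat assignment and centroid update until nothing moves."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = [assign(p, centroids) for p in points]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, [assign(p, centroids) for p in points]


# Hypothetical two-attribute records forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)

# Once the clusters are found, they can be used to classify new data.
new_label = assign((2.0, 1.0), centroids)
print(new_label)
```

The algorithm only returns cluster numbers. Deciding what cluster 0 and cluster 1 mean in business terms is still left to someone who knows the data.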
Don’t confuse clustering with segmentation. Segmentation refers to the general problem of
identifying groups that have common characteristics. Clustering is a way to segment data into
groups that are not previously defined, whereas classification is a way to segment data by
assigning it to groups that are already defined.
This note was uploaded on 11/25/2010 for the course CENG ceng taught by Professor Ceng during the Spring '10 term at Universidad Europea de Madrid.