Insurance Claims Settlement: find similar claims in the past
Real estate: Property price appraisal based on previous sales
K-Nearest Neighbor can be used for classification tasks.
Step 1: Using a cho
The goal: ideal DM environment
All data mining algorithms want their input in tabular form rows &
columns as in a spreadsheet or database table
The columns
Contain data that describe aspects of the c
General Algorithms
GAs can be used to try different combinations of parameters for building
models, to see which build parameters are best.
Assume we arrived at ten formulae, using a variety of neural
often work well for classes that are hard to separate using linear
methods or the axis-parallel splits used by decision trees.
K-Nearest neighbor works well even when there is some missing data
K-Nea
We would maintain a population of solutions, with each solutions
having a score. Solutions with a higher fitness score are given a
higher probability of mating.
The Traveling Sales Person is a well-k
Based on the concept of similarity
The distance metric can be, for instance, simple Euclidian distance or
weighted Euclidian distance. The Euclidian distance d(X,Y) between
two points, X = (x1, x2, .
Principal Component Analysis -1
Open the dataset Open the file Utilities.xls, which gives data on 22
public utilities in the US.
Principal Component Analysis -2
In XLMiner, select Data Reduction and E
CF is a variant of MBR particularly well suited to personalized
recommendations
Starts with a history of peoples personal preferences
Uses a distance function people who like the same things are
clos
Survival Analysis aka Time-to-Event Analysis is very valuable for
understanding customers
Survival tells us when to start worrying about customers
Most important facet of customer behavior is the cus
Based on an analogy to biological processes similar to neural
networks and memory-based reasoning techniques
Goal is to maximize the fitness
Are being utilized for
Complex scheduling
Resource optimiz
Question 1
After examining 20 cases of ownership, how many owners have been
correctly identified, based on the below lift chart?
Answ 6
ers:
8
10
12
14
Question 2
Based on the table below, what is the
Question 1
For the given neural network, if the cutoff value was 0.56, the output of this
network would be classified as 1?
Answe True
rs:
False
Question 2
What is the best value of k to select for th
For categorical values, we need to convert them to numeric values.
We might treat being in class A as 1, and not in class A as 0.
Therefore, two items in the same class have distance 0 for that
attri
Principal Component Analysis
PCA is used to reduce the dimensionality of data.
Normalized data points are mapped onto a multi-dimensional graph.
Dimensions with the highest variances account for the
Fitness functions are used to decide how good a solution is. The
fitness function used depends on the application: e.g. for model
parameter selection we could use the accuracy of profitability of the
MBR and K Nearest Neighbor are lazy approaches: they do not
construct a model in advance, but rather wait till they have to classify
a new example.
Contrast this with decision trees which construct m
Memory-Based Reasoning (MBR) results are based on analogous
situations in the past
Collaborative Filtering results use preferences in addition to
analogous situations from the past
Knowledge represen
Adaptation techniques include:
Predefined formulas or processes for adapting the solution
Interpolation: Assuming, you had two old bridges
Bridge 1 = 2 lanes, and cost $1 million
Bridge 2 = 4 lanes,
We can look up similar cases using a distance metric.
5 bins are made depending on the count of records and the values of
x3 are kept in them.
For all the values of x3 lying in one bin, the output va
Question 1
How many hidden layers are shown in the neural network below?
Ans 0
wers
:
1
2
3
4
Question 2
Recommender systems, like the system Amazon.com uses, are typically
based on which data mining