CSCE 5380 Data Mining
Assignment 1: Exploring Data
Wasana Santiteerakul
1.
Discuss whether or not each of the following activities is a data mining task.
a.
Dividing the customers of a company according to their gender.
ANS
This activity is not a data mining task because it can be done with a simple database query.
b.
Dividing the customers of a company according to their profitability.
ANS
This activity is not a data mining task. If the profitability of each customer is already one of the attributes in the customer records, the customers can be divided according to their profitability by applying a simple threshold.
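The threshold split described in this answer can be sketched in a few lines of Python. This is only an illustration: the customer records, the `profit` field name, and the cut-off value are all hypothetical and do not come from the assignment.

```python
# Hypothetical customer records; names and profit figures are made up.
customers = [
    {"name": "A", "profit": 1200.0},
    {"name": "B", "profit": -50.0},
    {"name": "C", "profit": 300.0},
]

THRESHOLD = 500.0  # assumed cut-off, chosen only for illustration


def split_by_profit(records, threshold):
    """Partition records into (high, low) groups around a fixed threshold."""
    high = [r for r in records if r["profit"] >= threshold]
    low = [r for r in records if r["profit"] < threshold]
    return high, low


high, low = split_by_profit(customers, THRESHOLD)
print([r["name"] for r in high])  # ['A']
print([r["name"] for r in low])   # ['B', 'C']
```

Nothing is learned from the data here: the rule is fixed in advance, which is why the split counts as a lookup rather than data mining.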
c.
Computing the total sales of a company.
ANS
This activity is not a data mining task because the total sales can be computed with simple calculations.
d.
Sorting a student database based on student identification numbers.
ANS
This activity is not a data mining task because sorting is a simple database operation.
e.
Predicting the outcomes of tossing a (fair) pair of dice.
ANS
This activity is not a data mining task because predicting the outcome of tossing a fair pair of dice is a probability calculation, which does not involve large amounts of data or require complicated calculations or techniques.
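To illustrate why this is a pure probability calculation, the sketch below enumerates all 36 equally likely outcomes of two fair dice and derives the distribution of their sum. No observed data is involved; the function name is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Exact distribution of the sum of two fair six-sided dice."""
    # Count how many of the 36 equally likely (a, b) pairs give each total.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()
print(dist[7])  # 1/6
print(dist[2])  # 1/36
```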
f.
Predicting the future stock price of a company using historical records.
ANS
This activity is a data mining task. Historical stock price records can be used to build a predictive model by regression, one of the predictive modeling tasks, which is used when the target variable is continuous.
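As a minimal sketch of regression on historical records, the code below fits a straight line by ordinary least squares and extrapolates one step ahead. The day indices and prices are made-up toy values, and a straight line is assumed purely for illustration; real stock prices are rarely this well behaved.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: day index and closing price (toy, perfectly linear).
days = [1, 2, 3, 4, 5]
prices = [10.0, 10.5, 11.0, 11.5, 12.0]

slope, intercept = fit_line(days, prices)
predicted_day_6 = slope * 6 + intercept
print(slope, intercept, predicted_day_6)  # 0.5 9.5 12.5
```

Unlike the threshold and dice cases, the model parameters here are learned from the data, which is what makes the task predictive modeling.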
g.
Monitoring the heart rate of a patient for abnormalities.
ANS
This activity is a data mining task called anomaly detection. By observing the patient’s heart rate, this task can identify abnormalities when the characteristics of the heart rate differ from normal observations.
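One simple way to make "different from normal observations" concrete is a z-score rule: estimate the mean and standard deviation from readings assumed to be normal, then flag new readings that fall too far from that mean. The sketch below assumes this rule, a cut-off of three standard deviations, and made-up heart-rate values; it is not a clinical method.

```python
import statistics


def find_anomalies(baseline, readings, k=3.0):
    """Return (index, value) pairs more than k standard deviations
    away from the mean of the baseline readings."""
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return [(i, r) for i, r in enumerate(readings) if abs(r - mean) > k * std]


# Hypothetical heart rates in beats per minute.
baseline = [72, 75, 71, 74, 73, 76, 70, 74]  # assumed normal observations
new_readings = [73, 75, 140, 72, 38]

print(find_anomalies(baseline, new_readings))  # [(2, 140), (4, 38)]
```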
h.
Monitoring seismic waves for earthquake activities.
ANS
This activity is a data mining task.
i.
Extracting the frequencies of a sound wave.
ANS
This activity is not a data mining task because it is neither a predictive task nor a descriptive task, the two major categories of data mining tasks. However, it may be considered a data preprocessing step that prepares suitable data before data mining tasks are applied.
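Frequency extraction of this kind is a fixed signal-processing transform rather than a learned model. The sketch below uses a naive discrete Fourier transform on a synthetic signal; the sample rate, the 5 Hz and 12 Hz tones, and the magnitude cut-off are all assumptions made for illustration. In practice an FFT routine such as `numpy.fft.rfft` would normally be used instead of this O(n^2) loop.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the first n/2 DFT bins (naive O(n^2) transform)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        mags.append(abs(total))
    return mags


SAMPLE_RATE = 64  # samples per second (assumed)
N = 64            # one second of samples, so each bin is 1 Hz wide

# Synthetic "sound wave": a 5 Hz tone plus a weaker 12 Hz tone.
signal = [
    math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_RATE)
    for t in range(N)
]

mags = dft_magnitudes(signal)
# Keep bins whose magnitude is clearly above numerical noise.
peaks = [k * SAMPLE_RATE / N for k, m in enumerate(mags) if m > 1.0]
print(peaks)  # [5.0, 12.0]
```

The resulting frequency components could then serve as features for a later data mining step.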
2.
Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have
more than one interpretation, so briefly indicate your reasoning if you think there may be
some ambiguity.
a.
Time in terms of AM or PM.
ANS
Binary, qualitative, ordinal
b.
Brightness as measured by a light meter.
ANS
Continuous, quantitative, ratio
c.
Brightness as measured by people’s judgments.