Apply the Apriori method to the following dataset in Excel, using a support threshold of 20%. Do not use Weka to complet
Customer1: Bread, Cereals, Milk
Customer2: Tomatoes, Egg
Customer3: Pork, Bread, Mi
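To check the Excel work, the level-wise Apriori search can be sketched in Python. This is a minimal sketch, not part of the required deliverable. The function name `apriori` is our own. Support is taken as the fraction of transactions that contain an itemset. Only Customer1 and Customer2 are used, because the Customer3 line is cut off in this copy.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1 candidates: every single item.
    current = [frozenset([i]) for i in sorted({item for t in sets for item in t})]
    frequent = {}
    k = 1
    while current:
        # Keep only the k-itemsets that meet the support threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent


# Only the two complete transactions shown above (Customer3 is truncated).
transactions = [
    ["Bread", "Cereals", "Milk"],   # Customer1
    ["Tomatoes", "Egg"],            # Customer2
]
result = apriori(transactions, min_support=0.20)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
```

With only these two baskets, every itemset contained in one basket has 50% support and clears the 20% threshold. Any itemset that mixes items from both baskets has 0% support.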
I hope that I understand what is being asked. In the video she discussed the parameters that they
thought they would target and then explained how that changed as the experiment expanded, so I
will ju
Austin Howell
Diabetes Study
1. Business Understanding
The data is used to determine if a subject shows attributes suggestive of diabetes.
2. Data Understanding: Discuss in detail the characteristics
In this homework assignment you will be required to demonstrate your understanding of the
concepts related to conditional probabilities and the Naïve Bayes classifier.
Consider the data set shown in the ta
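The table itself is cut off here, so the sketch below uses a small hypothetical data set. The attribute names "Outlook" and "Windy" and all of their values are invented for illustration. It shows how a Naïve Bayes classifier turns counts into conditional probabilities: P(C | x) is proportional to P(C) multiplied by the product of the P(x_i | C) terms, and each factor is estimated as a relative frequency.

```python
from collections import Counter, defaultdict

# Hypothetical training records (attribute values -> class label).
# These are invented for illustration; they are NOT the assignment's table.
data = [
    ({"Outlook": "Sunny", "Windy": "No"}, "Yes"),
    ({"Outlook": "Sunny", "Windy": "Yes"}, "No"),
    ({"Outlook": "Rainy", "Windy": "Yes"}, "No"),
    ({"Outlook": "Sunny", "Windy": "No"}, "Yes"),
    ({"Outlook": "Rainy", "Windy": "No"}, "Yes"),
    ({"Outlook": "Rainy", "Windy": "No"}, "No"),
]


def train(records):
    """Count class frequencies and (attribute, value) frequencies per class."""
    class_counts = Counter(label for _, label in records)
    cond_counts = defaultdict(Counter)   # label -> Counter of (attribute, value)
    for attrs, label in records:
        for pair in attrs.items():
            cond_counts[label][pair] += 1
    return class_counts, cond_counts


def posterior(x, class_counts, cond_counts):
    """Return P(class | x), normalized over classes, assuming independent attributes."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total                           # prior P(C)
        for pair in x.items():
            score *= cond_counts[label][pair] / n_c   # likelihood P(x_i | C)
        scores[label] = score
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()} if z else scores


class_counts, cond_counts = train(data)
print(posterior({"Outlook": "Sunny", "Windy": "No"}, class_counts, cond_counts))
```

For this query the unnormalized scores are 1/2 · 2/3 · 3/3 = 1/3 for "Yes" and 1/2 · 1/3 · 1/3 = 1/18 for "No". Normalizing gives 6/7 and 1/7. No smoothing is applied, so a single zero count makes that class's score zero. Laplace smoothing is the usual remedy.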
Engineering Division
Courses offered in the BFA major of the Departments of Arts at the University of
Diploma Printing in Romanigstan
Foundation Courses
Course #
ARTS 400
ARTS 401
Course Name
EXPERIME
Name Chijioke John Ifedili
Exam
Data Mining
The following are the rules relating to this take-home exam. Any questions about the interpretation of problems
should be addressed to me.
1.
Once you have down
Homework 2
*Important Note: These computations should be done by hand, either with a simple calculator
or using a spreadsheet like Excel. If using a spreadsheet, be sure to show the computations on
the
Mining Frequent Patterns
without Candidate Generation
Jiawei Han, Jian Pei and Yiwen Yin
School of Computer Science
Simon Fraser University
Presented by Song Wang. March 18th, 2009 Data Mining Class
1. Are you someone who prefers to take risks or to avoid them?
A. You are given $5,000 to invest. You must choose between (i) a sure gain of $2,500 and (ii) a 0.50
chance of a gain of $5,000 and a 0.50 ch
Apply the learned clustering techniques to the iris data in Knime. Where possible, try different
distance measures.
K-Means Scatter Plot:
K-Medoids Distance: Euclidean
Numerical Distance => DBSCAN => Scatter Plot:
Green is no
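The effect of switching distance measures can also be tried outside Knime. Below is a minimal sketch of a simple alternating k-medoids. In each round, every point is assigned to its nearest medoid, and then each medoid moves to the cluster member with the smallest total distance. The distance function is passed in, so Euclidean and Manhattan distance can be swapped. This is not Knime's implementation or the PAM swap procedure. The points are hypothetical 2-D values, not the iris data.

```python
import math
import random


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, distance):
    """Map each medoid to the list of points closest to it."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: distance(p, m))].append(p)
    return clusters


def k_medoids(points, k, distance, max_iter=100, seed=0):
    """Alternate assignment and medoid updates until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)          # medoids are always actual data points
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        # Each cluster's new medoid is the member with the smallest total distance.
        new_medoids = [min(members, key=lambda c: sum(distance(c, q) for q in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]  # hypothetical
for dist in (euclidean, manhattan):
    medoids, clusters = k_medoids(points, 2, dist)
    print(dist.__name__, sorted(medoids))
```

Medoids are always real data points. Because of this, k-medoids stays well defined for any distance measure, whereas the mean update of k-means is tied to squared Euclidean distance.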
Lesson 2 HW.
The goal of this week's homework is for you to update the knowledge flow in
Knime and sample the iris data to 50%. Once you have done that, repeat the
same experiments and present and comp
The goal of this week's homework is for you to manually determine if a
difference of 1.5 between the petal lengths of setosa and versicolor is
statistically significant at 95% confidence. For this you
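One way to do the manual calculation is with a two-sample t statistic. The sketch below makes two assumptions: it uses Welch's unequal-variance form, and it reads the question as testing H0: mean(versicolor) - mean(setosa) = 1.5. The Python standard library has no t distribution, so the critical value still has to come from a t-table. The function name `welch_t` and the sample values are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b, hypothesized_diff=0.0):
    """Welch t statistic for H0: mean(a) - mean(b) == hypothesized_diff.

    Returns (t, df), with df from the Welch-Satterthwaite approximation.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors, using sample variances (n - 1 denominators).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b - hypothesized_diff) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical samples, NOT the iris measurements.
t, df = welch_t([4.0, 5.0, 6.0], [2.0, 3.0, 4.0], hypothesized_diff=1.5)
print(round(t, 3), round(df, 1))  # 0.612 4.0
```

For the assignment:

1. Pass the versicolor petal lengths first and the setosa petal lengths second, with `hypothesized_diff=1.5`.
2. Compare |t| with the two-sided critical value t(0.975, df) from a table.
3. If |t| is above that value, reject H0 at the 95% level.

A one-sided reading of the question would use t(0.95, df) instead.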
- How is our decision-making impacted by our biases?
- Can data mining help with choosing the best available solution?
This is a very interesting topic, and I learned a lot from watching the video.
The
Read about the FP-tree method of discovering frequent itemsets. Submit one document that contains
the following sections:
In your own words, explain how the FP-tree can be used to mine frequent itemse
Can we use unsupervised data mining to solve any issues introduced in the video?
Yes, we can use unsupervised data mining to solve any issues.
2014. 5 billion degbie of data collected to use
The data in the deci
Chijioke John Ifedili
MPS data Analytics
Homework 6 (three problems)
1. A college admissions officer for the school's online undergraduate program wants to estimate the
mean age of its graduating stude
Apply tree induction to the Iris dataset using information gain (ID3), gain ratio (J48), and the
Gini index (CART). Complete each induction step either using WEK
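All three splitting criteria can be computed directly from class counts. The sketch below is a minimal illustration, not Weka's J48 or a full CART implementation:

- `information_gain` is the ID3 criterion.
- `gain_ratio` divides the information gain by the split information, as C4.5/J48 does.
- `gini_gain` is the reduction in Gini impurity.

CART considers only binary splits, and the Iris attributes are numeric. In practice, each column is therefore first turned into a yes/no test against a candidate threshold. The petal lengths and the 2.5 threshold in the usage lines are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):   # ID3 criterion
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in partition(values, labels))


def gain_ratio(values, labels):         # C4.5 / J48 criterion
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in partition(values, labels))
    return information_gain(values, labels) / split_info if split_info else 0.0


def gini_gain(values, labels):          # reduction in Gini impurity (CART-style)
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in partition(values, labels))


# Hypothetical petal lengths; a numeric attribute becomes a binary test at a threshold.
petal_length = [1.4, 1.3, 4.7, 4.5]
species = ["setosa", "setosa", "versicolor", "versicolor"]
split = [p <= 2.5 for p in petal_length]
print(information_gain(split, species), gain_ratio(split, species), gini_gain(split, species))
# 1.0 1.0 0.5
```

To choose a split, evaluate each candidate threshold on each attribute and keep the one with the highest score.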
Discuss: How would you design an artificial neural network such as the one discussed? In your view, what are
the pros and cons of using artificial neural networks for such a task?
Breast Cancer diagnostics
I
1.
Describe a genetic algorithm and a particle swarm optimization algorithm, each of which performs Euclidean-distance-based
clustering into three clusters in a two-dimensional space.
The movement of particles is determi
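The particle swarm half of the question can be sketched as follows. The sketch assumes the common global-best PSO update, with inertia weight `w` and coefficients `c1` (pull toward the particle's own best) and `c2` (pull toward the swarm's best). The values used for these are typical choices, not ones given in the source. Each particle encodes three 2-D centroids. Its fitness is the sum of Euclidean distances from every point to its nearest centroid. The genetic-algorithm variant is not shown, and the points are hypothetical.

```python
import math
import random


def fitness(centroids, points):
    """Sum of Euclidean distances from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def decode(position, k):
    """Turn a flat [x1, y1, x2, y2, ...] position into k (x, y) centroids."""
    return [(position[2 * i], position[2 * i + 1]) for i in range(k)]


def pso_cluster(points, k=3, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    lo, hi = [min(xs), min(ys)] * k, [max(xs), max(ys)] * k
    dim = 2 * k
    # Each particle starts at random centroids inside the data's bounding box, at rest.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p, k), points) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Movement = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(decode(pos[i], k), points)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest, k), gbest_fit


# Hypothetical 2-D points forming three loose groups.
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9), (5.0, 5.0), (5.2, 4.8),
          (4.9, 5.3), (9.0, 1.0), (8.7, 1.2), (9.2, 0.8)]
centroids, total = pso_cluster(points)
print([(round(x, 2), round(y, 2)) for x, y in centroids], round(total, 3))
```

A genetic version could reuse the same encoding and fitness function, replacing the velocity update with selection, crossover, and mutation of the centroid vectors.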
Non-stationary time series show no tendency to revert to zero. Stationary time series appear to
return to zero most of the time, while a non-stationary series can also return to zero, as demonstrated in
the video, although
Discuss the differences and similarities between random forests and decision trees. Also discuss why
random forests achieve better results than decision trees.
A decision tree is a single tree, where
Austin Howell
Homework 4
Steps of FP-Tree
1. Find and count all items in the transactions.
2. Find the frequency of each item.
3. Drop items that fall below the minimum support.
4. Order each item b
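The FP-tree construction steps can be sketched in Python as follows. Step 4 is cut off in this copy. The ordering used here, most frequent items first with ties broken alphabetically, is an assumption based on the FP-growth method of Han, Pei and Yin. The transactions are hypothetical. `min_count` is an absolute count, that is, the support threshold multiplied by the number of transactions.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item plus the count of transactions through it."""

    def __init__(self, item, parent):
        self.item = item        # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode


def build_fp_tree(transactions, min_count):
    # Steps 1-2: count the transactions containing each item.
    counts = Counter(item for t in transactions for item in set(t))
    # Step 3: drop items whose count is below the minimum support count.
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = {item: [] for item in frequent}   # item -> its nodes (node-links)
    for t in transactions:
        # Step 4 (assumed): most frequent items first, ties broken alphabetically.
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        # Insert as a path from the root, sharing any existing prefix.
        node = root
        for item in ordered:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header


def show(node, depth=0):
    """Print the tree as an indented outline of item:count."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# Hypothetical transactions and a minimum support count of 2.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
root, header = build_fp_tree(transactions, min_count=2)
show(root)
```

The `header` table links every node that carries the same item. FP-growth follows these links to collect each item's conditional pattern base when mining.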
1. Discuss: Is it possible to design a genetic optimization experiment to address the issues in
this talk? Also, discuss the pros and cons of using genetic algorithms for such tasks.
Algo trading
Based o
Using any one of the univariate statistical methods discussed in Lesson 11 this week, try to
identify the outlier(s) in the given data set (Hint: use any one univariate statistical method except
for t
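One possible univariate method is Tukey's interquartile-range (IQR) fences. They flag any value below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. The sketch assumes this method is acceptable under the truncated hint. It also assumes quartiles are computed with Python's `statistics.quantiles(..., method="inclusive")`; spreadsheets and other tools may interpolate quartiles slightly differently. The function name `iqr_outliers` and the values are hypothetical, not the assignment's data set.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


values = [12, 10, 13, 11, 50, 12, 14, 13, 15]  # hypothetical values
print(iqr_outliers(values))  # [50]
```

Here Q1 = 12 and Q3 = 14, so the fences are 9 and 17, and only 50 falls outside them. Unlike z-scores, the fences are based on quartiles, so a single extreme value barely moves them.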