WEKA KnowledgeFlow Tutorial
for Version 3-5-8
Mark Hall
Peter Reutemann
July 14, 2008
© 2008
University of Waikato
Contents
1 Introduction
2 Features
3 Components
3.1 DataSources
3.2 DataSinks
3.3 Filters
3.4 Classifiers
3.5 Clustere

Mining Frequent Patterns
without Candidate Generation
Jiawei Han, Jian Pei and Yiwen Yin
School of Computer Science
Simon Fraser University
Presented by Song Wang. March 18th, 2009 Data Mining Class
Slides Modified From Mohammed and Zhenyu's Version
Outlin

Data Science Journal, Volume 12, 13 May 2013
A DATA-DRIVEN METHOD FOR SELECTING OPTIMAL MODELS
BASED ON GRAPHICAL VISUALISATION OF DIFFERENCES IN
SEQUENTIALLY FITTED ROC MODEL PARAMETERS
K S Mwitondi*1, R E Moustafa2, and A S Hadi3
*1
Sheffield Hallam Uni

Health Psychol. Author manuscript; available in PMC 2011 Sep 1.
Published in final edited form as:
Health Psychol. 2010 Sep; 29(5): 496–505.
doi: 10.1037/a0020428
PMCID: PMC3021982
NIHMSID: NIHMS246693
Adults' Physical Activity Patterns across Life
Domains

2015 Third International Conference on Artificial Intelligence, Modelling and Simulation
Comparative Analysis between K-Means and K-Medoids for Statistical Clustering
Norazam Arbin
Nur Suhailayani Suhaimi
Faculty of Computer and Mathematical Sciences
Univ

STAT 200
Elementary Statistics
Introduction
Probability Distributions: Discrete Random Variables
Mean, also called Expected Value, of a Discrete Variable
Binomial Random Variable
Probability Distributions: Continuous Random Variable
Finding

Apply the Apriori method to the following dataset in Excel, using a support threshold of 20%. Do not use Weka to complet
Customer1: Bread,Cereals,Milk
Customer2: Tomatoes,Egg
Customer3: Pork,Bread,Milk
Customer4: Sugar,Tomatoes,Pork,Bread
Customer5: Vinega
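
Apriori works level by level: it counts the support of single items, keeps those at or above the threshold, then joins the survivors into larger candidates and prunes any candidate that has an infrequent subset. The sketch below illustrates this in Python. It assumes only the four transactions that are fully visible above (the rest of the dataset is cut off in the source), so its output is illustrative and will not match the full exercise. The function names `support` and `apriori` are hypothetical.

```python
from itertools import combinations

# Only the four transactions fully visible in the exercise text; the
# remaining customers are cut off in the source and are NOT included.
transactions = [
    {"Bread", "Cereals", "Milk"},
    {"Tomatoes", "Egg"},
    {"Pork", "Bread", "Milk"},
    {"Sugar", "Tomatoes", "Pork", "Bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, built one itemset size at a time."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
    return frequent

# 20% is the threshold stated in the exercise.
frequent = apriori(transactions, min_support=0.20)
for itemset, s in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.2f}")
```

With only four transactions, a 20% threshold is met by any itemset that occurs once, so a higher threshold such as 0.5 makes the pruning step easier to see.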

Apply tree induction to the Iris dataset using information gain (ID3), gain ratio (J48), and
Gini index (CART). Complete each induction step using WEKA. You can use either
the "Explorer" or the "KnowledgeF
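
The three split criteria named in this exercise can be compared on a small example. The sketch below is a minimal Python illustration, not WEKA's implementation. The label lists are hypothetical toy data rather than the real Iris measurements, and the function names are assumptions made for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def gain_ratio(parent, children):
    """Information gain divided by the entropy of the split sizes."""
    n = len(parent)
    split_info = -sum((len(ch) / n) * log2(len(ch) / n)
                      for ch in children if ch)
    if split_info == 0:
        return 0.0
    return information_gain(parent, children) / split_info

# Hypothetical toy labels: a split that separates two classes perfectly.
parent = ["setosa"] * 3 + ["versicolor"] * 3
children = [["setosa"] * 3, ["versicolor"] * 3]

print("information gain:", information_gain(parent, children))
print("gain ratio:", gain_ratio(parent, children))
print("Gini before split:", gini(parent))
```

Gain ratio divides information gain by the split information, which penalises splits that break the data into many small branches.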

Engineering Division
Courses offered in the BFA major of the Departments of Arts at the University of
Diploma Printing in Romanigstan
Foundation Courses
Course #
ARTS 400
ARTS 401
Course Name
EXPERIMENTAL WRITING SEM: The Ecology of Poetry
ART: ancient to

Name Chijioke John Ifedili
Exam
Data Mining
The following are the rules relating to this take-home exam. Any questions about interpretation of problems
should be addressed to me.
1. Once you have downloaded the exam, you may not discuss it in any way with

Student name,semester new,coursename
Bill Mumy,Fall 2004,BEHAVIORAL PHARMACOLOGY
Bill Mumy,Fall 2000,AMERICAN FOREIGN POLICY
Bill Mumy,Fall 2003,DRUGS BRAIN AND MIND
Bill Mumy,Fall 2005,Environmental Case Studies
Bill Mumy,Fall 2000,COMPUTER LINEAR ALGEBR

Chijioke John Ifedili
MPS Data Analytics
Homework 6 (three problems)
1. A college admissions officer for the school's online undergraduate program wants to estimate the
mean age of its graduating students. From a previous study the standard deviation was a

Data Mining and Knowledge Discovery, 8, 53–87, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Mining Frequent Patterns without Candidate
Generation: A Frequent-Pattern Tree
Approach
JIAWEI HAN
University of Illinois at Urbana-Cham

Using any one of the univariate statistical methods discussed in lesson 11 this week, try to
identify the outlier(s) in the given data set (Hint: use any one univariate statistical method except
for the Grubbs test)
199.31 199.53 200.19 200.82 201.92 201.

Austin Howell
Diabetes Study
1. Business Understanding
The data is used to determine if a subject shows attributes suggestive of diabetes.
2. Data Understanding. Discuss in detail the characteristics of the data.
The subjects were tested on 9 attributes.

1. Discuss: Is it possible to design a genetic optimization experiment to address the issues in
this talk? Also, discuss the pros and cons of using genetic algorithms for such tasks.
Algo trading
Based on the video, it does seem like you could design a genet

Austin Howell
Homework 4
Steps of FP-Tree
1. Find and count all items in the transactions.
2. Find the frequency of each item.
3. Drop items that fall below minimum support.
4. Order each item by frequency of occurrence.
5. Create tree row by row based
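
The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the transactions are hypothetical, each row is assumed to be inserted in descending frequency order with ties broken alphabetically, and the header table and node links of a full FP-tree are left out. The names `Node` and `build_fp_tree` are made up for this example.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Count the number of transactions each item appears in.
    counts = Counter(item for t in transactions for item in set(t))
    # Drop items that fall below minimum support.
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None)
    for t in transactions:
        # Order the remaining items by descending frequency
        # (ties broken alphabetically, an assumption of this sketch).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        # Insert the row, sharing any prefix that is already in the tree.
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, frequent

def show(node, depth=0):
    """Print the tree with indentation, one node per line."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Hypothetical transactions, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "eggs", "milk"],
    ["milk", "jam"],
    ["bread", "eggs"],
]
root, frequent = build_fp_tree(transactions, min_count=2)
show(root)
```

Because frequent items are inserted first, transactions that share their most common items also share a path from the root, which is what keeps the tree compact.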

Discuss the differences and similarities between random forests and decision trees. Also discuss why
random forests achieve better results than decision trees.
A decision tree is a single tree, whereas a random forest is an ensemble of decision trees. Ense

Non-stationary time series have no tendency to return to zero. Stationary time series tend to
return to zero most of the time, while non-stationary series may return to zero, as demonstrated in
the video, but sometimes do not.
Non-stationary series are ran

In this homework assignment you will be required to demonstrate your understanding of the
concepts related to conditional probabilities and the Naive Bayes classifier.
Consider the data set shown in the table below.
Instance  1  2  3  4  5  6  7  8  9  10
A         0  1  0  1  1  0

1. Describe a genetic algorithm and a particle swarm optimization algorithm that perform Euclidean distance based
clustering into three clusters in a two dimensional space.
The movement of particles is determined by the best known position in the cluster. This is

Discuss: How would you design an artificial network such as the one discussed? In your view, what are
the pros and cons of using artificial neural networks for such a task?
Breast Cancer diagnostics
I would design an artificial network to diagnose cancer

I hope that I understand what is being asked. In the video she discussed the parameters that they
thought they would target and then explained how that changed as the experiment expanded, so I
will just continue with that process.
I thought that age, kids

1. Are you someone who prefers to take risk or avoid risk?
A. You are given $5,000 to invest. You must choose between (i) a sure gain of $2,500 and (ii) a 0.50
chance of a gain of $5,000 and a 0.50 chance to gain nothing. What is the expected gain with ea
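
The expected gain of each option in part A is the probability-weighted sum of its payoffs. A minimal Python sketch of that arithmetic, using the figures stated in the question (the helper name `expected_value` is an assumption of this example):

```python
def expected_value(outcomes):
    """outcomes: (probability, payoff) pairs whose probabilities sum to 1."""
    return sum(p * x for p, x in outcomes)

# Option (i): a sure gain of $2,500.
sure_gain = expected_value([(1.0, 2500)])
# Option (ii): 0.50 chance of $5,000 and 0.50 chance of nothing.
gamble = expected_value([(0.50, 5000), (0.50, 0)])

print(sure_gain, gamble)
```

Both options have an expected gain of $2,500, so a preference for one over the other reflects attitude to risk rather than a difference in expected value.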