Name:
CS39000-DM0 Midterm1: Spring 2014
This is a closed-book, closed-notes exam. Non-programmable calculators are allowed for probability
calculations.
There are six pages including the cover page. The total number of points for the exam is 30. Note
the
CS39000-DM0 Homework 1
Due date: Friday January 27, 11:59pm (submit pdf to Blackboard)
1
Basic Probability and Statistics
1. (4 pts)
(a) Suppose that E, F and G are independent events. Prove that
$P[E \wedge (F \vee G)] = P(E)\,P(F \vee G)$
(b) Let A and B be indepen
CS39000-DM0 Homework 4
Due date: Wednesday March 29, 11:59pm
In this programming assignment you will implement the k-means clustering algorithm and
apply it to the Yelp dataset. Instructions below detail how to turn in your code and
assignment to Blackboa
CS39000-DM0 Homework 3
Due date: Thursday March 9, 11:59pm
In this programming assignment you will implement a naive Bayes classification algorithm
and evaluate it on the Yelp dataset. Instructions below detail how to turn in your code
and assignment to B
CS39000-DM0 Homework 2
Due date: Wednesday, February 15, 11:59pm in Blackboard.
Submit a PDF with both your answers to the questions and the R code that you used for
analysis. Your homework must be typed. Use of LaTeX is recommended, but not required.
In
Data Mining & Machine Learning
CS39000-DM0
Purdue University
October 18, 2016
Linear Classifiers
Linear threshold functions
Associate a weight ($w_i$) with each feature ($x_i$)
Prediction: $\mathrm{sign}(b + w^T x) = \mathrm{sign}(b + \sum_i w_i x_i)$
$b + w^T x \ge 0$: predict y=1
Otherwise, p
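The linear threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `predict`, the example weights, and the choice of -1 as the label for the other side of the boundary are assumptions, since the preview is cut off before the second case.

```python
from typing import Sequence


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Linear threshold prediction: sign(b + sum_i w_i * x_i).

    Returns 1 when the activation is >= 0. Returning -1 otherwise is an
    assumption; the source text is truncated before naming that label.
    """
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    # Activation is the bias plus the dot product of weights and features.
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0 else -1


# Hypothetical weights and inputs, chosen only for illustration.
example_weights = [2.0, -1.0]
example_bias = -0.5
label_a = predict(example_weights, example_bias, [1.0, 0.5])  # activation = 1.0
label_b = predict(example_weights, example_bias, [0.0, 1.0])  # activation = -1.5
```

Here `label_a` falls on the positive side of the boundary and `label_b` on the negative side.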
Data Mining & Machine Learning
CS39000-DM0
Purdue University
September 1, 2016
Probability and statistics basics
Modeling uncertainty
Necessary component of almost all data analysis
Approaches to modeling uncertainty:
Fuzzy logic
Possibility theory
R
Data Mining & Machine Learning
CS39000-DM0
Purdue University
August 23, 2016
Adapted from Jennifer Neville Fall15 slides
Course overview
Goals
Identify key elements of data mining and
machine learning algorithms
Understand how algorithmic elements
inter
Data Mining & Machine Learning
CS39000-DM0
Purdue University
September 29, 2016
Predictive modeling: introduction
Classification
In its simplest form, a classification model defines a decision
boundary (h) and labels for each side of the boundary
Input:
Data Mining & Machine Learning
CS39000-DM0
Purdue University
September 22, 2016
Data exploration
and visualization
http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html
http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.h
1
Notes
Terminal symbols: XML attributes and XML elements defined as #PCDATA, e.g. @id, year.
Nonterminals: XML elements defined in terms of other elements, e.g. library, book.
2
Library
1:
library ⟨ book | paper | journal ⟩+
2:
3:
4:
book @id title author+
CS39000-DM0 Homework 5
Due date: Monday April 17, 11:59pm.
In this programming assignment you will use python to implement an association rule
algorithm and apply it to the yelp4.csv dataset, which has only discrete attributes and
no missing values. Instr
XML DTDs
Dr P Sreenivasa Kumar
Professor
Ravichandran, Ravindra, Karthik & Sudharshana
Department of CS&E
IIT Madras
Document Type Definitions (DTD)
Provides details about the elements, their attributes,
notations and entities contained in an XML docum
Introduction
Part I: Statistical Decision Theory
Lecture 2: Statistical Decision Theory (Part I)
Hao Helen Zhang
Spring, 2014
Outline of This
arXiv:1201.4089v1 [cs.AI] 19 Jan 2012
A Description Logic Primer
Markus Krötzsch, František Simančík, Ian Horrocks
Department of Computer Science, University of Oxford, UK
Abstract. This paper provides a self-contained first introduction to description logic
XPath 1.0 & 2.0
What is XPath ?
XPath is a W3C standard
XPath expressions are similar to URLs, hence the name
XPath uses path expressions to navigate through the
hierarchy of XML document tree; and address parts of an
XML document
XPath uses a compact, no
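As a rough illustration of path expressions navigating an XML tree, the sketch below uses Python's standard `xml.etree.ElementTree` module, which implements only a limited subset of XPath 1.0. The sample `library` document and its `id` values are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
XML_TEXT = """
<library>
  <book id="b1"><title>First</title><author>A</author></book>
  <book id="b2"><title>Second</title><author>B</author></book>
  <paper id="p1"><title>Third</title></paper>
</library>
""".strip()

root = ET.fromstring(XML_TEXT)

# Child-step path: titles of all <book> elements directly under the root.
book_titles = [title.text for title in root.findall("./book/title")]

# Descendant search with an attribute predicate: title of the book with id="b2".
second_title = root.find(".//book[@id='b2']/title").text
```

A full XPath 1.0 or 2.0 engine supports far more (axes, functions, positional predicates) than the subset shown here.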
Tutorial: Recommender Systems
International Joint Conference on Artificial Intelligence
Beijing, August 4, 2013
Dietmar Jannach
TU Dortmund
Gerhard Friedrich
Alpen-Adria Universität Klagenfurt
Recommender Systems
Application areas
In the Social Web
Why usi
Modeling and Querying Web Data
Dr P Sreenivasa Kumar
Professor
Department of CS&E
IIT - Madras
Data - Metadata
Relational model
Data : values in the relational tuples
Metadata : the schema information
The relation names
The attribute names and thei
Quilt: An XML Query Language for Heterogeneous Data Sources
Don Chamberlin
Jonathan Robie
Daniela Florescu
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120
Software AG - USA
3207 Gibson Road
Durham, NC 27703
INRIA
78153 Le Chesnay cedex
Franc
Data Mining & Machine Learning
CS39000-DM0
Purdue University
October 13, 2016
Linear Classifiers
Linear threshold functions
Associate a weight ($w_i$) with each feature ($x_i$)
Prediction: $\mathrm{sign}(b + w^T x) = \mathrm{sign}(b + \sum_i w_i x_i)$
$b + w^T x \ge 0$: predict y=1
Otherwise, p
Data Mining & Machine Learning
CS39000-DM0
Purdue University
October 6, 2016
Classification
In its simplest form, a classification model defines a decision
boundary (h) and labels for each side of the boundary
Input: x={x1,x2,...,xn} is a set of attribu
Data Mining & Machine Learning
CS39000-DM0
Purdue University
September 15, 2015
Data exploration
and visualization
Measurement
Real world
Data
Relationship
in real world
Relationship
in data
Goal: map domain entities to symbolic representations
Tabular da
Data Mining & Machine Learning
CS39000-DM0
Purdue University
February 4, 2014
Data exploration
and visualization
http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html
http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.
Data Mining & Machine Learning
CS39000-DM0
Purdue University
January 23, 2014
Probability and statistics (cont)
Expectation
Denotes the expected value or mean value of a random variable X
Discrete: $E[X] = \sum_x x\,p(x)$
Continuous: $E[X] = \int_x x\,p(x)\,dx$
Expectation of a f
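For the discrete case, the expected value is the probability-weighted sum of the outcomes, E[X] = sum over x of x * p(x). The sketch below computes it for a fair six-sided die; the function name `expectation` and the dictionary representation of the probability mass function are assumptions made for illustration.

```python
from typing import Mapping


def expectation(pmf: Mapping[float, float]) -> float:
    """Expected value of a discrete random variable.

    `pmf` maps each outcome x to its probability p(x).
    """
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Weight each outcome by its probability and add the results.
    return sum(x * p for x, p in pmf.items())


# Fair six-sided die: each face 1..6 has probability 1/6.
fair_die = {face: 1 / 6 for face in range(1, 7)}
die_mean = expectation(fair_die)
```

For the fair die the result is 3.5 up to floating-point rounding, a value the die itself can never show.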
Data Mining & Machine Learning
CS39000-DM0
Purdue University
January 28, 2014
Data and Measurement
Measurement
Real world
Data
Relationship
in real world
Relationship
in data
Goal: map domain entities to symbolic representations
What is data?
Collecti
Data Mining & Machine Learning
CS39000-DM0
Purdue University
February 11, 2014
Python: A Simple Tutorial
Slides adapted from UPenn CIS530 python tutorial
Why Python?
Interpreted language
Dynamically typed: variables do not have a predefined type
Rich,
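A small sketch of what dynamic typing means in practice: the same name can be bound to values of different types, and the type belongs to the value rather than to the variable. The variable names here are illustrative only.

```python
# The name `value` has no declared type; its type follows the bound object.
value = 42
first_type = type(value).__name__

# Rebinding the same name to a string is legal; no declaration changes.
value = "forty-two"
second_type = type(value).__name__
```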
Data Mining & Machine Learning
CS39000-DM0
Purdue University
February 6, 2014
Predictive modeling: introduction
Data mining components
Task specification: Prediction
Data representation: Homogeneous IID data
Knowledge representation
Learning techniqu
Data Mining & Machine Learning
CS39000-DM0
Purdue University
January 30, 2014
Data exploration
and visualization
Visualization
The human eye and brain have evolved powerful methods to detect structure in nature
Display data in ways that exploit human patter