Project Guidelines
Projects!
Goal: apply machine learning to an interesting task
Proposal (due tomorrow!): 1pg
Who is in your group
Your task (and why is it interesting?)
Where did/will you get your data?
What's your initial approach?
It's okay if you can't
Clustering Part 2
EECS 349 Spring 2015
Expectation Maximization
Learning parameters in Bayes Nets is easy if data is
complete
Just counting
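A minimal sketch of the "just counting" idea, assuming every record is complete and that a conditional probability is estimated as a relative frequency. The function name `estimate_cpt`, the variable names, and the toy records are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequency over complete records."""
    joint = Counter()          # counts of (parent values, child value)
    parent_totals = Counter()  # counts of parent values alone
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint[(pa, r[child])] += 1
        parent_totals[pa] += 1
    # Divide each joint count by the count of its parent configuration.
    return {key: n / parent_totals[key[0]] for key, n in joint.items()}

# Hypothetical toy data: two binary variables, no missing values.
records = [
    {"rain": True, "wet": True},
    {"rain": True, "wet": True},
    {"rain": True, "wet": False},
    {"rain": False, "wet": False},
]
cpt = estimate_cpt(records, child="wet", parents=["rain"])
```

Here `cpt[((True,), True)]` holds the estimate of P(wet = True | rain = True). Parent configurations that never occur in the data get no entry at all, which is one reason smoothing is often added in practice.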
But what about missing data?
We could use our standard missing data techniques (use
mean, median, etc.)
But when lo
Online Generation of Locality
Sensitive Hash Signatures
BENJAMIN VAN DURME & ASHWIN LALL
ACL 2010
Data Overload
Our access to data is growing faster than our ability
to pr
Machine Learning
Genetic Algorithms
Doug Downey, adapted from Bryan Pardo, Machine Learning EECS 349 Fall 2007
Genetic Algorithms
Developed: USA in the 1970s
Early names: J. Holland, K. DeJong, D. Goldberg
Typically applied to:
discrete parameter opti
Machine Learning
Greedy Local Search
With slides from Bryan Pardo, Stuart Russell
ML in a Nutshell
Every machine learning algorithm has three
components:
Representation
E.g., Decision trees, instances
Evaluation
E.g., accuracy on test set
Optimizati
Basics of Probability
Northwestern EECS 349
Doug Downey
Events
Event space Ω
E.g. for dice, Ω = {1, 2, 3, 4, 5, 6}
Set of measurable events S ⊆ 2^Ω
E.g., the event that we roll an even number = {2, 4, 6} ∈ S
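The dice example can be checked concretely. This is only a sketch, and it assumes the simplest choice for the set of measurable events: every subset of the event space. The names `omega`, `power_set`, `S`, and `even` are hypothetical.

```python
from itertools import chain, combinations

# Event space for a single die roll.
omega = frozenset({1, 2, 3, 4, 5, 6})

def power_set(s):
    """Return all subsets of s as a set of frozensets."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

# Assumption: take the measurable events to be all subsets of omega.
S = power_set(omega)

# The event "we roll an even number".
even = frozenset({2, 4, 6})
```

With this choice, `even in S` is true, and so are `frozenset() in S` (the empty event) and `omega in S` (the trivial event).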
S must:
Contain the empty event and the trivial event
Be
Machine Learning
Clustering
Some slides from B. Pardo, P. Domingos
First, some epistemology
There are known knowns. These are things we know
that we know.
Databases!
There are known unknowns. That is to say, there are
things that we know we don't know.
Bayes Net Learning and Logistic
Regression
EECS 349 Spring 2015
Learning in Bayes Nets: the upshot
Where does the structure come from?
Write it down (BNs most useful in this case), or
Learn it automatically from data
(take 395/495 PGMs course to learn more
Inductive Learning and Decision Trees
Doug Downey
EECS 349 Winter 2014
with slides from Pedro Domingos, Bryan Pardo
Outline
Announcements
Homework #1 assigned
Have you completed it?
Inductive learning
Decision Trees
Machine Learning
Neural Networks
(slides from Domingos, Pardo, others)
Human Brain
Neurons
Input-Output Transformation
[Figure labels: input spikes; output spike (spike = a brief pulse); Excitatory Post-Synaptic Potential]
Human Learning
Number of neurons:
Connections per
Basics of Statistical Estimation
Doug Downey, Northwestern EECS 395/495, Fall 2014
(several illustrations from P. Domingos, University of Washington CSE)
Bayes Rule
P(A | B) = P(B | A) P(A) / P(B)
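A small sketch of Bayes Rule as a function, assuming P(B) is expanded with the law of total probability over A and not-A. The function name `posterior`, its parameter names, and the numbers passed to it are hypothetical and are not taken from the slides.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A | B) using Bayes Rule.

    P(B) is computed as P(B | A) P(A) + P(B | not A) (1 - P(A)).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: P(B | A) = 0.9, P(A) = 0.01, P(B | not A) = 0.1.
p = posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.1)
```

With these made-up inputs the posterior is 0.009 / 0.108, which is 1/12: even a fairly reliable indicator gives a low posterior when the prior P(A) is small.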
Example:
P(symptom | disease) = 0.95, P(symptom | ¬disease) =
Machine Learning
Instance-based learning
(with slides/ideas from Bryan Pardo, Pedro Domingos, and Andrew Moore)
Nearest Neighbor Classifier
Example of instance-based (a.k.a. case-based) learning
The basic idea:
1. Get some example set of cases with kno
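A nearest neighbor classifier can be sketched in a few lines. This sketch assumes each stored case is a tuple of numeric features paired with a label, that similarity is Euclidean distance, and that only the single closest case votes; none of these choices is specified above. The name `nearest_neighbor` and the toy cases are hypothetical.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the stored case closest to query.

    train: list of (feature_tuple, label) pairs.
    query: feature tuple of the same length.
    """
    _, label = min(train, key=lambda case: math.dist(case[0], query))
    return label

# Hypothetical stored cases: two features, two classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((5.0, 5.0), "b"),
    ((4.8, 5.3), "b"),
]
```

For example, `nearest_neighbor(train, (0.9, 1.1))` picks the label of the closest stored case, which is `"a"` here. There is no training step beyond storing the cases; all the work happens at query time.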