MCEN 90048
AI for Mechatronics
3. Information Theory and Regularization Methods
Lecture: Prof. Saman Halgamuge
Workshop: Dr. Richard Wang


Information Theory and Loss Functions
11/16/2019
University of Melbourne
2

Machine Learning from a Probability Perspective
A variety of machine learning models may be interpreted as modelling $p(\text{data})$ or $p(\text{label} \mid \text{data})$.
• Supervised learning: in the training stage, we model $p(y \mid x)$ based on data $x$ and targets $y$; in the prediction stage, we predict $y^*$ given new data $x^*$.
  o Classification – $y$ is discrete.
  o Regression – $y$ is continuous.
• Unsupervised learning:
  o Density estimation: in the training stage, we build a model to approximate $p(x)$; in the prediction stage, we calculate $p(x^*)$ for new data $x^*$.
  o Clustering: in the training stage, we try to find the intrinsic clusters $c$ and build $p(c \mid x)$; in the prediction stage, we predict the cluster $c^*$ of new data $x^*$.
  o Data visualization – find low-dimensional points $z$ to represent the data $x$, where the distribution of pairwise distances in $x$ is approximated by that of $z$.
• Semi-supervised learning: we have labelled data $(x_l, y_l)$ and unlabelled data $x_u$. In the training stage, we build a model $p(y \mid x)$; in the prediction stage, we predict $y^*$.
• Reinforcement learning: given the interaction between an agent and the environment, we build a model of how the agent should act, e.g. $p(\text{action} \mid \text{state})$.
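The probabilistic view above can be sketched on a toy problem. The example below is only an illustration under assumptions not in the slides: 1-D synthetic data, two classes, Gaussian class-conditional densities, and the notation $x$ for data and $y$ for labels. It estimates $p(x \mid y)$ and $p(y)$ in a training stage, then evaluates $p(y \mid x)$ (classification) and $p(x)$ (density estimation) at a new point. Here $p(x)$ is obtained by summing over labels; in a purely unsupervised setting it would be fitted without them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy data: two classes with different means.
x0 = rng.normal(loc=-2.0, scale=1.0, size=500)  # class 0
x1 = rng.normal(loc=+2.0, scale=1.0, size=500)  # class 1
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])


def gaussian_pdf(v, mean, std):
    """Density of a univariate Gaussian evaluated at v."""
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Training stage: estimate class priors p(y) and class-conditional densities p(x|y).
priors = np.array([np.mean(y == k) for k in (0, 1)])
means = np.array([x[y == k].mean() for k in (0, 1)])
stds = np.array([x[y == k].std() for k in (0, 1)])


def p_x(v):
    """Density estimation: p(x) = sum_y p(x|y) p(y)."""
    return sum(priors[k] * gaussian_pdf(v, means[k], stds[k]) for k in (0, 1))


def p_y_given_x(v):
    """Classification: p(y|x) by Bayes' rule, as [p(y=0|x), p(y=1|x)]."""
    joint = np.array([priors[k] * gaussian_pdf(v, means[k], stds[k]) for k in (0, 1)])
    return joint / joint.sum()


# Prediction stage: evaluate both models at a new point.
x_new = 1.5
posterior = p_y_given_x(x_new)
density = p_x(x_new)
print("p(y | x_new) =", posterior)
print("p(x_new)     =", density)
```

Since `x_new` lies close to the mean of class 1, the second entry of `posterior` should dominate.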
Visualization of 16S rRNA genes (1000 nucleotides) of a simulated microbial community (EqualSe01) using t-SNE [54]. Visualization may help binning of metagenomic data into operational taxonomic units (e.g., species). t-SNE works by building a distribution of pairwise distances at each point $x_i$ in high-dimensional space and approximating that with the distance distribution at the corresponding point $z_i$ in low-dimensional space.
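The "distribution of pairwise distances at each point" can be made concrete with a small sketch. The code below computes, for each high-dimensional point, a Gaussian-weighted distribution over its neighbours. It is a simplification under stated assumptions: the function name `conditional_affinities` and the random data are hypothetical, and a single fixed bandwidth `sigma` is used, whereas t-SNE as published tunes a bandwidth per point from a perplexity setting and uses a Student-t kernel in the low-dimensional space.

```python
import numpy as np


def conditional_affinities(points, sigma=1.0):
    """Row i holds p(j|i): a distribution over neighbours j of point i."""
    # Squared Euclidean distances between all pairs of points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian kernel on distances; a point is never its own neighbour.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)
    # Normalise each row so that it sums to one.
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
high_dim = rng.normal(size=(6, 50))  # hypothetical points in 50-D
P = conditional_affinities(high_dim, sigma=5.0)
print(P.round(3))
```

A low-dimensional embedding would then be adjusted until its own neighbour distributions are close to the rows of `P`.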


Probability Theory – Probability Distributions
Categories of probability distributions
Categories by definition:
• Frequencies of events – frequentist probability, or physical probability, is the long-run expected frequency of occurrence.
• Degree of belief – Bayesian probability, or evidential probability, is a measure of the plausibility of an event given incomplete knowledge.
Categories by variable continuity:
• Discrete probability, e.g., the outcome of tossing a coin, defined by a probability mass function (PMF)
• Continuous probability, e.g., room temperature, defined by a probability density function (PDF)
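The difference between a PMF and a PDF can be checked numerically. In the sketch below, the fair coin and the Gaussian room-temperature model (mean 21.0, standard deviation 0.1) are made-up assumptions for illustration. PMF values are probabilities and sum to one, while PDF values are densities that may exceed one as long as the area under the curve is one.

```python
import numpy as np

# Discrete: PMF of a coin assumed to be fair; the values are probabilities.
pmf = {"head": 0.5, "tail": 0.5}
pmf_total = sum(pmf.values())

# Continuous: Gaussian PDF for room temperature; mean and std are made-up values.
mean, std = 21.0, 0.1
t = np.linspace(mean - 1.0, mean + 1.0, 20001)
pdf = np.exp(-0.5 * ((t - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Riemann-sum approximation of the area under the PDF.
dt = t[1] - t[0]
pdf_area = float(np.sum(pdf) * dt)
pdf_peak = float(pdf.max())

print("PMF total:", pmf_total)
print("PDF area :", round(pdf_area, 6))
print("PDF peak :", round(pdf_peak, 3))  # a density value, not a probability
```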
Categories by events:
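• Joint probability $p(x, y)$
• Conditional probability $p(y \mid x) = p(x, y) / p(x)$
• Marginal probability $p(x) = \sum_y p(x, y)$ for discrete $y$, or $p(x) = \int p(x, y)\, dy$ for continuous $y$.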
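The three event-based categories can be illustrated on a small discrete example. The joint table below is hypothetical and the names $x$ and $y$ are assumed notation for two discrete variables. The marginals follow by summing the joint over the other variable, and the conditional by dividing the joint by a marginal.

```python
import numpy as np

# Hypothetical joint PMF p(x, y): rows index x (3 values), columns index y (2 values).
joint = np.array([
    [0.10, 0.20],
    [0.25, 0.15],
    [0.05, 0.25],
])

# Marginals: sum the joint over the variable that is not of interest.
p_x = joint.sum(axis=1)  # p(x) = sum_y p(x, y)
p_y = joint.sum(axis=0)  # p(y) = sum_x p(x, y)

# Conditional: p(y | x) = p(x, y) / p(x); each row is a distribution over y.
p_y_given_x = joint / p_x[:, None]

print("p(x)     =", p_x)
print("p(y)     =", p_y)
print("p(y | x) =")
print(p_y_given_x)
```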
Toss a coin once and it shows “heads”; what is the probability of a toss showing tails?
The frequentist will say:
Asking for a probability based on a single event makes no sense. If you toss the coin a sufficiently large number of times, with random initial conditions each time, I would say that the frequency of tails over all tosses will approach 0.5.
The Bayesian will say:
Since I have no information other than that the coin has two sides, I have no reason to prefer either side, so I would say the probability is 0.5.
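The frequentist statement can be illustrated by simulation. The sketch below assumes a fair coin, modelled as independent pseudo-random draws, and tracks the running frequency of tails as the number of tosses grows. It illustrates the long-run-frequency reading only; the Bayesian answer above rests on having no reason to prefer one side, not on repeated tosses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a fair coin, simulated as independent draws with P(tail) = 0.5.
n_tosses = 100_000
is_tail = rng.random(n_tosses) < 0.5

# Running frequency of tails after 1, 2, ..., n_tosses tosses.
running_freq = np.cumsum(is_tail) / np.arange(1, n_tosses + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7} tosses: frequency of tails = {running_freq[n - 1]:.4f}")
```

The early frequencies can be far from 0.5; only the long-run value settles near it.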
