Combining deep belief nets with Gaussian processes
Deep belief nets can benefit a lot from unlabeled data
when labeled data is scarce.
They use the labeled data only for fine-tuning.
Kernel methods, like Gaussian processes, work well on
small labeled training sets.
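The combination can be sketched with plain-numpy GP regression on whatever features the deep belief net produces. This is a minimal sketch, not the implementation from the slides: the RBF kernel, lengthscale, and noise level below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    # Squared-exponential (RBF) kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-6):
    # Closed-form GP posterior mean for regression. X_train would hold
    # top-level DBN features rather than raw pixels.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)
```

In practice a GP library (e.g. scikit-learn) would also tune the kernel hyperparameters by maximizing the marginal likelihood.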

How good is a shortlist found this way?
We have only implemented it for a million
documents with 20-bit codes - but what could
possibly go wrong?
A 20-D hypercube allows us to capture enough
of the similarity structure of our document set.
The shortlis
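The shortlist idea can be sketched as follows, assuming the 20-bit codes are already learned: treat each code as a memory address and probe every address within a small Hamming distance. The helper names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def build_hash_table(codes):
    # codes: (N, 20) binary array; bucket document ids by integer address.
    table = defaultdict(list)
    for doc_id, code in enumerate(codes):
        addr = int("".join(map(str, code)), 2)
        table[addr].append(doc_id)
    return table

def shortlist(query_code, table, max_flips=1):
    # Probe the query's own address, then every address that differs
    # from it by a single flipped bit (Hamming distance 1).
    n_bits = len(query_code)
    addr = int("".join(map(str, query_code)), 2)
    hits = list(table.get(addr, []))
    if max_flips >= 1:
        for b in range(n_bits):
            hits += table.get(addr ^ (1 << b), [])
    return hits
```

The cost of forming the shortlist is independent of the number of stored documents, which is the point of using short codes.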

Time series models
Inference is difficult in directed models of time
series if we use non-linear distributed
representations in the hidden units.
It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion capture data).
So people ten

Time series models
If we really need distributed representations (which we
nearly always do), we can make inference much simpler
by using three tricks:
Use an RBM for the interactions between hidden and
visible variables. This ensures that the main sour

Why the autoregressive connections do not cause
problems
The autoregressive connections do not mess up
contrastive divergence learning because:
We know the initial state of the visible units, so we
know the initial effect of the autoregressive
connections.

The conditional RBM model
(Sutskever & Hinton 2007)
Given the data and the previous hidden
state, the hidden units at time t are
conditionally independent.
So online inference is very easy
Learning can be done by using
contrastive divergence.
Reconstruc
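Because the hidden units are conditionally independent given the data and the previous state, inference is a single feed-forward computation. A sketch, with hypothetical weight names (W for visible-to-hidden connections, A for autoregressive past-to-hidden connections):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden_probs(v_t, v_history, W, A, b_hid):
    # v_t: current visible vector; v_history: concatenated past frames.
    # The past frames contribute only a "dynamic bias", so the hidden
    # probabilities come from one matrix multiply plus that bias.
    dynamic_bias = b_hid + v_history @ A
    return sigmoid(v_t @ W + dynamic_bias)
```

This is why online inference is easy: no iteration is needed, unlike in a directed model with distributed hidden state.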

Stacking temporal RBMs
Treat the hidden activities of the first level
TRBM as the data for the second-level
TRBM.
So when we learn the second level, we
get connections across time in the first
hidden layer.
After greedy learning, we can generate from
the

Modeling multiple types of motion
We can easily learn to model walking and
running in a single model.
This means we can share a lot of knowledge.
It should also make it much easier to learn
nice transitions between walking and running.
In a switching

An application to modeling
motion capture data
(Taylor, Roweis & Hinton, 2007)
Human motion can be captured by placing
reflective markers on the joints and then using
lots of infrared cameras to track the 3-D
positions of the markers.
Given a skeletal m

Finding binary codes for documents
Train an auto-encoder using 30
logistic units for the code layer.
During the fine-tuning stage,
add noise to the inputs to the
code units.
The noise vector for each
training case is fixed. So

Deep Autoencoders
(Hinton & Salakhutdinov, 2006)
[Figure: deep autoencoder architecture: 28x28 image input, 1000-neuron first layer, weight matrix W1 (reused as W1ᵀ in the decoder)]
They always looked like a really
nice way to do non-linear
dimensionality reduction:
But it is very difficult to
optimize deep autoencoders
using backpropagation.
We now have a much better way to optimize them.

Performance of the autoencoder at
document retrieval
Train on bags of 2000 words for 400,000 training cases
of business documents.
First train a stack of RBMs. Then fine-tune with
backprop.
Test on a separate 400,000 documents.
Pick one test document

The variational bound
Each time we replace the prior over the hidden units by a better
prior, we win by the difference in the probability assigned
\[
\log p(v) \;\ge\; \log P(v) \;+\; \sum_{l=1}^{L-1} \Big\langle \log P_{l+1}(h^l) - \log P_l(h^l) \Big\rangle_{Q(h^l \mid v)}
\]
Now we cancel out all o

The root mean squared error in the orientation
when combining GPs with deep belief nets
              GP on     GP on top-level   GP on top-level features
              pixels    features          with fine-tuning
100 labels    22.2      17.9              15.2
500 labels    17.2      12.7              7.2
1000 labels   16.3      11.2              6.4
Conclu

Modeling real-valued data
For images of digits it is possible to represent
intermediate intensities as if they were probabilities by
using mean-field logistic units.
We can treat intermediate values as the probability
that the pixel is inked.
This will

The free-energy of a mean-field logistic unit
[Figure: free energy F of a mean-field logistic unit as a function of its value, with the value 0.7 marked]
In a mean-field logistic unit, the total input provides a linear energy-gradient and the negative entropy provides a containment function with fixed curvature.
So it is impossible for the value
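The fixed-curvature claim can be checked numerically. Below is a sketch of the free energy of a single mean-field logistic unit (linear energy plus negative entropy); its minimum sits at the sigmoid of the total input, so the containment term keeps the value strictly between 0 and 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_free_energy(p, total_input):
    # Linear energy from the total input, plus the negative entropy,
    # which acts as the fixed-curvature containment term.
    return -total_input * p + p * np.log(p) + (1 - p) * np.log(1 - p)
```

Setting the derivative to zero gives log(p/(1-p)) = total_input, i.e. the minimizing value is exactly sigmoid(total_input).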

Do the 30-D codes found by the deep
autoencoder preserve the class
structure of the data?
Take the 30-D activity patterns in the code layer
and display them in 2-D using a new form of
non-linear multi-dimensional scaling
The method is called UNI-SNE (Co

An RBM with real-valued visible units
Using Gaussian visible units we can get much sharper predictions, and alternating Gibbs sampling is still easy, though learning is slower.
\[
E(v,h) \;=\; \sum_{i \in \mathrm{vis}} \frac{(v_i - b_i)^2}{2\sigma_i^2} \;-\; \sum_{j \in \mathrm{hid}} b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, h_j\, w_{ij}
\]
[Figure: parabolic energy E of a visible unit as a function of its value, centred at b_i, against the linear energy-gradient produced by the t
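Only the quadratic visible term survives legibly on the slide; the sketch below assumes the standard Gaussian-Bernoulli RBM energy (hidden-bias and interaction terms filled in), and the function name is hypothetical:

```python
import numpy as np

def grbm_energy(v, h, W, b_vis, b_hid, sigma):
    # Energy of a Gaussian-Bernoulli RBM: a quadratic containment term
    # on the visibles, plus the usual bias and interaction terms.
    quad = np.sum((v - b_vis) ** 2 / (2 * sigma ** 2))
    return quad - b_hid @ h - (v / sigma) @ W @ h
```

The quadratic term replaces the entropy-based containment of logistic units, which is why the predictions can be much sharper.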

Retrieving documents that are similar
to a query document
We can use an autoencoder to find low-dimensional codes for documents that allow fast and accurate retrieval of similar documents from a large set.
We start by converting each document into a
bag
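Retrieval in code space can be sketched as follows, assuming the codes are already computed; cosine similarity is one reasonable choice here, not necessarily the measure used in the experiments:

```python
import numpy as np

def retrieve(query_code, doc_codes, k=5):
    # Rank stored documents by cosine similarity to the query's code.
    q = query_code / np.linalg.norm(query_code)
    D = doc_codes / np.linalg.norm(doc_codes, axis=1, keepdims=True)
    return np.argsort(-(D @ q))[:k]
```

Because the codes are low-dimensional, this comparison is far cheaper than comparing raw 2000-dimensional count vectors.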

How to compress the count vector
[Figure: autoencoder architecture: 2000 word counts → 500 neurons → 250 neurons → 10-unit code → 250 neurons → 500 neurons → 2000 reconstructed counts as the output vector]
We train the neural network to reproduce its input vector as its output.
This forces it to compress as much information as possible into the 10-unit central code layer.
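A toy tied-weight linear autoencoder trained by plain gradient descent illustrates the objective. The real model is the deep 2000-500-250-10 stack with RBM pretraining, so everything below (sizes, learning rate, linearity) is a simplification:

```python
import numpy as np

def train_autoencoder(X, n_hidden=3, lr=0.01, steps=1000, seed=0):
    # Tied-weight linear autoencoder: encode with W, decode with W.T,
    # trained by gradient descent on the squared reconstruction error.
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (X.shape[1], n_hidden))
    for _ in range(steps):
        E = X @ W @ W.T - X                          # reconstruction error
        grad = (X.T @ E @ W + E.T @ X @ W) / len(X)  # d/dW of 0.5*||E||^2
        W -= lr * grad
    return W

def reconstruction_loss(X, W):
    return 0.5 * np.mean((X @ W @ W.T - X) ** 2)
```

With a narrow code layer the network cannot copy its input, so minimizing reconstruction error forces compression, which is the property the slide relies on.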

Generating from a learned model
The inputs from the earlier states
of the visible units create
dynamic biases for the hidden
and current visible units.
Perform alternating Gibbs
sampling for a few iterations
between the hidden units and the current visible units.
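One alternating Gibbs step can be sketched as follows, assuming the dynamic biases computed from the earlier visible states have already been folded into the static biases:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_vis, b_hid, rng):
    # One alternating Gibbs step: sample the hiddens given the current
    # visibles, then resample the current visibles given those hiddens.
    h = (rng.random(W.shape[1]) < sigmoid(v @ W + b_hid)).astype(float)
    v_new = (rng.random(W.shape[0]) < sigmoid(h @ W.T + b_vis)).astype(float)
    return v_new, h
```

Generating a sequence means repeating a few such steps per frame, then sliding the window forward so the new frame joins the history.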

Generating the parts of an object
One way to maintain the
constraints between the parts is
to generate each part very
accurately
But this would require a lot of
communication bandwidth.
Sloppy top-down specification of
the parts is less demanding
but it

Semi-restricted Boltzmann Machines
We restrict the connectivity to make
learning easier.
Contrastive divergence learning requires
the hidden units to be in conditional
equilibrium with the visibles.
But it does not require the visible units to be in conditional equilibrium with the hidden units.
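A sketch of one CD-1 weight update for an RBM shows why only the hiddens need conditional equilibrium: the visibles are reconstructed just once. Mean-field reconstruction is used here for simplicity; this is an illustrative sketch, not the exact update from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, rng, lr=0.1):
    # Positive phase: hiddens sampled in conditional equilibrium with the
    # data. Negative phase: a single mean-field reconstruction of the
    # visibles, then the hidden probabilities given that reconstruction.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1 = sigmoid(h0 @ W.T)              # one reconstruction, no equilibrium
    h1_prob = sigmoid(v1 @ W)
    return W + lr * (np.outer(v0, h0_prob) - np.outer(v1, h1_prob))
```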

FIT1031: Computers and Networks
www.monash.edu.au
Unit Introduction
Outline
People involved
Unit objectives
Resources
Unit structure
Tutorials
Assessment
Responsibilities

FIT1031: Computers and Networks
Lecture Notes 11
Exam and Revision
Topics
Type of questions
What do you need to know?
About the exam
Exam technique
Staff consultation
Sample questions

FIT1031: Computers and Networks
Lecture Notes 6
Operating Systems
LN 6: Learning Objectives
Function of OS
Background System Software
Brief history of OS
OS activities
process management

FIT1031 Semester 2, 2014
Introduction to Computers & Networks
Solutions to FIT1031 Tutorial T07
Exercise 1: Briefly define the terms Internet, intranet and extranet.
Solution:
Internet: The Internet is a global system of interconnected computer networks.