# A Brief Introduction to Deterministic Annealing

Justin Muncaster, Department of Computer Science, University of California, Santa Barbara

## Abstract

This paper provides a short description of deterministic annealing [1] and its information-theoretic derivation. The technique is presented in the context of clustering and vector quantization, where its application is most obvious, although its applications range much further. This paper is based on [1] and is meant to provide an approachable introduction to deterministic annealing.

## 1. Introduction

Deterministic annealing is an optimization technique that attempts to find a global minimum of a cost function. The technique is designed to explore a large portion of the cost surface using randomness while still performing optimization using local information. The procedure starts by modifying the cost function to introduce a notion of randomness, allowing a large area to be explored. At each iteration, the amount of randomness (measured by Shannon entropy [2]) is constrained and a local optimization is performed. Gradually, the amount of imposed randomness is lowered so that upon termination the algorithm optimizes the original cost function, yielding a solution to the original problem.

## 2. Clustering and compression

In clustering we wish to represent a space of data points by a smaller set of codevectors. That is, we wish to partition the space into subsets whose elements are as similar to one another as possible. This problem has applications in many fields, ranging from pattern recognition to compression. In the following, we define the problem and present the classic k-means solution, which will reappear in a different form when we discuss the deterministic annealing approach.

### 2.1 Problem definition

Mathematically, the problem is defined as follows. Suppose we are given a source vector x ∈ X that we wish to transmit across a noiseless channel. We encode x by a codevector y from a codebook Y. We always encode x using an index to the “best” reproduction codevector, denoted y(x). The best reproduction codevector is defined with respect to a distortion function d(·, ·), which we wish to minimize. The distortion function quantifies the difference between a source vector x and a reproduction vector y(x). In most cases distortion is measured by squared distance; however, this need not be the case in general.
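
The encoding rule described above can be illustrated with a minimal sketch. It assumes squared Euclidean distance as the distortion function and a small hand-picked codebook; the function names `squared_distance`, `encode`, and `decode` and the example values are hypothetical and do not come from the paper.

```python
def squared_distance(x, y):
    """Squared Euclidean distortion d(x, y) between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def encode(x, codebook, distortion=squared_distance):
    """Return the index of the codevector that minimizes d(x, y) over the codebook."""
    return min(range(len(codebook)), key=lambda j: distortion(x, codebook[j]))


def decode(index, codebook):
    """Recover the reproduction vector y(x) from the transmitted index."""
    return codebook[index]


# Hypothetical two-entry codebook Y and one source vector x (illustrative values).
codebook = [(0.0, 0.0), (4.0, 4.0)]
x = (1.0, 0.5)

index = encode(x, codebook)       # only this index needs to be transmitted
y_of_x = decode(index, codebook)  # the receiver looks up y(x) in its copy of Y
```

Passing the distortion as a parameter reflects the remark that squared distance is the common choice but not the only one; any function of a source vector and a codevector that returns a number to be minimized could be substituted.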

Lossy compression is achieved by a many-to-one mapping of source vectors to codevectors. This mapping effectively partitions the space X into “clusters” surrounding the codevectors, where the number of clusters (and hence the compression rate) is determined by the size of the codebook that we wish to use.
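
As a rough illustration of this many-to-one mapping, the sketch below groups a handful of source vectors by their nearest codevector and reports how many bits an index would need. It assumes squared Euclidean distortion and a fixed-length binary code for the index; both are assumptions made for the example, and the names `partition` and `bits_per_index` and the sample data are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def partition(points, codebook):
    """Group source vectors into clusters keyed by nearest-codevector index."""
    clusters = {}
    for x in points:
        j = min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))
        clusters.setdefault(j, []).append(x)
    return clusters


def bits_per_index(codebook_size):
    """Bits needed to send one index, assuming a fixed-length binary code."""
    return (codebook_size - 1).bit_length()


# Hypothetical data: five source vectors and a codebook of two codevectors.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
codebook = [(0.0, 0.0), (5.0, 5.0)]

clusters = partition(points, codebook)
rate = bits_per_index(len(codebook))
```

Here five distinct source vectors collapse onto two codevectors, so the reproduction is lossy, and under the fixed-length assumption a larger codebook gives more clusters at the cost of more bits per transmitted index.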