
# 91.510_ch06 - Information Theory, Statistical Measures, and Bioinformatics Approaches to Gene Expression



## Friday's Class

Sei Hyung Lee will present his dissertation proposal, *Support Vector Clustering*, at 1 pm in this room. I will therefore discuss clustering and other topics.
## Information Theory

Given a probability distribution $P_i$ for the letters in an independent and identically distributed (i.i.d.) message, the probability of seeing a particular sequence of letters $i, j, k, \ldots, n$ is simply

$$P_i P_j P_k \cdots P_n = e^{\log P_i + \log P_j + \log P_k + \cdots + \log P_n}$$
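A small Python sketch of this identity, using assumed letter probabilities for illustration (the distribution and function names are not from the slides): the direct product of per-letter probabilities and the exponentiated sum of log-probabilities give the same value.

```python
import math

# Hypothetical i.i.d. letter probabilities (assumed values, for illustration only).
p = {"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05}

def sequence_probability(seq, p):
    """Probability of an i.i.d. sequence as the product of letter probabilities."""
    prob = 1.0
    for letter in seq:
        prob *= p[letter]
    return prob

def sequence_probability_logs(seq, p):
    """Same value, computed as e raised to the sum of log-probabilities."""
    return math.exp(sum(math.log(p[letter]) for letter in seq))
```

Working in log space avoids floating-point underflow for long sequences, which is why the exponential-of-sums form matters in practice.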

## Information Theory 2

The information, or surprise, of an answer to a question (a message) is inversely proportional to its probability: the smaller the probability, the more surprise or information. Ask a child, "Do you like ice cream?" If the answer is yes, you are not surprised, and little information is conveyed. If the answer is no, you are surprised; more information has been given by this lower-probability answer.
## Information Theory 3

The information $H$ associated with probability $p$ is

$$H(p) = \log_2(1/p)$$

where $1/p$ is the information or surprise, and $\log_2(1/p)$ is the number of bits required.

## Information Theory 4

Log-probabilities and their sums represent measures of information. Conversely, information can be thought of as a negative log-probability (the negative sign makes the information increase as the probability decreases):

$$H(p) = \log_2(1/p) = -\log_2 p$$
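This definition is one line of Python (a sketch; the function name is my own):

```python
import math

def surprisal_bits(p):
    """Information (surprise) of an outcome with probability p, in bits."""
    return -math.log2(p)
```

For example, an outcome with probability 1/8 carries `surprisal_bits(0.125) = 3` bits.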
## Information Theory 5

If we have an i.i.d. source with 6 values (a die), 4 values (A, C, T, G), or $n$ values in general, all equally likely (a flat distribution), then the probability of any particular symbol is $1/n$, the information in any such symbol is $\log_2 n$, and this value is also the average.

If the symbols are not equally probable, we need to weight the information of each symbol by its probability of occurring. This is Claude Shannon's entropy:

$$H = \sum_i p_i \log_2(1/p_i) = -\sum_i p_i \log_2 p_i$$
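A minimal Python version of Shannon's formula (the function name is my own; the $0 \cdot \log 0$ term is taken as 0, by convention):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits.

    Zero-probability symbols contribute nothing (0 * log 0 -> 0 by convention).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For the flat distributions above, this reproduces $\log_2 n$: a fair die gives $\log_2 6 \approx 2.585$ bits, and four equiprobable bases give 2 bits.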

## Information Theory 6

For a fair coin, where heads and tails have equal probabilities:

$$H = -\left( \tfrac{1}{2}\log_2 \tfrac{1}{2} + \tfrac{1}{2}\log_2 \tfrac{1}{2} \right) = -\left( \tfrac{1}{2}(-1) + \tfrac{1}{2}(-1) \right) = -(-1) = 1 \text{ bit}$$

If the coin comes up heads 3/4 of the time, the entropy should decrease (we are more certain of the outcome, and there is less surprise):

$$H = -\left( \tfrac{3}{4}\log_2 \tfrac{3}{4} + \tfrac{1}{4}\log_2 \tfrac{1}{4} \right) = -\left( 0.75 \, (-0.415) + 0.25 \, (-2) \right) = -(-0.81) = 0.81 \text{ bits}$$
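A quick Python check of both coin calculations (a sketch mirroring the arithmetic above):

```python
import math

# Fair coin: two equiprobable outcomes -> 1 bit.
fair = -(0.5 * math.log2(0.5) + 0.5 * math.log2(0.5))

# Biased coin, heads 3/4 of the time -> about 0.81 bits.
biased = -(0.75 * math.log2(0.75) + 0.25 * math.log2(0.25))
```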
## Information Theory 7

A random (uniform) DNA source has an entropy of

$$H = -\left( 4 \cdot \tfrac{1}{4}\log_2 \tfrac{1}{4} \right) = -\left( 4 \cdot \tfrac{1}{4}(-2) \right) = -(-2) = 2 \text{ bits}$$

A DNA source that emits 45% A, 45% T, 5% G, and 5% C has an entropy of

$$H = -\left( 2(0.45)\log_2 0.45 + 2(0.05)\log_2 0.05 \right) = -\left( 0.90\,(-1.152) + 0.10\,(-4.322) \right) = -(-1.037 - 0.432) = 1.469 \text{ bits}$$
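The same two DNA calculations in Python (a sketch, using the probabilities given above):

```python
import math

# Uniform DNA source: four equiprobable bases -> 2 bits.
uniform_dna = -sum(0.25 * math.log2(0.25) for _ in range(4))

# Skewed source: 45% A, 45% T, 5% G, 5% C -> about 1.469 bits.
skewed_dna = -(2 * 0.45 * math.log2(0.45) + 2 * 0.05 * math.log2(0.05))
```

The skew reduces the entropy below the 2-bit maximum, which is exactly the idea behind using entropy to spot compositionally biased genomic regions.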

## Natural Logs

Using natural logarithms, information is expressed in units of nats (a contraction of "natural digits"). Bits and nats are easily interconverted:

$$\text{nats} = \text{bits} \cdot \ln 2$$
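The conversion in both directions, as a small Python sketch (helper names are my own):

```python
import math

def bits_to_nats(h_bits):
    """Convert an information measure from bits to nats: nats = bits * ln 2."""
    return h_bits * math.log(2)

def nats_to_bits(h_nats):
    """Inverse conversion: bits = nats / ln 2."""
    return h_nats / math.log(2)
```

So 1 bit equals ln 2 ≈ 0.693 nats, and the two conversions are exact inverses of each other.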