91.510 ch06: Information Theory, Statistical Measures and Bioinformatics Approaches to Gene Expression
Friday’s Class
• Sei Hyung Lee will present his dissertation proposal, "Support Vector Clustering," at 1 pm in this room.
• So I will discuss clustering and other topics.
Information Theory
• Given a probability distribution P_i for the letters in an independent and identically distributed (i.i.d.) message, the probability of seeing a particular sequence of letters i, j, k, ..., n is simply

  P_i · P_j · P_k · ... · P_n = e^(log P_i + log P_j + ... + log P_n)
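To make the product/log-sum equivalence concrete, here is a minimal Python sketch; the probability table and the example sequence are hypothetical illustrations, not values from the slides:

```python
import math

# Hypothetical i.i.d. letter probabilities for a 4-letter DNA alphabet
probs = {"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05}
seq = "ATGA"

# Direct product: P_i * P_j * P_k * ... * P_n
p_product = math.prod(probs[ch] for ch in seq)

# Equivalent form: e^(log P_i + log P_j + ... + log P_n)
# (summing logs avoids numerical underflow for long sequences)
p_logsum = math.exp(sum(math.log(probs[ch]) for ch in seq))

print(p_product, p_logsum)  # both ~0.004556
```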
Information Theory 2
• The information, or surprise, of an answer to a question (a message) grows as its probability shrinks: the smaller the probability, the more surprise and the more information.
• Ask a child, "Do you like ice cream?" If the answer is yes, you are not surprised, and little information is conveyed.
• If the answer is no, you are surprised: this lower-probability answer has given you more information.
Information Theory 3
• The information H associated with a probability p is

  H(p) = log2(1/p)

• 1/p measures the surprise; log2(1/p) is the number of bits required to encode it.
Information Theory 4
• Log-probabilities and their sums represent measures of information. Conversely, information can be thought of as a negative log-probability (the negative sign makes information increase as probability decreases):

  H(p) = log2(1/p) = -log2 p
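As a minimal sketch of this definition (the helper name surprisal is my own label, not from the slides):

```python
import math

def surprisal(p: float) -> float:
    """Information, in bits, of an outcome with probability p: -log2(p)."""
    return -math.log2(p)

print(surprisal(0.5))   # 1.0 bit: a fair coin flip
print(surprisal(0.05))  # ~4.32 bits: a rarer outcome carries more information
```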
Information Theory 5
• For an i.i.d. source whose n values are equally probable (6 for a die, 4 for DNA: A, C, T, G), each symbol has probability 1/n, so the information in any symbol is log2 n bits, and this value is also the average.
• If the symbols are not equally probable (the distribution is not flat), we need to weight the information of each symbol by its probability of occurring. This is Claude Shannon's entropy:

  H = Σ p_i log2(1/p_i) = -Σ p_i log2 p_i
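A minimal Python sketch of Shannon's formula, using the usual convention that terms with p_i = 0 contribute zero:

```python
import math

def entropy(probs) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))  # ~2.585 bits = log2(6), the flat-distribution case
```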
Information Theory 6
• For a fair coin, where heads and tails have equal probabilities:

  H = -((1/2) log2(1/2) + (1/2) log2(1/2)) = -((1/2)(-1) + (1/2)(-1)) = -(-1) = 1 bit

• If the coin comes up heads 3/4 of the time, the entropy decreases (we are more certain of the outcome, so there is less surprise):

  H = -((3/4) log2(3/4) + (1/4) log2(1/4)) = -((0.75)(-0.415) + (0.25)(-2)) = -(-0.81) = 0.81 bits
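Assuming the entropy helper sketched on the previous slide, both coin values check out:

```python
print(entropy([0.5, 0.5]))    # 1.0 bit: fair coin
print(entropy([0.75, 0.25]))  # ~0.811 bits: biased coin, less surprise
```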
Information Theory 7
• A uniformly random DNA source has an entropy of

  H = -(4 · (1/4) log2(1/4)) = -(4 · (1/4)(-2)) = -(-2) = 2 bits

• A DNA source that emits 45% A, 45% T, 5% G, and 5% C has an entropy of

  H = -(2(0.45) log2(0.45) + 2(0.05) log2(0.05)) = -((0.90)(-1.15) + (0.10)(-4.32)) = -(-1.035 - 0.432) ≈ 1.47 bits
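The same hypothetical entropy helper reproduces both DNA figures:

```python
print(entropy([0.25] * 4))                # 2.0 bits: uniform DNA source
print(entropy([0.45, 0.45, 0.05, 0.05]))  # ~1.47 bits: biased DNA source
```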
Natural Logs
• Using natural logarithms, information is expressed in units of nats (a contraction of "natural digits").
• Bits and nats are easily interconverted:

  nats = bits · ln 2
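A one-line sketch of the conversion (the function name is my own):

```python
import math

def bits_to_nats(h_bits: float) -> float:
    """Convert an information quantity from bits to nats: nats = bits * ln(2)."""
    return h_bits * math.log(2)

print(bits_to_nats(1.0))  # ~0.693 nats in one bit
print(bits_to_nats(2.0))  # a random DNA symbol: ~1.386 nats
```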