The Spectral Test for Randomness
LIGN 17
Feb 25, 2011

Recall
• Lossless vs. Lossy Compression
  – Lossless: we can recover all of the data in the original message, even though we made it smaller
    • E.g. Huffman Coding
  – Lossy: we cannot recover all of the data in the original message
    • We threw away some bits because the human perceptual system doesn’t need them anyway
    • E.g. MP3, JPEG formats

Compression Artifacts
• Compression artifacts happen because of overcompression
  – JPEG: “busy” parts of the image get pixellated/blurred
  – MP3: random noise like clapping gets a tinny ring
• Why?
  – Randomness is unpredictable: it contains more information than we can compress out of it

That’s Random…
• What’s going to be problematic about compressing randomness?
• Consider a stereotypical random event: coin-flipping
  – If we had a sequence of 1000 coin flips, and we saw 12 heads in a row, would that be random?
    • It turns out that the odds of NOT seeing such a streak during that whole sequence are less than 1 in 11
  – Does that pattern, or any other pattern, show up more or less frequently than it “should”?
  – How would you know if a coin were fair? What would the frequency table look like?

Random.
Outcome    Head  Tail
Frequency  50    50
• Can we Huffman Code this? Will it result in compression?
  – Not if all we’re considering are Heads and Tails, you might say…

50-50
• Here’s a 50-50 run of coin flips:
  – HHHHHHHHHHTTTTTTTTTT
• Do you believe for a second that that’s random?
• Is 50-50 good enough to determine randomness?

An Unfair Coin
Gary just got this coin off of Think Geek. It’s pretty sweet. Most unfair coins are weighted: they come up heads more often than they come up tails. Gary’s is more subtle: it comes up in a predictable pattern of heads and tails that repeats in groups of eight: HHTHTHTT. It’s a long enough sequence that the Ling Grads are going bankrupt paying out his bar-bet stakes, which are “two pints of Stella Artois and a package of cheese and onion crisps,” because they don’t notice that there’s a repeating pattern. It looks random enough to the casual observer. How can we prove that his coin is unfair?

Gary’s Winnings

The Spectral Test for Randomness
• Looks at distributions over (progressively longer) sequences of Heads and Tails (or 0’s and 1’s) to see if they could have arisen by chance.
• Level 1 Spectral Test: sequences of length 1. For Gary’s Coin (HHTHTHTT):
  – Possible outcomes: H, T
    • Frequency of H: 4
    • Frequency of T: 4
  – Probably could have arisen by chance…
  – What if it was H:3, T:5?
    • Not really a long enough sequence to tell…
    • If it was H:300, T:500, then we’d be pretty sure it wasn’t random
    • H:999,782 :: T:1,000,218 is a 0.01% deviance from the expected distribution: close enough to random

The Spectral Test for Randomness
• Passing Level 1 is no guarantee of randomness: HHHHTTTT passes easily.
• Level 2: sequences of length 2
  – Four outcomes possible: HH, HT, TT, TH
  – For Gary’s Coin (HHTHTHTT): HH:1 HT:3 TH:3 TT:1
  – Note that these add up to 8: we wrap around at the end.
  – Could this have arisen by chance? Possibly. But if we had three or four of these sequences, we’d be reasonably confident that this wasn’t random.

The Spectral Test for Randomness
• Bitstring: 01010100111011110000
• Level 1: Ten 1’s, Ten 0’s.
• Level 2:
  – 00: 5
  – 01: 5
  – 10: 5
  – 11: 5
• Level 3: left as an exercise for the student (although this string is a little short for this test to be meaningful. Why?)

The Spectral Test for Randomness
Which of the following two bitstrings is random?
Bitstring 1: 001001010011001011001001011011001001001010000
001011000011000010011010010000001001010011000
010001011011011001011011010011000011000010010
000011000011010010011011011001011011000011010
00101100101101001000
Bitstring 2: 011111011100011000000011001101001000011100010
111001100001010100001011111110110001111101000
001000000010111001001100100101010100011011110
111011101010110111100100011110110001011101000
001111001001010110100

Bitstring 2 Was Actually Randomly Generated
What does the Spectral Test give us?
Level 1 (2 outcomes):
  Number of 0’s: 100
  Number of 1’s: 100
Level 2 (4 outcomes):
  00’s: 52   01’s: 48   10’s: 48   11’s: 52
Level 3 (8 outcomes):
  000’s: 28   001’s: 22   010’s: 24   011’s: 24
  100’s: 23   101’s: 25   110’s: 24   111’s: 28
How many would we have to consider for Level 4? 5? 6?

Bitstring 1
Generated using algorithm: every 3rd bit is a 0; every 5th bit is a 1; every 7th bit is a 1
Level 1:
  0’s: 119   1’s: 81   (this is not looking good…)
Level 2:
  00’s: 62   01’s: 56   10’s: 56   11’s: 25
Level 3:
  000’s: 28   001’s: 33   010’s: 31   011’s: 25
  100’s: 33   101’s: 23   110’s: 25   111’s: 0
This is a dead giveaway.

Million Digit Random Bitstring
Level 1   Frequency
0         499654
1         500346

Level 2   Frequency
00        249686
01        249968
10        249968
11        250378

Level 3   Frequency
000       124854
001       125143
010       125107
011       124978
100       125143
101       124945
110       124978
111       124852

Million Digit Non-random Bitstring
Level 1   Frequency
0         561962
1         438038

Level 2   Frequency
00        264134
01        297827
10        297827
11        140211

Level 3   Frequency
000       106819
001       157315
010       157616
011       140211
100       157314
101       140512
110       140211
111       0 (SERIOUSLY?!)

The Bottom Line
• A truly random sequence should pass the Spectral Test at ALL levels
  – Caveat: the bitstring needs to be long enough to actually get significant numbers for each outcome
• You wouldn’t want to do a Level 10 Spectral Test on a 500 bit sequence. Why?
  – Level 10 will consider 1024 possible outcomes, but you’ll only be looking at 500 sequences of length 10.

Compression and Randomness
• If I wrote a program to generate random bits and write that to a file, could I then compress that file?
• Why not?
• Frequency analysis won’t help us
  – Everything will occur with roughly the same frequency, so we can’t very well give smaller representations to the more frequent strings
  – And this will carry through to ANY length of sequence!
• Compression: squeezing out the predictability
  – There’s nothing predictable about randomness!

Have a great weekend!
• (But see you in section?)
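
The level-by-level counts in these slides can be reproduced with a few lines of Python. This is only a minimal sketch of the counting step, assuming the wrap-around convention from the Gary’s Coin slide; the names `spectral_counts` and `expected_count` are made up for illustration and are not from the lecture, and the sketch does not decide how large a deviation is "too large".

```python
from collections import Counter
from itertools import product


def spectral_counts(seq, level, alphabet=None):
    """Count every window of length `level` in `seq`, wrapping around at the end."""
    if alphabet is None:
        # Assumption: the outcomes are whatever symbols occur in the sequence.
        alphabet = sorted(set(seq))
    # Appending the first level-1 symbols lets the last windows wrap around,
    # so the counts add up to len(seq).
    wrapped = seq + seq[:level - 1]
    seen = Counter(wrapped[i:i + level] for i in range(len(seq)))
    # Report every possible outcome, including ones that never occur.
    outcomes = ("".join(p) for p in product(alphabet, repeat=level))
    return {o: seen.get(o, 0) for o in outcomes}


def expected_count(length, level, symbols=2):
    """Average count per outcome if all outcomes were equally likely."""
    return length / symbols ** level


if __name__ == "__main__":
    print(spectral_counts("HHTHTHTT", 1))              # Gary's coin, Level 1
    print(spectral_counts("HHTHTHTT", 2))              # Gary's coin, Level 2
    print(spectral_counts("01010100111011110000", 2))  # the 20-bit example
    print(expected_count(500, 10))                     # Level 10 on 500 bits
```

Comparing each reported count against `expected_count` is the informal check the slides perform by eye: an outcome with a count of 0 at Level 3 on a long string stands out immediately, while an expected count below 1 (Level 10 on 500 bits) means the string is too short for that level to say anything.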
 Winter '08
 Kehler