quiz - Quiz time a. One of the most important statistical...

Quiz time

a. One of the most important statistical properties of text is Zipf’s law. Write down Zipf’s law: [5pts]

b. Explain Zipf’s law in a sentence or two. [5pts]

c. Now we’re going to consider Zipf’s law in action. First, suppose that you have a collection of extremely simple children’s books. There are only four different words in your collection: alice, banana, chocolate, dandelion. There are no other words. Suppose there are 5,000 tokens in your collection and that the frequency order is alice > banana > chocolate > dandelion. Assuming that Zipf’s law holds exactly for this collection, what are the frequencies of the four words? [10pts]
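The arithmetic in part c can be checked with a short sketch. It assumes the simplest form of Zipf's law, in which the frequency of the word at rank r is f(r) = C / r (exponent 1), and picks C so that the four frequencies sum to the 5,000 tokens. The function name `zipf_frequencies` is hypothetical and not part of the quiz.

```python
from fractions import Fraction


def zipf_frequencies(total_tokens, vocab_size):
    """Return frequencies f(r) = C / r for ranks 1..vocab_size.

    Assumption: Zipf's law with exponent 1, with C chosen so that the
    frequencies sum exactly to total_tokens.
    """
    # Harmonic number H_n = 1 + 1/2 + ... + 1/n, kept exact with Fraction.
    harmonic = sum(Fraction(1, r) for r in range(1, vocab_size + 1))
    c = Fraction(total_tokens) / harmonic
    return [c / r for r in range(1, vocab_size + 1)]


# Words listed in decreasing frequency order, as stated in the question.
words = ["alice", "banana", "chocolate", "dandelion"]
freqs = zipf_frequencies(5000, len(words))

for word, freq in zip(words, freqs):
    print(word, freq)
```

Under this assumption C = 5000 / (1 + 1/2 + 1/3 + 1/4) = 2400, so the frequencies come out as 2400, 1200, 800 and 600, which sum to 5,000.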
Document 1: “The Longhorns? The Longhorns? No ... Aggies! Aggies! Aggies! Aggies!”

Document 2: “Aggies? Aggies? The Aggies will defeat the Longhorns. :-)”

Remove all punctuation and perform casefolding (convert to all lowercase). Now write down the modified documents: [4pts]

Now, we’re going to view our documents in terms of the vector space
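The normalization step for the two documents (punctuation removal followed by casefolding) can be sketched as below. This is one possible reading of the instructions: every character that is not a word character or whitespace is treated as punctuation, and casefolding is done with `str.lower()`. The helper name `normalize` is hypothetical.

```python
import re


def normalize(text):
    """Strip punctuation and lowercase the text (a sketch, not the official key)."""
    # Replace anything that is not a word character or whitespace with a
    # space. Note that \w also matches digits and the underscore.
    no_punct = re.sub(r"[^\w\s]", " ", text)
    # Lowercase, then collapse the runs of whitespace left behind.
    return " ".join(no_punct.lower().split())


doc1 = "“The Longhorns? The Longhorns? No ... Aggies! Aggies! Aggies! Aggies!”"
doc2 = "“Aggies? Aggies? The Aggies will defeat the Longhorns. :-)”"

print(normalize(doc1))
print(normalize(doc2))
```

With these assumptions the emoticon and the ellipsis disappear entirely, leaving "the longhorns the longhorns no aggies aggies aggies aggies" for Document 1 and "aggies aggies the aggies will defeat the longhorns" for Document 2.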

This note was uploaded on 01/21/2011 for the course CSCP 689 taught by Professor James during the Spring '10 term at Texas A&M.
