# Quiz time

a. One of the most important statistical properties of text is Zipf’s law. Write down Zipf’s law: [5pts]

b. Explain Zipf’s law in a sentence or two. [5pts]

c. Now we’re going to consider Zipf’s law in action. First, suppose that you have a collection of extremely simple children’s books. There are only four different words in your collection: alice, banana, chocolate, dandelion. There are no other words. Suppose there are 5,000 tokens in your collection and that the frequency order is alice > banana > chocolate > dandelion. Assuming that Zipf’s law holds exactly for this collection, what are the frequencies of the four words? [10pts]
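
The arithmetic in part (c) can be checked with a short script. This is a minimal sketch that assumes the simplest form of Zipf’s law, in which the frequency of the word at rank r is C / r for some constant C (exponent exactly 1); the quiz does not state which form it intends. The function name `zipf_frequencies` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def zipf_frequencies(total_tokens, vocab_size):
    """Return per-rank frequencies under the assumed law f(r) = C / r.

    C is chosen so that the frequencies sum to total_tokens.
    """
    # Sum of 1/r over all ranks (the harmonic number H_n), kept exact.
    harmonic = sum(Fraction(1, r) for r in range(1, vocab_size + 1))
    # Solve C * H_n = total_tokens for C.
    c = Fraction(total_tokens) / harmonic
    return [c / r for r in range(1, vocab_size + 1)]


words = ["alice", "banana", "chocolate", "dandelion"]
freqs = zipf_frequencies(5000, len(words))

for word, freq in zip(words, freqs):
    print(word, freq)
```

Under that assumption the constant works out to C = 5000 / (1 + 1/2 + 1/3 + 1/4) = 2400, so the script prints 2400, 1200, 800, and 600 for the four words, which sum to 5,000. Exact fractions are used so that any non-integer result would be visible rather than hidden by rounding.
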
Document 1: “The Longhorns? The Longhorns? No ... Aggies! Aggies! Aggies! Aggies!”

Document 2: “Aggies? Aggies? The Aggies will defeat the Longhorns. :-)”

Remove all punctuation and perform case folding (convert everything to lowercase). Now write down the modified documents: [4pts]

Now, we’re going to view our documents in terms of the vector space model. How many dimensions (axes) are there in the vector space
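
The preprocessing and the dimension count for the two documents can be sketched in a few lines. This sketch assumes that "punctuation" means the ASCII punctuation characters (so the emoticon disappears entirely), that tokens are separated by whitespace, and that the vector space has one axis per distinct term across both documents. The helper name `normalize` is hypothetical.

```python
import string
from collections import Counter


def normalize(text):
    """Strip ASCII punctuation, lowercase, and collapse whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


doc1 = "The Longhorns? The Longhorns? No ... Aggies! Aggies! Aggies! Aggies!"
doc2 = "Aggies? Aggies? The Aggies will defeat the Longhorns. :-)"

clean1 = normalize(doc1)
clean2 = normalize(doc2)

# One axis per distinct term across both documents (assumed convention).
vocab = sorted(set(clean1.split()) | set(clean2.split()))

# Raw term-count vectors, one component per vocabulary term.
counts1 = Counter(clean1.split())
counts2 = Counter(clean2.split())
vec1 = [counts1[term] for term in vocab]
vec2 = [counts2[term] for term in vocab]

print(clean1)
print(clean2)
print(len(vocab), vocab)
print(vec1)
print(vec2)
```

With these assumptions the cleaned documents are "the longhorns the longhorns no aggies aggies aggies aggies" and "aggies aggies the aggies will defeat the longhorns", and the vocabulary has six terms: aggies, defeat, longhorns, no, the, will. A different tokenization or a stop-word list would change that count.
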