2020_High_Performance_Python_-or_Humans_O'Reilly_Media,_Inc. 576.pdf

This preview shows page 1 out of 1 page.

The preview shows page 1 - 1 out of 1 page.
ploys a hashing algorithm to assign token frequencies to columns.We’ve already looked at hash functions in“How Do Dictionaries and SetsWork?”. A hash converts a unique item (in this case a text token) into anumber, where multiple unique items might map to the same hashed val‐ue, in which case we get a collision. Good hash functions cause few colli‐sions. Collisions are inevitable if we’re hashing many unique items to asmaller representation. One feature of a hash function is that it can’t easilybe reversed, so we can’t take a hashed value and convert it back to theoriginal token.InExample 11-24, we ask for a fixed-width 10-column matrix—the de‐fault is a fixed-width matrix of 1 million elements, but we’ll use a tiny ma‐trix here to show a collision. The default 1-million-element width is a sen‐sible default for many applications.
End of preview. Want to read the entire page?

Upload your study docs or become a

Course Hero member to access this document

Term
Spring
Professor
SDesousa
Tags
hash function

  • Left Quote Icon

    Student Picture

  • Left Quote Icon

    Student Picture

  • Left Quote Icon

    Student Picture