hash12 - Advanced File Structures Hashing (12) Introduction...

Introduction

For the past several weeks we have studied a variety of advanced data structures whose primary purpose was to represent data held in main memory in support of an algorithm during its execution. With the exception of B-trees, in which part of the data is resident in main memory and part is resident in secondary memory, all of these data structures were designed for data representation in main memory.

Hashing is a variant of a more general file organization technique called a direct file. A direct file is, in turn, a variant of an even more general type of file organization known as an indexed file. Indexed files typically consist of two main structures, an index structure and a main structure, similar to the index set and sequential set of B*-trees and B+-trees. There are many variations of indexed files, but they fall broadly into two categories based on the density of entries in the index structure compared to the number of entries in the main file: sparse index files and dense index files. A hash file or direct file falls generally under the category of a dense index file, although it is a very special variant of the dense index file.

Hash files themselves are typically categorized in two different ways. The first depends on whether the file structure is resident in main memory or in secondary memory. The former is called internal hashing, while the latter is called external hashing. You may have been introduced to internal hashing in CS2 or perhaps in System Software (COP 3402), since what is commonly referred to as a “hash table” or “hash file” is a common data structure used within a compiler or assembler to implement a symbol table. External hashing is a common database approach for hashing data held in secondary memory (primarily disk-based files).
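As a reminder of what internal hashing looks like, here is a minimal sketch of an in-memory symbol table of the kind an assembler or compiler might keep. It is an illustration only and is not taken from these notes: the class name `SymbolTable`, the multiply-by-31 string hash, the slot count, and the use of separate chaining for collisions are all assumptions chosen for brevity.

```python
class SymbolTable:
    """In-memory (internal) hash table using separate chaining.

    All names and parameters here are hypothetical, for illustration.
    """

    def __init__(self, num_slots=8):
        # Each slot holds a small list ("chain") of (name, attributes) pairs.
        self.slots = [[] for _ in range(num_slots)]

    def _slot(self, name):
        # Simple deterministic string hash (an assumed, illustrative choice).
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.slots)
        return h

    def insert(self, name, attributes):
        chain = self.slots[self._slot(name)]
        for i, (existing, _) in enumerate(chain):
            if existing == name:
                chain[i] = (name, attributes)  # update an existing symbol
                return
        chain.append((name, attributes))

    def lookup(self, name):
        # One hash value leads to one short chain, usually a single record.
        for existing, attributes in self.slots[self._slot(name)]:
            if existing == name:
                return attributes
        return None


table = SymbolTable()
table.insert("count", {"type": "int", "address": 100})
table.insert("total", {"type": "int", "address": 104})
print(table.lookup("count"))
print(table.lookup("undefined_symbol"))
```

Note that a lookup touches only one slot and typically returns a single record, which is the behavior the notes contrast with external hashing.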
The primary difference between internal hashing and external hashing is that in external hashing the hashing function is tailored to take advantage of the block-based access methods of the disk drive. This allows a single hash function value to “load” an entire block of records into main memory in one single disk “fetch” operation, whereas with internal hashing a single hash value typically returns either a single value (record) or a very small number of values.

In this set of notes we’ll give a quick review of common hashing techniques and take a quick look at internal hashing and the problems associated with it. We’ll then examine external hashing, and finally we’ll look at hashing techniques that allow for dynamic file expansion, something which is not feasible (in terms of time) with internal hashing.
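The block-oriented nature of external hashing can be sketched as follows. This is a simplified model under stated assumptions, not the scheme developed later in these notes: the disk is simulated by a Python list, the hash function is a plain modulus, and `HashFile`, `NUM_BUCKETS`, `BLOCK_CAPACITY`, and the `fetches` counter are hypothetical names introduced only to show that one hash value brings a whole block of records into memory with one fetch.

```python
NUM_BUCKETS = 5      # number of disk blocks in the file (assumed)
BLOCK_CAPACITY = 4   # records that fit in one block (assumed)


class HashFile:
    """Toy model of external hashing: one bucket corresponds to one block."""

    def __init__(self):
        # A list of lists stands in for the blocks stored on disk.
        self.blocks = [[] for _ in range(NUM_BUCKETS)]
        self.fetches = 0  # counts simulated disk fetch operations

    def _bucket(self, key):
        # The hash value is a block number, not the address of one record.
        return key % NUM_BUCKETS

    def _fetch_block(self, bucket_no):
        # Simulates a single disk read that returns the entire block.
        self.fetches += 1
        return self.blocks[bucket_no]

    def insert(self, key, record):
        block = self._fetch_block(self._bucket(key))
        if len(block) >= BLOCK_CAPACITY:
            # Overflow handling is deliberately left out of this sketch.
            raise OverflowError("bucket is full")
        block.append((key, record))

    def lookup(self, key):
        # One fetch loads every record in the bucket; the search for the
        # key then happens in main memory.
        block = self._fetch_block(self._bucket(key))
        for k, record in block:
            if k == key:
                return record
        return None


f = HashFile()
for key in (3, 8, 13):          # all three keys hash to bucket 3
    f.insert(key, f"record-{key}")
print(f.lookup(8), "found after", f.fetches, "simulated fetches")
```

Because the file has a fixed number of buckets of fixed capacity, this model cannot grow without rehashing everything, which is the limitation that motivates the dynamic expansion techniques mentioned above.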

This document was uploaded on 06/13/2011.
