Dynamic perfect hashing

From Wikipedia, the free encyclopedia

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure.[1][2][3] While more memory-intensive than its hash table counterparts,[citation needed] this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

Static case

FKS Scheme

The problem of optimal static hashing was first solved in general by Fredman, Komlós and Szemerédi.[4] In their 1984 paper,[1] they detail a two-tiered hash table scheme in which each bucket of the (first-level) hash table corresponds to a separate second-level hash table. Keys are hashed twice—the first hash value maps to a certain bucket in the first-level hash table; the second hash value gives the position of that entry in that bucket's second-level hash table. The second-level table is guaranteed to be collision-free (i.e. perfect hashing) upon construction. Consequently, the look-up cost is guaranteed to be O(1) in the worst-case.[2]

In the static case, we are given a set with a total of x entries, each one with a unique key, ahead of time. Fredman, Komlós and Szemerédi pick a first-level hash table with size buckets.[2]

To construct, x entries are separated into s buckets by the top-level hashing function, where . Then for each bucket with k entries, a second-level table is allocated with slots, and its hash function is selected at random from a universal hash function set so that it is collision-free (i.e. a perfect hash function) and stored alongside the hash table. If the hash function randomly selected creates a table with collisions, a new hash function is randomly selected until a collision-free table can be guaranteed. Finally, with the collision-free hash, the k entries are hashed into the second-level table.

The quadratic size of the space ensures that randomly creating a table with collisions is infrequent and independent of the size of k, providing linear amortized construction time. Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected space, since bucket sizes are small with high probability.[1]

The first-level hash function is specifically chosen so that, for the specific set of x unique key values, the total space T used by all the second-level hash tables has expected space, and more specifically . Fredman, Komlós and Szemerédi showed that given a universal hashing family of hash functions, at least half of those functions have that property.[2]

Dynamic case

Dietzfelbinger et al. present a dynamic dictionary algorithm that, when a set of n items is incrementally added to the dictionary, membership queries always run in constant time and therefore worst-case time, the total storage required is (linear), and expected amortized insertion and deletion time (amortized constant time).

In the dynamic case, when a key is inserted into the hash table, if its entry in its respective subtable is occupied, then a collision is said to occur and the subtable is rebuilt based on its new total entry count and randomly selected hash function. Because the load factor of the second-level table is kept low , rebuilding is infrequent, and the amortized expected cost of insertions is .[2] Similarly, the amortized expected cost of deletions is .[2]

Additionally, the ultimate sizes of the top-level table or any of the subtables is unknowable in the dynamic case. One method for maintaining expected space of the table is to prompt a full reconstruction when a sufficient number of insertions and deletions have occurred. By results due to Dietzfelbinger et al.,[2] as long as the total number of insertions or deletions exceeds the number of elements at the time of last construction, the amortized expected cost of insertion and deletion remain with full rehashing taken into consideration.

The implementation of dynamic perfect hashing by Dietzfelbinger et al. uses these concepts, as well as lazy deletion, and is shown in pseudocode below.

Pseudocode implementation

See also

References

Related Articles

Wikiwand AI