Get Started. It's Free
or sign up with your email address
Hash tables by Mind Map: Hash tables

1. Collision resolution: Open addressing

1.1. all entry records are stored in the bucket array itself

1.2. probe sequences

1.2.1. Linear probing

1.2.2. Quadratic probing

1.2.3. Double hashing

1.3. cons

1.3.1. number of stored entries cannot exceed the number of slots in the bucket array

1.3.2. performance dramatically degrades when the load factor grows beyond 0.7 or so

1.3.3. hash function has also minimize the clustering

1.4. pros

1.4.1. avoids the time overhead of allocating each new entry record

1.4.2. saves memory when entries are small particularly suitable for elements of one word or less load factor is not too small

1.5. types

1.5.1. Coalesced hashing A hybrid of chaining and open addressing links together chains of nodes within the table itself

1.5.2. Cuckoo hashing Uses two or more hash functions

1.5.3. Hopscotch hashing combines the approaches of cuckoo hashing and linear probing uses a neighborhood of buckets near the original hashed bucket

1.5.4. Robin Hood hashing variation on double-hashing collision resolution considers probe count

1.5.5. 2-choice hashing employs 2 different hash functions uses location that contains fewer objects

2. Collision resolution: Separate chaining

2.1. each bucket is independent

2.1.1. has list of entries with the same index

2.2. types

2.2.1. Separate chaining with linked lists often implemented as lists ordered by keys if some keys are much more likely to come up than others an unordered list with move-to-front heuristic may be more effective cons not good for small keys and values traversing a linked list has poor cache performance

2.2.2. Separate chaining with list head cells store the first record of each chain in the slot array itself cons empty bucket takes the same space as a bucket with one entry

2.2.3. Separate chaining with other structures worth considering only if the load factor is large (about 10 or more) the hash distribution is likely to be very non-uniform if one must guarantee good performance even in a worst-case scenario self-balancing binary tree dynamic array array hash table= no pointers needed beteter CPU caching dynamic perfect hashing a bucket that contains k entries is organized as a perfect hash table with k2 slots fusion tree

3. Data structure

3.1. used to implement (7)

3.1.1. Associative arrays Database indexing Caches

3.2. uses hash function

3.2.1. compute an index into an array of buckets or slots

3.2.2. should provide a uniform distribution of hash values evaluated empirically using statistical tests a Pearson's chi-squared test for discrete uniform distributions

3.2.3. for open addressing schemes should also avoid clustering clustering is mapping of two or more keys to consecutive slots

3.2.4. perfect hash function has no collisions If all keys are known ahead of time

3.3. load factor

3.3.1. number of entries divided by the number of buckets

3.4. drawbacks (5)

3.4.1. for certain string processing other structures can be better

3.4.2. for small keys arrays are better

3.4.3. poor locality of reference bad for CPU caching

3.4.4. particular operations can be expensive even though on average is fine

3.4.5. malicious adversary can harm performance

4. Dynamic resizing

4.1. serious mistake to not provide any way for the table to resize

4.2. keep the load factor under a certain limit

4.2.1. e.g., under 3/4

4.3. accompanied by a full or incremental table rehash

4.3.1. existing items are mapped to new bucket locations

4.4. types

4.4.1. Resizing by copying all entries

4.4.2. Incremental resizing real-time systems

4.4.3. Monotonic keys key values will always increase (or decrease) monotonically

4.4.4. Consistent hashing