Vollbild-Modus

Hash tables

Andere

Oleg Shevelyov

Folgen

Jetzt loslegen. Gratis!

oder registrieren mit Ihrer E-Mail-Adresse

Ähnliche Mindmaps Mindmap-Gliederung

Hash tables von Oleg Shevelyov Mind Map: Hash tables

1. Collision resolution: Separate chaining

1.1. each bucket is independent

1.1.1. has list of entries with the same index

1.2. types

1.2.1. Separate chaining with linked lists

1.2.1.1. often implemented as lists ordered by keys

1.2.1.2. if some keys are much more likely to come up than others

1.2.1.2.1. an unordered list with move-to-front heuristic may be more effective

1.2.1.3. cons

1.2.1.3.1. not good for small keys and values

1.2.1.3.2. traversing a linked list has poor cache performance

1.2.2. Separate chaining with list head cells

1.2.2.1. store the first record of each chain in the slot array itself

1.2.2.2. cons

1.2.2.2.1. empty bucket takes the same space as a bucket with one entry

1.2.3. Separate chaining with other structures

1.2.3.1. worth considering only if

1.2.3.1.1. the load factor is large (about 10 or more)

1.2.3.1.2. the hash distribution is likely to be very non-uniform

1.2.3.1.3. if one must guarantee good performance even in a worst-case scenario

1.2.3.2. self-balancing binary tree

1.2.3.3. dynamic array

1.2.3.3.1. array hash table=

1.2.3.3.2. no pointers needed

1.2.3.3.3. beteter CPU caching

1.2.3.4. dynamic perfect hashing

1.2.3.4.1. a bucket that contains k entries is organized as a perfect hash table with k2 slots

1.2.3.5. fusion tree

2. Dynamic resizing

2.1. serious mistake to not provide any way for the table to resize

2.2. keep the load factor under a certain limit

2.2.1. e.g., under 3/4

2.3. accompanied by a full or incremental table rehash

2.3.1. existing items are mapped to new bucket locations

2.4. types

2.4.1. Resizing by copying all entries

2.4.2. Incremental resizing

2.4.2.1. real-time systems

2.4.3. Monotonic keys

2.4.3.1. key values will always increase (or decrease) monotonically

2.4.4. Consistent hashing

3. Collision resolution: Open addressing

3.1. all entry records are stored in the bucket array itself

3.2. probe sequences

3.2.1. Linear probing

3.2.2. Quadratic probing

3.2.3. Double hashing

3.3. cons

3.3.1. number of stored entries cannot exceed the number of slots in the bucket array

3.3.2. performance dramatically degrades when the load factor grows beyond 0.7 or so

3.3.3. hash function has also minimize the clustering

3.4. pros

3.4.1. avoids the time overhead of allocating each new entry record

3.4.2. saves memory when

3.4.2.1. entries are small

3.4.2.1.1. particularly suitable for elements of one word or less

3.4.2.2. load factor is not too small

3.5. types

3.5.1. Coalesced hashing

3.5.1.1. A hybrid of chaining and open addressing

3.5.1.1.1. links together chains of nodes within the table itself

3.5.2. Cuckoo hashing

3.5.2.1. Uses two or more hash functions

3.5.3. Hopscotch hashing

3.5.3.1. combines the approaches of cuckoo hashing and linear probing

3.5.3.1.1. uses a neighborhood of buckets near the original hashed bucket

3.5.4. Robin Hood hashing

3.5.4.1. variation on double-hashing collision resolution

3.5.4.1.1. considers probe count

3.5.5. 2-choice hashing

3.5.5.1. employs 2 different hash functions

3.5.5.1.1. uses location that contains fewer objects

4. Data structure

4.1. used to implement (7)

4.1.1. Associative arrays

4.1.1.1. Database indexing

4.1.1.1.1. Caches

4.2. uses hash function

4.2.1. compute an index into an array of buckets or slots

4.2.2. should provide a uniform distribution of hash values

4.2.2.1. evaluated empirically using statistical tests

4.2.2.1.1. a Pearson's chi-squared test for discrete uniform distributions

4.2.3. for open addressing schemes should also avoid clustering

4.2.3.1. clustering is mapping of two or more keys to consecutive slots

4.2.4. perfect hash function

4.2.4.1. has no collisions

4.2.4.1.1. If all keys are known ahead of time

4.3. load factor

4.3.1. number of entries divided by the number of buckets

4.4. drawbacks (5)

4.4.1. for certain string processing other structures can be better

4.4.2. for small keys arrays are better

4.4.3. poor locality of reference

4.4.3.1. bad for CPU caching

4.4.4. particular operations can be expensive

4.4.4.1. even though on average is fine

4.4.5. malicious adversary can harm performance

oder Registrieren