Background
Boost.Unordered implements the TR1/C++11 unordered containers as proposed by Matt Austern in N1456. Section III.B in the paper explains why closed addressing is assumed.
The arguments there were valid at the time the paper was written (2004), but the state of the art has advanced since then, and the hash tables currently in use have chosen open addressing.
In addition, the standard specification of the unordered containers is
too limiting even for the closed addressing case, and a slight relaxation
of the signature of erase
and the complexity requirements of the
iterators allows more implementation freedom; notably, it would allow
the textbook closed addressing hash table that consists of a bucket
array holding singly linked lists.
Therefore, there is room for adding more containers to Unordered that implement these additional hash table variations.
Goals and Non-Goals
Our primary aim will be competitive performance in each category, as
measured on the
synthetic benchmarks
for both integral keys (uint32.cpp
and uint64.cpp
) and string keys
(string.cpp
) using the FNV-1a hash function in order to level the
playing field and eliminate platform-specific variations in the
performance of the default hash function.
Competitive performance for strings using the default hash function will be a secondary goal.
We will not
-
Implement a custom default allocator. First, the allocator is easily replaced by the user; second, our goal is competitive performance when comparing like with like, e.g.
boost::unordered_map
againststd::unordered_map
with both using the defaultstd::allocator
. -
Change the default hash function. The default will remain
boost::hash
.boost::hash
is more full-featured thanstd::hash
, it does not vary from platform to platform, is easily replaceable, and improving its performance will be done in ContainerHash rather than in Unordered. -
Use platform-specific features that change the observable behavior of the containers (improving performance without observable changes by using e.g. SSE2 is fine.) The fact that Unordered can be tested on one platform and then be relied upon to deterministically perform in the same way on another (subject to
std::size_t
being the same size) is an important benefit that we have overstd::unordered_*
. -
Provide more than one container per category. We will experiment will several possibilities, but at the end we must settle on one.
-
Change the requirements placed on the provided hash function. In particular, we will not require uniform distribution over the entire range of
std::size_t
, or that all the bits of the hash function are dependent on the key or are of the same quality. As explained in the documentation ofboost::hash
, this is not a requirement the standard imposes on the Hash function object or on specializations ofstd::hash
.(This implies that we will not use hash to bucket policies that only take the low bits, such as
H mod 2^N
, or that ignore the low bits, such as Lemire’sfastrange
. Theuint32.cpp
anduint64.cpp
benchmarks deliberately include tests that have pathological behavior with such bucket distributions. Our hash to bucket mapping should be dependent on all the bits of the hash value.)
Container Variations
We will provide the following containers (only _map
is discussed for
brevity but everything applies to _set
, _multiset
, _multimap
):
-
unordered_map
. Remains a conforming implementation of thestd::unordered_map
specification. At the moment our performance suffers because we need to perform a double indirection on lookup to reach the element, so we lose against bothstd::unordered_map
andboost::multi_index
. We need to fix this by either using a doubly linked list for the elements, or a doubly linked list for the buckets, as in Joaquín M López Muñoz’s prototype. -
unordered_bucket_map
. This will implement an almost-standard-conforming interface, witherase(iterator)
returningvoid
(as per N2023), and iteration frombegin()
toend()
takingO(bucket_count()+size())
instead ofO(size())
. This allows a straightforward implementation where the bucket array holds singly linked lists, without any additional bucket or element links. -
unordered_flat_map
. Open addressing container in which the elements are held directly in the bucket array, rather than in separately allocated nodes. This strongly deviates from the standard API, in that pointers and references to elements are invalidated on rehash, iteration isO(capacity())
, there are no local bucket iterators or bucket-oriented queries, and there is no node-based API. We should probably not provide control over the load factor, to allow implementation freedom, as load factors are highly specific to the underlying implementation, and we want to be able to switch it in a future release if needed.The performance target here will be
absl::flat_hash_map
. -
unordered_node_map
. Same asunordered_flat_map
, but holds pointers to the elements rather than the elements themselves in the bucket array. The difference withunordered_flat_map
is pointer and reference stability. The performance target will beabsl::node_hash_map
.
Note
|
It’s possible that some of these containers may turn out to be of limited utility
due to being dominated by another container that provides a superset of the API.
This may happen, for instance, if unordered_map is as fast as unordered_bucket_map ,
or if unordered_bucket_map is as fast or faster than unordered_node_map . If this
proves to be the case, we’ll need to decide whether to include the superseded
container or not, taking differences in memory consumption into account.
|
Implementation Notes
There are several alternatives we can pursue when implementing unordered_flat_map
:
Since at the moment there is no clear winner, we will probably need to implement
all of these and compare their performance characteristics. Most of the alternative
implementations gravitate towards Robin Hood hashing, but there is a Hopscotch
implementation (tsl::hopscotch
) that
holds its own on benchmarks.
More references:
There are also other, less sophisticated, alternatives we can pursue. We can explore
simple linear probing with an upper limit on probes, and reallocate if that’s
reached. If the limit is set to a multiple of 16, this can be accomplished by a SSE2
packed compare similar to Abseil’s technique, if we store e.g. hash mod 251
in the
byte metadata.
We can also explore a hybrid option where the first element of the per-bucket linked list is stored directly in the bucket array (as in a flat map), but the rest are chained as in a closed addressing map.