Development Plan for Boost.Unordered

Background

Boost.Unordered implements the TR1/C++11 unordered containers as proposed by Matt Austern in N1456. Section III.B in the paper explains why closed addressing is assumed.

The arguments there were valid at the time the paper was written (2004), but the state of the art has advanced since then, and the hash tables currently in use have chosen open addressing.

In addition, the standard specification of the unordered containers is too limiting even for the closed addressing case, and a slight relaxation of the signature of erase and the complexity requirements of the iterators allows more implementation freedom; notably, it would allow the textbook closed addressing hash table that consists of a bucket array holding singly linked lists.

Therefore, there is room for adding more containers to Unordered that implement these additional hash table variations.

Goals and Non-Goals

Our primary aim will be competitive performance in each category, as measured on the synthetic benchmarks for both integral keys (uint32.cpp and uint64.cpp) and string keys (string.cpp) using the FNV-1a hash function in order to level the playing field and eliminate platform-specific variations in the performance of the default hash function.

Competitive performance for strings using the default hash function will be a secondary goal.

We will not

Implement a custom default allocator. First, the allocator is easily replaced by the user; second, our goal is competitive performance when comparing like with like, e.g. boost::unordered_map against std::unordered_map with both using the default std::allocator.
Change the default hash function. The default will remain boost::hash. boost::hash is more full-featured than std::hash, it does not vary from platform to platform, is easily replaceable, and improving its performance will be done in ContainerHash rather than in Unordered.
Use platform-specific features that change the observable behavior of the containers (improving performance without observable changes by using e.g. SSE2 is fine.) The fact that Unordered can be tested on one platform and then be relied upon to deterministically perform in the same way on another (subject to std::size_t being the same size) is an important benefit that we have over std::unordered_*.
Provide more than one container per category. We will experiment will several possibilities, but at the end we must settle on one.
Change the requirements placed on the provided hash function. In particular, we will not require uniform distribution over the entire range of std::size_t, or that all the bits of the hash function are dependent on the key or are of the same quality. As explained in the documentation of boost::hash, this is not a requirement the standard imposes on the Hash function object or on specializations of std::hash.

(This implies that we will not use hash to bucket policies that only take the low bits, such as H mod 2^N, or that ignore the low bits, such as Lemire’s fastrange. The uint32.cpp and uint64.cpp benchmarks deliberately include tests that have pathological behavior with such bucket distributions. Our hash to bucket mapping should be dependent on all the bits of the hash value.)

Container Variations

We will provide the following containers (only _map is discussed for brevity but everything applies to _set, _multiset, _multimap):

unordered_map. Remains a conforming implementation of the std::unordered_map specification. At the moment our performance suffers because we need to perform a double indirection on lookup to reach the element, so we lose against both std::unordered_map and boost::multi_index. We need to fix this by either using a doubly linked list for the elements, or a doubly linked list for the buckets, as in Joaquín M López Muñoz’s prototype.
unordered_bucket_map. This will implement an almost-standard-conforming interface, with erase(iterator) returning void (as per N2023), and iteration from begin() to end() taking O(bucket_count()+size()) instead of O(size()). This allows a straightforward implementation where the bucket array holds singly linked lists, without any additional bucket or element links.
unordered_flat_map. Open addressing container in which the elements are held directly in the bucket array, rather than in separately allocated nodes. This strongly deviates from the standard API, in that pointers and references to elements are invalidated on rehash, iteration is O(capacity()), there are no local bucket iterators or bucket-oriented queries, and there is no node-based API. We should probably not provide control over the load factor, to allow implementation freedom, as load factors are highly specific to the underlying implementation, and we want to be able to switch it in a future release if needed.

The performance target here will be absl::flat_hash_map.
unordered_node_map. Same as unordered_flat_map, but holds pointers to the elements rather than the elements themselves in the bucket array. The difference with unordered_flat_map is pointer and reference stability. The performance target will be absl::node_hash_map.

Note

It’s possible that some of these containers may turn out to be of limited utility due to being dominated by another container that provides a superset of the API. This may happen, for instance, if unordered_map is as fast as unordered_bucket_map, or if unordered_bucket_map is as fast or faster than unordered_node_map. If this proves to be the case, we’ll need to decide whether to include the superseded container or not, taking differences in memory consumption into account.

Implementation Notes

There are several alternatives we can pursue when implementing unordered_flat_map:

Since at the moment there is no clear winner, we will probably need to implement all of these and compare their performance characteristics. Most of the alternative implementations gravitate towards Robin Hood hashing, but there is a Hopscotch implementation (tsl::hopscotch) that holds its own on benchmarks.

More references:

There are also other, less sophisticated, alternatives we can pursue. We can explore simple linear probing with an upper limit on probes, and reallocate if that’s reached. If the limit is set to a multiple of 16, this can be accomplished by a SSE2 packed compare similar to Abseil’s technique, if we store e.g. hash mod 251 in the byte metadata.

We can also explore a hybrid option where the first element of the per-bucket linked list is stored directly in the bucket array (as in a flat map), but the rest are chained as in a closed addressing map.