J. Andrew Rogers
Pursuit of a Perfect Algorithm

About   Archive

AquaHash: Fast Hashing With AES Intrinsics

AquaHash is a 128-bit non-cryptographic hash function that delivers state-of-the-art performance across all key sizes. The algorithm employs a novel construction based on AES intrinsics. Source code is available under the Apache License.

Performance

The AquaHash algorithm is a seamless composite of two algorithms that deliver exceptional performance for large keys and small keys respectively. The component algorithms are provided separately for reference.

Large Key Algorithm: Bulk hashing at 15 bytes/cycle on Skylake CPUs. Small key performance starts at 25 cycles per hash, ironically making it among the fastest algorithms for smaller keys too.

Small Key Algorithm: Up to twice as fast as the large key algorithm for small keys. This is the fastest extant algorithm for applications like small string hashing. Bulk hashing is 5 bytes/cycle.

Small Key Performance

The performance graph includes xxhash64 and Google’s FarmHash for reference, popular algorithms optimized for small keys and large keys respectively. Some reduction in relative performance is expected on older CPUs with slower AES intrinsics, offset by the large baseline performance advantage.

Design

A primary design goal was to reduce the inherent tradeoff between small key and large key performance. Larger internal block sizes increase bulk hashing throughput by increasing parallelism, but also require an expensive mix down stage to ensure adequate dispersion of input bits across the block and to reduce the block size to the hash size regardless of the key size.

AquaHash’s large key algorithm uses a 512-bit internal block size across four lanes to produce a 128-bit hash, presenting a significant challenge for small key performance. Instead of using direct lane mixing for bit dispersion, which incurs a high cost due to instruction dependencies that are magnified when using high-latency instructions, it employs an indirect lane mixing construction that enables efficient pipelining of cross-lane bit dispersion. This optimization is not free, adversely impacting the design of other parts of the hash function, but the tradeoff is well worth it. While the overhead of this construction is still significant for small keys, it reduces it sufficiently to be highly competitive for small key hashing given the performance of the rest of the algorithm.

The small key algorithm reduces the internal block size from 512-bit to 128-bit, eliminating the computationally expensive cross-lane bit dispersion and block reduction stage entirely. This results in a simple, compact, and elegant algorithm with low performance variance across similar key sizes.

Performance of the two algorithms crosses over when keys that are at least as large as the block size of the large key algorithm. Incremental hashing, which is typically more complex to implement with hybridized algorithms, is correspondingly trivial.

Usage Notes

The AquaHash function is not suitable for every application. On platforms that have no hardware support for AES encryption, other algorithms may be faster at a similar level of quality. Despite being constructed from cryptographic primitives, AquaHash is not suitable for cryptographic use cases. While there are no guarantees as to the strength of any hash function, I would expect these algorithms to be comparatively robust due to use of AES encryption primitives. For certain narrow applications, such as hashing 32-bit or 64-bit values, there are even faster collision-free hashing algorithms that can be constructed from AES intrinsics, such as the ones described here.