The 64-bit load and store code was generating pretty bad machine code with
my compiler, so I extracted the corresponding code from csiphash and used
that instead.
Closes ticket 21737.
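For illustration only, here is a minimal byte-at-a-time little-endian 64-bit load and store in C. This is a common portable technique (the SipHash reference implementation uses a similar U8TO64_LE macro). It is not claimed to be the code that was extracted from csiphash, and the names load_le64 and store_le64 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a little-endian 64-bit value.
 * Only uint8_t is ever dereferenced, so there is no alignment requirement,
 * and the result is the same on big- and little-endian hosts. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i]; /* p[0] ends up as the least significant byte */
    return v;
}

/* Hypothetical helper: write v to p in little-endian byte order. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}
```

Compilers commonly recognize this shift-and-or pattern and emit a single load on little-endian targets, but that depends on the compiler and optimization level.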
The compiler is allowed to assume that a "uint64_t *" is correctly
aligned, so it may inline a version of memcpy that relies on that
alignment. Use "uint8_t *" instead, so the compiler makes no such
assumption.
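A minimal sketch of the pattern this change describes, assuming a hypothetical helper named load_u64_unaligned: the source pointer is typed as const uint8_t *, and memcpy copies into a local uint64_t, which is the well-defined way in C to express a possibly unaligned load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes from a possibly misaligned address.
 * Because p is a uint8_t pointer, the compiler cannot assume 8-byte
 * alignment when it inlines memcpy. The result is in host byte order. */
static uint64_t load_u64_unaligned(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* By contrast, passing a (const uint64_t *) that is not actually 8-byte
 * aligned lets the compiler assume alignment it does not have, and
 * dereferencing such a pointer directly is undefined behavior. */
```

On x86 this typically compiles to a single unaligned move; on strict-alignment targets the compiler falls back to a safe byte-wise sequence.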
In digestmap_set/get benchmarks, doing unaligned accesses on x86
saves no more than about one percent in the fast case. In the slow
case (where the access crosses a cache line), it can be considerably
more expensive. It also triggers misaligned-access errors under ubsan.
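To make the "slow case" concrete, here is a small sketch that decides whether an access of a given size straddles two cache lines. It assumes 64-byte cache lines, which is typical of current x86 processors but not guaranteed; the names ASSUMED_CACHE_LINE and crosses_cache_line are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (common on x86, not universal). */
#define ASSUMED_CACHE_LINE 64u

/* Hypothetical helper: true if an access of `size` bytes (size > 0)
 * starting at address `addr` touches two different cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t first = addr / ASSUMED_CACHE_LINE;
    uintptr_t last = (addr + size - 1) / ASSUMED_CACHE_LINE;
    return first != last;
}
```

With this assumption, an 8-byte load at offset 56 within a line stays in one line, while the same load at offset 57 spans two; only the latter is the expensive case.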
siphash is a hash function designed to produce hard-to-predict
64-bit outputs from short inputs and a 128-bit key. It was chosen
for its combination of security and speed.
See https://131002.net/siphash/ for more information on siphash.
Source: https://github.com/majek/csiphash/
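As a rough illustration of how the 128-bit key and a short input become a 64-bit output, here is a compact C sketch of SipHash-2-4 (2 compression rounds per 8-byte block, 4 finalization rounds), written from the published specification. It is not the csiphash code, the use of the 2-4 parameters here is an assumption, and the names siphash24_sketch, u8to64_le, ROTL64 and SIPROUND are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* One SipRound: add-rotate-xor mixing of the four 64-bit state words. */
#define SIPROUND(v0, v1, v2, v3)                                        \
    do {                                                                \
        v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32);   \
        v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2;                        \
        v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0;                        \
        v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32);   \
    } while (0)

/* Alignment-free little-endian load (byte-wise). */
static uint64_t u8to64_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Sketch of SipHash-2-4: 16-byte key, arbitrary-length input. */
static uint64_t siphash24_sketch(const uint8_t key[16],
                                 const uint8_t *in, size_t len)
{
    uint64_t k0 = u8to64_le(key), k1 = u8to64_le(key + 8);
    /* Initialization constants from the SipHash specification. */
    uint64_t v0 = UINT64_C(0x736f6d6570736575) ^ k0;
    uint64_t v1 = UINT64_C(0x646f72616e646f6d) ^ k1;
    uint64_t v2 = UINT64_C(0x6c7967656e657261) ^ k0;
    uint64_t v3 = UINT64_C(0x7465646279746573) ^ k1;
    const uint8_t *end = in + (len - len % 8);
    uint64_t b = (uint64_t)len << 56; /* low byte of length in top byte */
    int i;

    /* Compression: absorb each full 8-byte block. */
    for (; in != end; in += 8) {
        uint64_t m = u8to64_le(in);
        v3 ^= m;
        SIPROUND(v0, v1, v2, v3);
        SIPROUND(v0, v1, v2, v3);
        v0 ^= m;
    }

    /* Last block: 0..7 leftover bytes, little-endian, under the length. */
    for (i = 0; i < (int)(len % 8); i++)
        b |= (uint64_t)in[i] << (8 * i);
    v3 ^= b;
    SIPROUND(v0, v1, v2, v3);
    SIPROUND(v0, v1, v2, v3);
    v0 ^= b;

    /* Finalization: 4 rounds, then fold the state into 64 bits. */
    v2 ^= 0xff;
    for (i = 0; i < 4; i++)
        SIPROUND(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}
```

With key bytes 00 01 ... 0f and the 15-byte message 00 01 ... 0e, this should reproduce the test vector given in the SipHash paper, 0xa129ca6149be45e5.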