The compiler is allowed to assume that a "uint64_t *" is aligned correctly, and will inline a version of memcpy that acts as such. Use "uint8_t *", so the compiler does the right thing.