When a 128-bit load/store is aligned by 8, we incorrectly emit load i16, ptr ..., align 2, while the shadow memory address may not be aligned by 2: with the default shadow scale of 3 (8 application bytes per shadow byte), an address that is aligned by 8 but not by 16 maps to a shadow address that is only guaranteed to be aligned by 1.
This manifests as a possibly-misaligned shadow memory load with -mstrict-align, e.g.

    clang --target=aarch64-linux -O2 -mstrict-align -fsanitize=address

    __attribute__((noinline)) void foo(unsigned long *ptr) {
      ptr[0] = 3;
      ptr[1] = 3;
    }

The shadow check contains

    ldrh w8, [x9, x8]  // the shadow memory load may not be aligned by 2
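To see why the alignment can drop below 2, here is the shadow address arithmetic as a minimal standalone sketch; the shadow scale of 3 matches the default ASan mapping, but the offset constant below is purely illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define SHADOW_SCALE 3                 /* 8 application bytes per shadow byte */
    #define SHADOW_OFFSET 0x1000000000ULL  /* illustrative, not the real constant */

    int main(void) {
      /* A 16-byte access whose address is aligned by 8 but not by 16. */
      uint64_t addr = 0x100008;
      uint64_t shadow = (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
      /* shadow is odd here, so a 2-byte (i16) shadow load is misaligned. */
      printf("shadow = %#llx, max alignment = %llu\n",
             (unsigned long long)shadow,
             (unsigned long long)(shadow & -shadow));
      return 0;
    }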
Infer the shadow memory alignment from the load/store alignment so that the shadow load is emitted with the correct alignment. The generated code now uses two ldrb instructions and one orr instead of the misaligned ldrh.
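The inference itself reduces to scaling the access alignment down by the shadow granularity and clamping to 1 byte. A minimal sketch of that computation (the helper name is hypothetical, not the actual LLVM code, and it assumes the shadow offset itself is sufficiently aligned):

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: the shadow address is Addr >> shadow_scale plus a
       constant offset, so its guaranteed alignment is the access alignment
       shifted down by shadow_scale, clamped to at least 1 byte. */
    static uint64_t infer_shadow_align(uint64_t access_align, unsigned shadow_scale) {
      uint64_t shadow_align = access_align >> shadow_scale;
      return shadow_align ? shadow_align : 1;
    }

    int main(void) {
      assert(infer_shadow_align(8, 3) == 1);   /* 128-bit access aligned by 8   */
      assert(infer_shadow_align(16, 3) == 2);  /* aligned by 16: i16 load is OK */
      return 0;
    }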