With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.
It appears that MSVC and headers consider the m128/m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.
This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.
I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use attribute(packed). So the explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.
I guess we just missed this.