Page MenuHomePhabricator

[X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack.
ClosedPublic

Authored by craig.topper on Feb 8 2019, 9:54 AM.

Details

Summary

With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.

It appears that MSVC and headers consider the m128/m128i/__m128d/etc. types to be explicitly aligned and ignores #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.

This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different we can swith to conditionally adding the alignment based on _MSC_VER.

I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics which use attribute(packed). So the explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.

Diff Detail

Repository
rC Clang

Event Timeline

craig.topper created this revision.Feb 8 2019, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 9:54 AM
erichkeane accepted this revision.Feb 8 2019, 10:09 AM

The justification is correct based on my research into this problem, and the code changes themselves look correct. I cannot think of any reason why re-stating the alignment will matter in GCC mode, so I think this is OK.

I'll accept, but I'd really like to give @rnk a chance to take a look to make sure there isn't something I'm missing.

test/CodeGen/x86-vec-struct-packing.c
15

We default to C++14, do we not use static-asserts for this?

This revision is now accepted and ready to land.Feb 8 2019, 10:09 AM

Forgot to do MMX's __m64 type

rnk accepted this revision.Feb 8 2019, 10:37 AM
rnk added a subscriber: rjmccall.

Sounds good.

This change reminded me of D46042, https://crbug.com/849251, and rL333791, which I had to revert, and I don't think it relanded. I think your change to add the *_u typedefs fixes the issues encountered there, and perhaps @rjmccall will be able to reland his change after this lands.

This revision was automatically updated to reflect the committed changes.
rnk added a comment.Feb 12 2019, 11:55 AM

Hm, looks like this broke a bunch of code in chromium:
bad build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ToTWin%28dbg%29/2342
good build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ToTWin%28dbg%29/2341

A trace from one of the crashing tests:

[ RUN      ] AudioDecoderTestScenarios/AudioDecoderTest.DecodesFramesWithVaryingDuration/3
Received fatal exception EXCEPTION_ACCESS_VIOLATION
Backtrace:
	xcorr_kernel_sse [0x00D4CEF7+199] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\x86\pitch_sse.c:54)
	celt_pitch_xcorr_c [0x00DBC09C+156] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\pitch.c:264)
	_celt_autocorr [0x00DBAF20+368] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\celt_lpc.c:266)
	pitch_downsample [0x00DBBC98+536] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\pitch.c:182)
	run_prefilter [0x00D4093F+703] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\celt_encoder.c:1144)
	celt_encode_with_ec [0x00D3C915+3701] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\celt\celt_encoder.c:1606)
	opus_encode_native [0x00CBBF8A+16810] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\src\opus_encoder.c:2068)
	opus_encode [0x00CBE1C6+374] (C:\b\s\w\ir\cache\builder\src\third_party\opus\src\src\opus_encoder.c:2241)
	media::cast::AudioDecoderTest::FeedMoreAudio [0x0017F0B8+1064] (C:\b\s\w\ir\cache\builder\src\media\cast\receiver\audio_decoder_unittest.cc:111)

The code in question is calling _mm_loadu_ps:
https://cs.chromium.org/chromium/src/third_party/opus/src/celt/x86/pitch_sse.c?type=cs&q=%22opus/src/celt/x86/pitch_sse.c%22&sq=package:chromium&g=0&l=54

for (j = 0; j < len-3; j += 4)
{
   __m128 x0 = _mm_loadu_ps(x+j);
   __m128 yj = _mm_loadu_ps(y+j);
   __m128 y3 = _mm_loadu_ps(y+j+3); // crashes
lib/Headers/xmmintrin.h
1754

I guess we just missed this.

rnk added a comment.Feb 12 2019, 12:01 PM

I audited all the intrin header uses of packed, and I think this was the only one that was missing, so I'll try to fix forward.

Thanks for the fix @rnk, I've added some MSVC command lines to some of the other test files too.