This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer
ClosedPublic

Authored by RKSimon on May 25 2016, 3:31 AM.

Details

Summary

According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_ps/_mm_store1_pd (and their _mm_store_ps1/_mm_store_pd1 analogues) should require an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_ps/_mm_storeu_pd).

This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.

I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).

As a followup I'll update the llvm fast-isel tests to match this codegen.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 58397.May 25 2016, 3:31 AM
RKSimon retitled this revision from to [X86][SSE] _mm_store1_ps/_mm_store1_pd should require an aligned pointer.
RKSimon updated this object.
RKSimon added reviewers: craig.topper, spatel, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: cfe-commits.
craig.topper edited edge metadata.May 25 2016, 8:10 AM
craig.topper added a subscriber: craig.topper.

Can you double check gcc's xmmintrin.h again. I'm pretty sure _mm_store1_ps
is calling _mm_storeu_ps.

Can you double check gcc's xmmintrin.h again. I'm pretty sure _mm_store1_ps
is calling _mm_storeu_ps.

Yes you're right - for gcc _mm_store1_pd is aligned (and there is a comment saying it must be), but _mm_store1_ps is unaligned. The intel intrinsics docs and msvc codegen both set both ps and pd versions to aligned store though.

If you wish I can just do the pd fixes - we are alone in doing a extract + 2*movsd - the rest all use shufpd+movapd

Suggestions for ps?

craig.topper accepted this revision.May 25 2016, 8:19 PM
craig.topper edited edge metadata.

Given that its documented as being aligned. I'm ok with it. LGTM

This revision is now accepted and ready to land.May 25 2016, 8:19 PM
majnemer added inline comments.
lib/Headers/emmintrin.h
598

You could use __attribute__((align_value(16))) no?

RKSimon added inline comments.May 27 2016, 10:05 AM
lib/Headers/emmintrin.h
598

Technically yes but AFAICT there are no other users of this approach in the headers - is it something that we should be encouraging do you think?

Craig - I think you wrote in a commit about dropping the unaligned intrinsics, is that how you'd do it?

This revision was automatically updated to reflect the committed changes.