This is an archive of the discontinued LLVM Phabricator instance.

Cap vector alignment at 16 for all Darwin platforms
ClosedPublic

Authored by rjmccall on Apr 24 2018, 9:17 PM.

Details

Summary

This fixes two major problems:

  • We were not capping vector alignment as desired on 32-bit ARM.
  • We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects.

Some of our discussion leading into this change is here: https://github.com/apple/swift/pull/15691

Diff Detail

Repository
rC Clang

Event Timeline

rjmccall created this revision.Apr 24 2018, 9:17 PM
rjmccall edited the summary of this revision. (Show Details)Apr 24 2018, 9:18 PM
scanon added a subscriber: scanon.Apr 25 2018, 7:28 AM
ahatanak accepted this revision.May 2 2018, 3:17 PM
ahatanak added a subscriber: ahatanak.

LGTM

This revision is now accepted and ready to land.May 2 2018, 3:17 PM

Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects.

I was wondering whether there is a warning clang issues when the aligned attribute is stripped. If it doesn't warn, should it? I recently came across a case where a 16-byte vector annotated with a 4-byte alignment was passed to std::swap, which caused a crash because the alignment was stripped and the x86 backend decided to emit an 16-byte aligned load to load an unaligned vector.

I think we should seriously consider making alignment attributes on typedefs (and maybe some other attributes like may_alias) actual type qualifiers that are preserved in the canonical type, mangled, and so on. It would be an ABI break, but it'd also solve a lot of problems.

ab added a comment.May 4 2018, 10:45 AM

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

Swift's calling conventions (will?) always divide larger vectors into 128b pieces. When interacting with C conventions, yes, this is still an issue.

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

I'd forgotten about that. I think there's a strong argument that we're required to pass at least the Intel intrinsic vector types that way, yeah. But if we want a stable ABI for other vector types, we really can't. The root problem here is that the Intel ABI seems to imagine that these vector types only exist when they're supported directly by hardware. (And the Intel intrinsic headers do define those types even when AVX is disabled!) So I don't know that we can make a good ABI story for that.

In D46042#1088044, @ab wrote:

So, this makes sense to me, but on x86, should we also be worried about the fact that the calling convention is based on which features are available? (>128bit ext_vector_types are passed in AVX/AVX-512 registers, if available). Presumably swift is also affected, no?

Swift's calling conventions (will?) always divide larger vectors into 128b pieces. When interacting with C conventions, yes, this is still an issue.

Right, this is just a C ABI issue.

rjmccall closed this revision.Jun 1 2018, 3:01 PM

Landed as r333791.

rnk added a subscriber: rnk.Jun 4 2018, 2:20 PM

This change appears to have caused some blink vector math unit tests to fail on Windows. We are tracking it at https://crbug.com/849251.

It has a pretty small reproducer:

#include <immintrin.h>
__m256 loadit(__m256 *p) { return _mm256_loadu_ps((const float *)p); }

Compile for x86_64-windows-msvc with -mavx, and before this change we got this IR: %0 = load <8 x float>, <8 x float>* %p, align 1
After we get this IR: %0 = load <8 x float>, <8 x float>* %p, align 32

This is surprising. I'll keep debugging.

rnk added a comment.Jun 4 2018, 2:33 PM

It's the typedef alignment changes that are causing problems for us, not the MaxVectorAlign changes. That makes more sense. The new alignment attribute breaks our implementation of _mm256_loadu_ps, because the packed struct ends up with a 32-byte alignment. Here's the implementation:

static __inline __m256 __DEFAULT_FN_ATTRS
_mm256_loadu_ps(float const *__p)
{
  struct __loadu_ps {
    __m256 __v;
  } __attribute__((__packed__, __may_alias__));
  return ((struct __loadu_ps*)__p)->__v;
}

And clang's -fdump-record-layouts says:

*** Dumping AST Record Layout
         0 | struct __loadu_ps
         0 |   __m256 __v
           | [sizeof=32, align=32]

I think the problem is that __attribute__((aligned(N))) beats __attribute__((packed)) on Windows to match MSVC's behavior with __declspec(align(N)).

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.

In D46042#1121648, @rnk wrote:

It's the typedef alignment changes that are causing problems for us, not the MaxVectorAlign changes. That makes more sense. The new alignment attribute breaks our implementation of _mm256_loadu_ps, because the packed struct ends up with a 32-byte alignment. Here's the implementation:

static __inline __m256 __DEFAULT_FN_ATTRS
_mm256_loadu_ps(float const *__p)
{
  struct __loadu_ps {
    __m256 __v;
  } __attribute__((__packed__, __may_alias__));
  return ((struct __loadu_ps*)__p)->__v;
}

And clang's -fdump-record-layouts says:

*** Dumping AST Record Layout
         0 | struct __loadu_ps
         0 |   __m256 __v
           | [sizeof=32, align=32]

I think the problem is that __attribute__((aligned(N))) beats __attribute__((packed)) on Windows to match MSVC's behavior with __declspec(align(N)).

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.

Ugh. That is just an awful language rule. Would it be reasonable to restrict it to only attributes spelled with __declspec(align(N)) rather than __attribute__((aligned(N))), or is that too invasive in the alignment computation?

rnk added a comment.Jun 4 2018, 4:05 PM

I think we should revert this for now. Adding the alignment attribute to all Intel vector typedefs is a bigger change than it seems.

Ugh. That is just an awful language rule. Would it be reasonable to restrict it to only attributes spelled with __declspec(align(N)) rather than __attribute__((aligned(N))), or is that too invasive in the alignment computation?

When we were working on the record layout code, I didn't want to do that because users often structure their portability headers to check for __clang__ first because clang also defines _MSC_VER and __GNUC__. I felt it would be best if the alignment attributes were as interchangeable as possible. They are very common.

Maybe checking the spelling of the packing attribute would work better. The GCC __attribute__ spelling would ignore what we called "required alignment", meaning alignment required by explicit attributes and not the normal alignof.

rnk added a comment.Jun 4 2018, 4:16 PM

By the way, I went ahead and reverted this in r333958.