This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Fix handling of vectors and their exposure of the vector ABI.
ClosedPublic

Authored by jonpa on Jan 10 2023, 10:33 AM.

Details

Summary
  • Only vectors <=16 bytes expose the vector ABI when passed between functions.
  • Only global vector variables >=16 bytes expose the vector ABI through their alignments.

Test suite builds with gcc/clang now give the same gnu attributes.

Diff Detail

Event Timeline

jonpa created this revision.Jan 10 2023, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 10:33 AM
jonpa requested review of this revision.Jan 10 2023, 10:33 AM
uweigand added inline comments.Jan 11 2023, 1:49 AM
clang/lib/CodeGen/TargetInfo.cpp
7876

I think this isn't quite correct - if Bytes >= 16, the alignment of the vector type is different, which may always be ABI-relevant, even when passed as argument. (E.g. when passing a (pointer to a) struct containing a vector member, the layout of the struct becomes ABI relevant, and that layout depends on member alignment.)

I think this isn't quite correct - if Bytes >= 16, the alignment of the vector type is different, which may always be ABI-relevant, even when passed as argument. (E.g. when passing a (pointer to a) struct containing a vector member, the layout of the struct becomes ABI relevant, and that layout depends on member alignment.)

I guess I am a little confused on this detail as I have been looking at what GCC is doing when passing a vector argument:

typedef __attribute__((vector_size(32))) int v8i32;
void bar(v8i32 Arg);
void foo() {
  v8i32 Var = {0, 0, 0, 0};
  bar(Var);
}

The argument vector does not fit in a register, so it's passed in memory. Following the general vector alignment rules, with the SW ABI it should be aligned to 32 bytes, and with the HW ABI aligned to 8 bytes, on the stack. However, what I see with gcc is that in both cases there is only the 8 byte alignment:

lay     %r15,-192(%r15)
mvghi   160(%r15),0
mvghi   168(%r15),0
mvghi   176(%r15),0
mvghi   184(%r15),0
la      %r2,160(%r15)
brasl   %r14,bar@PLT

Clang, on the other hand does a runtime alignment of 32 bytes for *both* ABIs:

aghi    %r15, -160
aghik   %r1, %r15, -56
la      %r2, 184(%r1)
nill    %r2, 65504
lgr     %r15, %r1
vgbm    %v0, 0
vst     %v0, 16(%r2), 4
vst     %v0, 0(%r2), 4
brasl   %r14, bar@PLT

Not sure what I am missing, but it seems to me that GCC is missing the SW ABI alignment of 32 (bug), and clang is over-aligning for the HW ABI (performance bug). I guess I would like to think that the alignment of a vector per the ABI is the same whether it's an argument on the stack or a global variable, or?

To summarize: vectors >= 16 always expose the ABI and vectors <= 8 do not except when passed in registers between functions, right?

I was actually thinking of the case

struct arg
  {
     char x;
     v8i32 y;
  };

foo (struct arg arg);

Here there definitely is a difference in actual generated code both on GCC and LLVM between the two ABIs, since the offset of arg.y in arg is different.

Now a different question is the alignment of a simple v8i32 passed via implicit reference. Here, we should be passing a pointer to v8i32, so arguably GCC does indeed have a bug if that pointer doesn't have the natural alignment for its type. This probably doesn't matter however, as that implicit pointer is hidden and cannot be inspected by user code, so it would be impossible to actually detect any misalignment ...

jonpa updated this revision to Diff 488676.Jan 12 2023, 8:51 AM

Now a different question is the alignment of a simple v8i32 passed via implicit reference. Here, we should be passing a pointer to v8i32, so arguably GCC does indeed have a bug if that pointer doesn't have the natural alignment for its type. This probably doesn't matter however, as that implicit pointer is hidden and cannot be inspected by user code, so it would be impossible to actually detect any misalignment ...

ok - that's more clear to me now. Patch updated.

jonpa marked an inline comment as done.Jan 12 2023, 8:51 AM
jonpa updated this revision to Diff 489831.Jan 17 2023, 8:19 AM

Patch updated to also handle

  • single element structs passed in vector registers. SystemZABIInfo::GetSingleElementType() is used for this, which now needs an extra check to avoid crashing on empty records.
  • vector varargs < 16 bytes. All of these are indeed passed on the stack (per the comment), but with the old ABI this is a stack reference (address passed in GPR), while the new ABI simply puts them on the stack (address not passed). So these are also different regardless of the vector size.
uweigand added inline comments.Jan 20 2023, 5:50 AM
clang/lib/CodeGen/TargetInfo.cpp
7437

I don't really like the default value here - this is such a critical decision that each caller should make it explicitly.

7455

Likewise.

7868

Stripping off pointer types is not quite correct in the IsParam case. For example, if you have an argument that is a pointer to an 8-byte vector, this does *not* affect the ABI, but the current code would return true.

The IsParam case should probably be handled right at the top.

jonpa updated this revision to Diff 492457.Jan 26 2023, 8:16 AM
jonpa marked 3 inline comments as done.

Review addressed - hopefully this time with a better result.

I think the patch now does what I want it to, while I found some test cases while checking against GCC which I am a little unsure of:

(using GCC 12.2.1, 20220901)

GCC does *not* emit a gnu attribute for this (A wide vector should have different alignments):

typedef __attribute__((vector_size(32))) int v8i32;
void GlobFun(v8i32 (*f)(v8i32));
static v8i32 foo(v8i32 Arg) {
  return Arg;
}
void fun() {
  GlobFun(foo);
}

GCC *does* emit a gnu attribute for this (Since the arguments are passed via address and the vectors are only 8 bytes, there should be no visible ABI, or?):

typedef __attribute__((vector_size(8))) int v2i32;
void bar(v2i32 VArr[4], v2i32 *Dst) { *Dst = VArr[3]; }

GCC does *not* emit it for this, but I don't understand why as the vectors are 32 bytes in size (with v4i32 the attribute is emitted as I expected):

typedef __attribute__((vector_size(32))) int v8i32;
void bar(v8i32 VArr[2], v8i32 *Dst) { *Dst = VArr[1]; }

GCC does *not* emit the attribute for this:

v8i32 bar(v8i32 Arg) { return Arg; }

GCC does *not* emit the attribute for a wide vector vararg argument:

void bar(int N, ...);
void foo() {
  v8i32 Var = {1, 2, 3, 4, 5, 6, 7, 8};
  bar(1, Var);
}

Both compilers *emit* the attribute for this:

struct SingleElStruct {
  v8i32 B;
};
struct SingleElStruct S;

However it seems that while gcc emits the alignments 8/32 for z13/z10, clang emits the big alignment also for z13, which seems wrong to me.

Review addressed - hopefully this time with a better result.

I think the patch now does what I want it to, while I found some test cases while checking against GCC which I am a little unsure of:

(using GCC 12.2.1, 20220901)

@Andreas-Krebbel any comments on this?

GCC does *not* emit a gnu attribute for this (A wide vector should have different alignments):

typedef __attribute__((vector_size(32))) int v8i32;
void GlobFun(v8i32 (*f)(v8i32));
static v8i32 foo(v8i32 Arg) {
  return Arg;
}
void fun() {
  GlobFun(foo);
}

This is somewhat unclear - these are passed via "hidden" pointers that cannot be discovered by user code, and I'm not sure the alignment requirements for "normal" pointer apply to those. The current ABI doc says:

Replace such an argument by a pointer to the object, or to a copy where necessary to enforce call-by-value semantics.

While not explicitly stated, it would indeed be reasonable to read into that an alignment requirement, i.e. that the pointer has the same requirement as a "normal" pointer to the object. But in the new ABI, that requirement is explicitly stated as 8 bytes.

The old behavior is not explicitly specified by any ABI doc, so we can really only go after what GCC actually used to do. And in fact GCC used to (and still does!) simply place such arguments on the stack and passes a hidden pointer to the stack slot - without any dynamic stack realignment, so it in fact only guarantees 8 byte alignment. So I think it is *safe* to assume that using 8 bytes always is OK - and in that sense it is safe to omit the GNU attribute.

On the other hand, LLVM does actually align these to 32 bytes, so for LLVM the vector support feature does actually constitute a change in the de-facto ABI. (Still, LLVM-generated code doesn't appear to *rely* on the extended alignment in compiler-generated code anywhere - and these pointers are not discoverable by user code, so alignment cannot make any difference there either. So it should still be safe.)

GCC *does* emit a gnu attribute for this (Since the arguments are passed via address and the vectors are only 8 bytes, there should be no visible ABI, or?):

typedef __attribute__((vector_size(8))) int v2i32;
void bar(v2i32 VArr[4], v2i32 *Dst) { *Dst = VArr[3]; }

I agree, this seems unnecessary.

GCC does *not* emit it for this, but I don't understand why as the vectors are 32 bytes in size (with v4i32 the attribute is emitted as I expected):

typedef __attribute__((vector_size(32))) int v8i32;
void bar(v8i32 VArr[2], v8i32 *Dst) { *Dst = VArr[1]; }

That seems a bug, the difference in alignment is definitely visible to user code here.

GCC does *not* emit the attribute for this:

v8i32 bar(v8i32 Arg) { return Arg; }

That's the same as above (hidden pointers), right?

GCC does *not* emit the attribute for a wide vector vararg argument:

void bar(int N, ...);
void foo() {
  v8i32 Var = {1, 2, 3, 4, 5, 6, 7, 8};
  bar(1, Var);
}

This is again the same situation (hidden pointer).

Both compilers *emit* the attribute for this:

struct SingleElStruct {
  v8i32 B;
};
struct SingleElStruct S;

However it seems that while gcc emits the alignments 8/32 for z13/z10, clang emits the big alignment also for z13, which seems wrong to me.

Huh, I cannot reproduce this - clang with -march=z13 does use 8 byte alignment in my test.

jonpa added a comment.Jan 26 2023, 9:53 AM

I see this with latest clang:

/bin/clang -O3 -o - -S -c ./test.c -march=z10
        .text
        .file   "test.c"
        .type   S,@object                       # @S
        .section        .bss,"aw",@nobits
        .globl  S
        .p2align        5, 0x0
S:
        .space  32
        .size   S, 32

        .ident  "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 77f4c91c5e052f43245555fffbb0649191e3e32f)"
        .section        ".note.GNU-stack","",@progbits
        .addrsig
        .gnu_attribute 8, 1

./bin/clang -O3 -o - -S -c ./test.c -march=z13
        .text
        .file   "test.c"
        .type   S,@object                       # @S
        .section        .bss,"aw",@nobits
        .globl  S
        .p2align        5, 0x0
S:
        .space  32
        .size   S, 32

        .ident  "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 77f4c91c5e052f43245555fffbb0649191e3e32f)"
        .section        ".note.GNU-stack","",@progbits
        .addrsig
        .gnu_attribute 8, 2

Well, with this test case:

typedef __attribute__((vector_size(32))) int v8i32;

int main (void)
{
        printf ("align: %ld\n", __alignof__(v8i32));
}

I see the correct result (8 or 32 depending on march).

Not sure why it chooses the palign, but that's really just an implementation choice (you may always overalign) that's not visible to others.

The old behavior is not explicitly specified by any ABI doc, so we can really only go after what GCC actually used to do. And in fact GCC used to (and still does!) simply place such arguments on the stack and passes a hidden pointer to the stack slot - without any dynamic stack realignment, so it in fact only guarantees 8 byte alignment. So I think it is *safe* to assume that using 8 bytes always is OK - and in that sense it is safe to omit the GNU attribute.

OK - I have been assuming that *all* vectors in memory should be aligned the same, but if that's not necessarily true for wide vector arguments that are forced to the stack, I guess that explains this. Should I update the patch to do just like GCC in these cases (and not emit the attribute)?

The old behavior is not explicitly specified by any ABI doc, so we can really only go after what GCC actually used to do. And in fact GCC used to (and still does!) simply place such arguments on the stack and passes a hidden pointer to the stack slot - without any dynamic stack realignment, so it in fact only guarantees 8 byte alignment. So I think it is *safe* to assume that using 8 bytes always is OK - and in that sense it is safe to omit the GNU attribute.

OK - I have been assuming that *all* vectors in memory should be aligned the same, but if that's not necessarily true for wide vector arguments that are forced to the stack, I guess that explains this. Should I update the patch to do just like GCC in these cases (and not emit the attribute)?

Yes, it's probably best to match GCC behavior here.

jonpa updated this revision to Diff 492726.Jan 27 2023, 6:28 AM

Yes, it's probably best to match GCC behavior here.

Patch updated to do as GCC.

This worked as expected with vector args, but however with single element structs GCC emits the attribute also with wide vectors like in this test case:

struct SingleElStruct {
  v8i32 B;
};
void bar(struct SingleElStruct Arg);
void foo() {
  struct SingleElStruct Var = {{0}};
  bar(Var);
}

There is no difference with GCC between z10/z13 here, so this may not be actually necessary as far as I can see.

(Patch now matches GCC behavior also in the single element struct case.)

There is no difference with GCC between z10/z13 here, so this may not be actually necessary as far as I can see.

(Patch now matches GCC behavior also in the single element struct case.)

In this case I think it would be preferable to omit the attribute - it is completely safe to do so, even if it doesn't match GCC's behavior; in fact, we'd rather change GCC to match as well.

jonpa updated this revision to Diff 492794.Jan 27 2023, 9:02 AM

In this case I think it would be preferable to omit the attribute - it is completely safe to do so, even if it doesn't match GCC's behavior; in fact, we'd rather change GCC to match as well.

Ok - patch updated.

uweigand accepted this revision.Jan 27 2023, 9:27 AM

Thanks! Minor commit nit inline, otherwise this now LGTM.

clang/lib/CodeGen/TargetInfo.cpp
7874

Comment is outdated - there *is* a check for size now.

This revision is now accepted and ready to land.Jan 27 2023, 9:27 AM
This revision was landed with ongoing or failed builds.Jan 27 2023, 11:24 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 11:24 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Review addressed - hopefully this time with a better result.

I think the patch now does what I want it to, while I found some test cases while checking against GCC which I am a little unsure of:

(using GCC 12.2.1, 20220901)

@Andreas-Krebbel any comments on this?

GCC does *not* emit a gnu attribute for this (A wide vector should have different alignments):

typedef __attribute__((vector_size(32))) int v8i32;
void GlobFun(v8i32 (*f)(v8i32));
static v8i32 foo(v8i32 Arg) {
  return Arg;
}
void fun() {
  GlobFun(foo);
}

This is somewhat unclear - these are passed via "hidden" pointers that cannot be discovered by user code, and I'm not sure the alignment requirements for "normal" pointer apply to those. The current ABI doc says:

Replace such an argument by a pointer to the object, or to a copy where necessary to enforce call-by-value semantics.

While not explicitly stated, it would indeed be reasonable to read into that an alignment requirement, i.e. that the pointer has the same requirement as a "normal" pointer to the object. But in the new ABI, that requirement is explicitly stated as 8 bytes.

The old behavior is not explicitly specified by any ABI doc, so we can really only go after what GCC actually used to do. And in fact GCC used to (and still does!) simply place such arguments on the stack and passes a hidden pointer to the stack slot - without any dynamic stack realignment, so it in fact only guarantees 8 byte alignment. So I think it is *safe* to assume that using 8 bytes always is OK - and in that sense it is safe to omit the GNU attribute.

On the other hand, LLVM does actually align these to 32 bytes, so for LLVM the vector support feature does actually constitute a change in the de-facto ABI. (Still, LLVM-generated code doesn't appear to *rely* on the extended alignment in compiler-generated code anywhere - and these pointers are not discoverable by user code, so alignment cannot make any difference there either. So it should still be safe.)

GCC *does* emit a gnu attribute for this (Since the arguments are passed via address and the vectors are only 8 bytes, there should be no visible ABI, or?):

typedef __attribute__((vector_size(8))) int v2i32;
void bar(v2i32 VArr[4], v2i32 *Dst) { *Dst = VArr[3]; }

I agree, this seems unnecessary.

GCC does *not* emit it for this, but I don't understand why as the vectors are 32 bytes in size (with v4i32 the attribute is emitted as I expected):

typedef __attribute__((vector_size(32))) int v8i32;
void bar(v8i32 VArr[2], v8i32 *Dst) { *Dst = VArr[1]; }

That seems a bug, the difference in alignment is definitely visible to user code here.

Indeed, the routine doing the checks in GCC does not appear handle vector arguments any different if they are explicitly passed as pointers. This cannot be correct. I'll have a look.