- Only vectors <=16 bytes expose the vector ABI when passed between functions.
- Only global vector variables >=16 bytes expose the vector ABI through their alignments.
Test suite builds with gcc/clang now give the same gnu attributes.
Differential D141409
[SystemZ] Fix handling of vectors and their exposure of the vector ABI. jonpa on Jan 10 2023, 10:33 AM. Authored by
Details
Test suite builds with gcc/clang now give the same gnu attributes.
Diff Detail
Event Timeline
Comment Actions
I guess I am a little confused on this detail as I have been looking at what GCC is doing when passing a vector argument: typedef __attribute__((vector_size(32))) int v8i32; void bar(v8i32 Arg); void foo() { v8i32 Var = {0, 0, 0, 0}; bar(Var); } The argument vector does not fit in a register, so it's passed in memory. Following the general vector alignment rules, with the SW ABI it should be aligned to 32 bytes, and with the HW ABI aligned to 8 bytes, on the stack. However, what I see with gcc is that in both cases there is only the 8 byte alignment: lay %r15,-192(%r15) mvghi 160(%r15),0 mvghi 168(%r15),0 mvghi 176(%r15),0 mvghi 184(%r15),0 la %r2,160(%r15) brasl %r14,bar@PLT Clang, on the other hand does a runtime alignment of 32 bytes for *both* ABIs: aghi %r15, -160 aghik %r1, %r15, -56 la %r2, 184(%r1) nill %r2, 65504 lgr %r15, %r1 vgbm %v0, 0 vst %v0, 16(%r2), 4 vst %v0, 0(%r2), 4 brasl %r14, bar@PLT Not sure what I am missing, but it seems to me that GCC is missing the SW ABI alignment of 32 (bug), and clang is over-aligning for the HW ABI (performance bug). I guess I would like to think that the alignment of a vector per the ABI is the same whether it's an argument on the stack or a global variable, or? To summarize: vectors >= 16 always expose the ABI and vectors <= 8 do not except when passed in registers between functions, right? Comment Actions I was actually thinking of the case struct arg { char x; v8i32 y; }; foo (struct arg arg); Here there definitely is a difference in actual generated code both on GCC and LLVM between the two ABIs, since the offset of arg.y in arg is different. Now a different question is the alignment of a simple v8i32 passed via implicit reference. Here, we should be passing a pointer to v8i32, so arguably GCC does indeed have a bug if that pointer doesn't have the natural alignment for its type. This probably doesn't matter however, as that implicit pointer is hidden and cannot be inspected by user code, so it would be impossible to actually detect any misalignment ... Comment Actions
ok - that's more clear to me now. Patch updated. Comment Actions Patch updated to also handle
Comment Actions Review addressed - hopefully this time with a better result. I think the patch now does what I want it to, while I found some test cases while checking against GCC which I am a little unsure of: (using GCC 12.2.1, 20220901) GCC does *not* emit a gnu attribute for this (A wide vector should have different alignments): typedef __attribute__((vector_size(32))) int v8i32; void GlobFun(v8i32 (*f)(v8i32)); static v8i32 foo(v8i32 Arg) { return Arg; } void fun() { GlobFun(foo); } GCC *does* emit a gnu attribute for this (Since the arguments are passed via address and the vectors are only 8 bytes, there should be no visible ABI, or?): typedef __attribute__((vector_size(8))) int v2i32; void bar(v2i32 VArr[4], v2i32 *Dst) { *Dst = VArr[3]; } GCC does *not* emit it for this, but I don't understand why as the vectors are 32 bytes in size (with v4i32 the attribute is emitted as I expected): typedef __attribute__((vector_size(32))) int v8i32; void bar(v8i32 VArr[2], v8i32 *Dst) { *Dst = VArr[1]; } GCC does *not* emit the attribute for this: v8i32 bar(v8i32 Arg) { return Arg; } GCC does *not* emit the attribute for a wide vector vararg argument: void bar(int N, ...); void foo() { v8i32 Var = {1, 2, 3, 4, 5, 6, 7, 8}; bar(1, Var); } Both compilers *emit* the attribute for this: struct SingleElStruct { v8i32 B; }; struct SingleElStruct S; However it seems that while gcc emits the alignments 8/32 for z13/z10, clang emits the big alignment also for z13, which seems wrong to me. Comment Actions @Andreas-Krebbel any comments on this?
This is somewhat unclear - these are passed via "hidden" pointers that cannot be discovered by user code, and I'm not sure the alignment requirements for "normal" pointer apply to those. The current ABI doc says:
While not explicitly stated, it would indeed be reasonable to read into that an alignment requirement, i.e. that the pointer has the same requirement as a "normal" pointer to the object. But in the new ABI, that requirement is explicitly stated as 8 bytes. The old behavior is not explicitly specified by any ABI doc, so we can really only go after what GCC actually used to do. And in fact GCC used to (and still does!) simply place such arguments on the stack and passes a hidden pointer to the stack slot - without any dynamic stack realignment, so it in fact only guarantees 8 byte alignment. So I think it is *safe* to assume that using 8 bytes always is OK - and in that sense it is safe to omit the GNU attribute. On the other hand, LLVM does actually align these to 32 bytes, so for LLVM the vector support feature does actually constitute a change in the de-facto ABI. (Still, LLVM-generated code doesn't appear to *rely* on the extended alignment in compiler-generated code anywhere - and these pointers are not discoverable by user code, so alignment cannot make any difference there either. So it should still be safe.)
I agree, this seems unnecessary.
That seems a bug, the difference in alignment is definitely visible to user code here.
That's the same as above (hidden pointers), right?
This is again the same situation (hidden pointer).
Huh, I cannot reproduce this - clang with -march=z13 does use 8 byte alignment in my test. Comment Actions I see this with latest clang: /bin/clang -O3 -o - -S -c ./test.c -march=z10 .text .file "test.c" .type S,@object # @S .section .bss,"aw",@nobits .globl S .p2align 5, 0x0 S: .space 32 .size S, 32 .ident "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 77f4c91c5e052f43245555fffbb0649191e3e32f)" .section ".note.GNU-stack","",@progbits .addrsig .gnu_attribute 8, 1 ./bin/clang -O3 -o - -S -c ./test.c -march=z13 .text .file "test.c" .type S,@object # @S .section .bss,"aw",@nobits .globl S .p2align 5, 0x0 S: .space 32 .size S, 32 .ident "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 77f4c91c5e052f43245555fffbb0649191e3e32f)" .section ".note.GNU-stack","",@progbits .addrsig .gnu_attribute 8, 2 Comment Actions Well, with this test case: typedef __attribute__((vector_size(32))) int v8i32; int main (void) { printf ("align: %ld\n", __alignof__(v8i32)); } I see the correct result (8 or 32 depending on march). Not sure why it chooses the palign, but that's really just an implementation choice (you may always overalign) that's not visible to others. Comment Actions
OK - I have been assuming that *all* vectors in memory should be aligned the same, but if that's not necessarily true for wide vector arguments that are forced to the stack, I guess that explains this. Should I update the patch to do just like GCC in these cases (and not emit the attribute)? Comment Actions
Patch updated to do as GCC. This worked as expected with vector args, but however with single element structs GCC emits the attribute also with wide vectors like in this test case: struct SingleElStruct { v8i32 B; }; void bar(struct SingleElStruct Arg); void foo() { struct SingleElStruct Var = {{0}}; bar(Var); } There is no difference with GCC between z10/z13 here, so this may not be actually necessary as far as I can see. (Patch now matches GCC behavior also in the single element struct case.) Comment Actions In this case I think it would be preferable to omit the attribute - it is completely safe to do so, even if it doesn't match GCC's behavior; in fact, we'd rather change GCC to match as well. Comment Actions
Ok - patch updated. Comment Actions Thanks! Minor commit nit inline, otherwise this now LGTM.
Comment Actions Indeed, the routine doing the checks in GCC does not appear handle vector arguments any different if they are explicitly passed as pointers. This cannot be correct. I'll have a look. |
I don't really like the default value here - this is such a critical decision that each caller should make it explicitly.