This is an archive of the discontinued LLVM Phabricator instance.

[i386] Modify the alignment of __m128/__m256/__m512 vector type according i386 abi.
ClosedPublic

Authored by LiuChen3 on Apr 21 2020, 7:16 AM.

Details

Summary

According to i386 System V ABI:

  1. when __m256 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
  2. when __m512 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.

The current method of clang passing __m512 parameter is as follow:

  1. when target supports avx512, passing it with 64 byte alignment;
  2. when target supports avx, passing it with 32 byte alignment;
  3. Otherwise, passing it with 16 byte alignment.

Passing __m256 parameter is as follow:

  1. when target supports avx or avx512, passing it with 32 byte alignment;
  2. Otherwise, passing it with 16 byte alignment.

This pach will passing m128/m256/__m512 following i386 System V ABI and
apply it to Linux only since other System V OS (e.g Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks at present.

Diff Detail

Event Timeline

LiuChen3 created this revision.Apr 21 2020, 7:16 AM

I'm not sure this is right. _m512 is just a typedef to a 64 byte vector_size attribute. This patch changes the behavior of using 64 byte vector_size on an sse/avx target. Prior to avx512 existing an avx target would pass a 512 bit vector 32 byte aligned. It does't make sense to me to change the alignment on an avx target just because avx512 exists, but isn't enabled.

I might be wrong, but I wonder if we should be using the alignment of the type not the size of the type? All the _m128/_m256/_m512 types have an alignment attribute on them.

I'm not sure this is right. _m512 is just a typedef to a 64 byte vector_size attribute. This patch changes the behavior of using 64 byte vector_size on an sse/avx target. Prior to avx512 existing an avx target would pass a 512 bit vector 32 byte aligned. It does't make sense to me to change the alignment on an avx target just because avx512 exists, but isn't enabled.

If we use an library function which passing __m512 on stack(such as variadic function) compiled with avx512 but the caller compiled without avx512, this will cause run fail. But actually, when parameters passed by registers, it will cause run fail too because caller and callee use different register. Therefore, it is hard to say whether it is reasonable to align to 64 bytes.

but I wonder if we should be using the alignment of the type not the size of the type? All the _m128/_m256/_m512 types have an alignment attribute on them.

That's reasonable. Then we can omit getTypeSize().

I'm not sure this is right. _m512 is just a typedef to a 64 byte vector_size attribute. This patch changes the behavior of using 64 byte vector_size on an sse/avx target. Prior to avx512 existing an avx target would pass a 512 bit vector 32 byte aligned. It does't make sense to me to change the alignment on an avx target just because avx512 exists, but isn't enabled.

If we use an library function which passing __m512 on stack(such as variadic function) compiled with avx512 but the caller compiled without avx512, this will cause run fail. But actually, when parameters passed by registers, it will cause run fail too because caller and callee use different register. Therefore, it is hard to say whether it is reasonable to align to 64 bytes.

See:

https://bugs.llvm.org/show_bug.cgi?id=39501
https://reviews.llvm.org/D53919

GCC warns in this case.

I'm not sure this is right. _m512 is just a typedef to a 64 byte vector_size attribute. This patch changes the behavior of using 64 byte vector_size on an sse/avx target. Prior to avx512 existing an avx target would pass a 512 bit vector 32 byte aligned. It does't make sense to me to change the alignment on an avx target just because avx512 exists, but isn't enabled.

If we use an library function which passing __m512 on stack(such as variadic function) compiled with avx512 but the caller compiled without avx512, this will cause run fail. But actually, when parameters passed by registers, it will cause run fail too because caller and callee use different register. Therefore, it is hard to say whether it is reasonable to align to 64 bytes.

See:

https://bugs.llvm.org/show_bug.cgi?id=39501
https://reviews.llvm.org/D53919

GCC warns in this case.

I think that is another topic. I will add warning for clang.

LiuChen3 updated this revision to Diff 259200.Apr 22 2020, 1:27 AM

Determine whether the type is m128/m256/__m512 by type alignment rather than type size.
Since I am not familiar with front-end, adding diagnostics will take some effort. I think it would be better to foucus on this calling-convention for now.

RKSimon resigned from this revision.Jun 18 2020, 1:50 AM

Can you provide a few C testcases comparing gcc and clang, where clang currently misaligns an argument? I briefly tried a few testcases with __m256i vectors, and clang seemed to do the right thing.

clang/lib/CodeGen/TargetInfo.cpp
1548

Ty->isVectorType()

1830

We don't want to use getIndirect() here if we can avoid it; byval makes it harder for the compiler to reason about the value.

Can you provide a few C testcases comparing gcc and clang, where clang currently misaligns an argument? I briefly tried a few testcases with __m256i vectors, and clang seemed to do the right thing.

@efriedma, thanks for your review.

Here is a little case:

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <immintrin.h>

typedef union {
        int d[4];
        __m128 m;
} M128;

void test(int argCount, ...) {
        M128 res;
        int retValue = 0;
        va_list args;
        va_start(args, argCount);
        res.d[0] = res.d[1] = res.d[2] = res.d[3] = 0;
        res.m = va_arg(args, __m128);
        printf("%d %d %d %d\n", res.d[0], res.d[1], res.d[2], res.d[3]);
        va_end(args);
}

int main(void) {
        int retValue = 0;
        M128 a;
        a.d[0] = 0; a.d[1] = 2; a.d[2] = 4; a.d[3] = 6;
        test(1, a.m);
        return 0;
}

The option i use is '-m32 -O0'.
The output of clang is: 134520832 -7561716 134514139 0
And the output of gcc is right.

If I'm following correctly, the change to getTypeStackAlignInBytes() makes the lowering of va_arg correct for m256/m512 etc. on targets where they're legal types. (And you could independently verify this by pass a va_list from a gcc-compiled function to a clang-compiled function.)

The change to X86_32ABIInfo::classifyArgumentType, then, is specifically targeted at cases where the vector type in question isn't legal. It fixes the alignment, I guess, so it's always consistent with what va_arg thinks the alignment should be? Can we restrict this change to varargs function calls?

LiuChen3 updated this revision to Diff 272653.Jun 23 2020, 3:01 AM

Restricting this change to varargs function calls and address comments.

LiuChen3 marked an inline comment as done.Jun 23 2020, 3:10 AM

Hi, @efriedma.

I used gcc and clang cross-compilation for testing, and currently found no problems.
main.c:

#include "abi.h"

int main(void) {
        int retValue = 0;
        M128 a;
        M256 b;
        M512 c;
        a.d[0] = 0; a.d[1] = 2; a.d[2] = 4; a.d[3] = 6;
        b.d[0] = 0; b.d[1] = 2; b.d[2] = 4; b.d[3] = 6;
        c.d[0] = 0; c.d[1] = 2; c.d[2] = 4; c.d[3] = 6;
        c.d[4] = 8; c.d[5] = 10; c.d[6] = 12; c.d[7] = 14;
        testv128(1, a.m);
        testv256(1, b.m);
        testv512(1, c.m);
        return 0;
}

test.c:

#include "abi.h"

void testv128(int argCount, ...) {
        M128 res;
        int retValue = 0;
        va_list args;
        va_start(args, argCount);
        res.d[0] = res.d[1] = res.d[2] = res.d[3] = 0;
        res.m = va_arg(args, __m128);
        printf("%d %d %d %d\n", res.d[0], res.d[1], res.d[2], res.d[3]);
        va_end(args);
}

void testv256(int argCount, ...) {
        M256 res;
        int retValue = 0;
        va_list args;
        va_start(args, argCount);
        res.d[0] = res.d[1] = res.d[2] = res.d[3] = 0;
        res.m = va_arg(args, __m256);
        printf("%lld %lld %lld %lld\n", res.d[0], res.d[1], res.d[2], res.d[3]);
        va_end(args);
}

void testv512(int argCount, ...) {
        M512 res;
        int retValue = 0;
        va_list args;
        va_start(args, argCount);
        res.d[0] = res.d[1] = res.d[2] = res.d[3] = 0;
        res.d[4] = res.d[5] = res.d[6] = res.d[7] = 0;
        res.m = va_arg(args, __m512);
        for(int i = 0; i < 8; ++i)
          printf("%lld ", res.d[i]);
        printf("\n");
        va_end(args);
}

abi.h:

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <immintrin.h>

typedef union {
        int d[4];
        __m128 m;
} M128;

typedef union {
        long long d[4];
        __m256 m;
} M256;

typedef union {
        long long d[8];
        __m512 m;
} M512;


void testv128(int argCount, ...);
void testv256(int argCount, ...);
void testv512(int argCount, ...);

The option i use:

clang -m32 -c main.c
gcc -m32 -c test.c
clang -m32 main.o test.o

clang -m32 -mavx2 -c main.c
gcc -m32 -mavx2 -c test.c
clang -m32 -mavx2 main.o test.o

clang -m32 -c -mavx512f main.c
gcc -m32 -c -mavx512f test.c
clang -m32 -mavx512f main.o test.o

Then use clang to compile test.c and gcc to compile main.c.

Before this patch, The output is incorrect.

efriedma added inline comments.Jun 23 2020, 1:09 PM
clang/lib/CodeGen/TargetInfo.cpp
1084

Please don't use default arguments here; it's isn't helping readability.

echristo added inline comments.Jun 23 2020, 1:41 PM
clang/lib/CodeGen/TargetInfo.cpp
1084

Avoid boolean arguments if possible too.

LiuChen3 marked an inline comment as done.Jun 23 2020, 7:18 PM
LiuChen3 added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
1084

Thanks for you review. I think it's not easy to judge if current arguments is variable argument without this boolean parameter.

LiuChen3 updated this revision to Diff 272898.Jun 23 2020, 7:29 PM

Remove default arguments.

efriedma added inline comments.Jun 23 2020, 7:36 PM
clang/lib/CodeGen/TargetInfo.cpp
1084

As a general style rule, if a function has two alternative modes, prefer naming the two alternatives with an enum, instead of using a boolean. This makes the call more readable. (See https://google.github.io/styleguide/cppguide.html#Function_Argument_Comments etc.)

We should probably describe this somewhere in the LLVM coding standards document.

LiuChen3 marked an inline comment as done.Jun 23 2020, 9:19 PM
LiuChen3 added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
1084

Thanks for your information. I misunderstood echristo's mean.

I followed X86_64ABIInfo which use 'isNamedArg' to indicate whether the argument is named. Should we change the style?

LiuChen3 updated this revision to Diff 290169.Sep 6 2020, 8:37 PM

Address comments: Replacing the bool argument with an enum argument

No. I think this patch can only fix part of the issue.

No. I think this patch can only fix part of the issue.

Can you fix the go issue?

No. I think this patch can only fix part of the issue.

Can you fix the go issue?

If we can confirm how gcc does it, I think I can fix that.

For now we just assume the rules of gcc does are as follows:

StackAlignmentForType(T):

1. If T's alignment is < 16 bytes, return 4.
2. If T is a struct/union/array type, then:
    recursively call StackAlignmentForType() on each member's type (note -- this ignores any attribute((aligned(N))) directly on the fields of a struct, but not those that appear on typedefs, or the underlying types).
    If all of those calls return alignments < 16, then return 4.
3. Otherwise, return the alignment of T.

We need to confirm that this is the actually behavior of gcc.

LiuChen3 updated this revision to Diff 334874.Apr 1 2021, 5:51 PM

Rebase and avoid using 'byval' parameter.

pengfei added inline comments.Apr 2 2021, 12:33 AM
clang/lib/CodeGen/TargetInfo.cpp
1904

Can we always use the type alignment despite named and unnamed?

LiuChen3 updated this revision to Diff 335380.Apr 5 2021, 7:38 PM

Address Pengfei's comments

Patch LGTM. If this is what GCC is doing on Linux, then we should match it.

pengfei accepted this revision.Apr 7 2021, 1:24 AM

LGTM.

This revision is now accepted and ready to land.Apr 7 2021, 1:24 AM
This revision was landed with ongoing or failed builds.Apr 14 2021, 1:47 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2021, 1:47 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Thanks for your review. Hope this patch won't cause too many ABI issues in the future.