This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][AArch64] Support arm_sve_vector_bits attribute
ClosedPublic

Authored by c-rhodes on Aug 11 2020, 9:01 AM.

Details

Summary

This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  • Implicit casting between VLA <-> VLS types.
  • Coercion of VLS types in function args/return.
  • Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLS and VLA types are defined by the ACLE to map to the same
machine-level SVE vectors. This patch implements mangling support for
this. The mangling scheme is defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

This is based on the prototype in D85128.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Diff Detail

Event Timeline

c-rhodes created this revision.Aug 11 2020, 9:01 AM
c-rhodes requested review of this revision.Aug 11 2020, 9:01 AM
efriedma added inline comments.Aug 12 2020, 11:29 AM
clang/lib/AST/ItaniumMangle.cpp
3330

Mangling them the same way is going to cause practical issues; they're different types from a C++ perspective, so they need distinct manglings. For example, you'll crash the compiler if you refer to both foo<svint64_t> and foo<fixed_int64_t>.

clang/lib/CodeGen/CGCall.cpp
1238

getFixedSize()?

1254

getFixedSize():? (etc.; please go through the whole patch.)

c-rhodes updated this revision to Diff 285576.Aug 14 2020, 1:26 AM

Changes:

  • s/getKnownMinSize/getFixedSize/g fixes.
  • Implemented new mangling scheme for VLS types.
c-rhodes marked 2 inline comments as done.Aug 14 2020, 1:48 AM
c-rhodes added inline comments.
clang/lib/AST/ItaniumMangle.cpp
3330

Mangling them the same way is going to cause practical issues; they're different types from a C++ perspective, so they need distinct manglings. For example, you'll crash the compiler if you refer to both foo<svint64_t> and foo<fixed_int64_t>.

The ACLE is yet to define the mangling scheme for fixed-length SVE types so I kept the mangling the same, which is also what GCC currently does. After speaking with @rsandifo-arm yesterday we agreed to come up with a mangling scheme where the types are mangled in the same way as:

__SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the second argument is the SVE vector length in bits. For example:

#if __ARM_FEATURE_SVE_BITS==512                    
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE                                
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));              
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE                                  
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));              
#endif

let us know if you have any feedback/concerns about this approach.

clang/lib/CodeGen/CGCall.cpp
1360

@efriedma If we're happy with the element bitcast above this can also be fixed but I wasn't if that was ok, although it's pretty much what was implemented in the original codegen patch.

david-arm added inline comments.Aug 14 2020, 6:28 AM
clang/lib/CodeGen/CGCall.cpp
1342

I think if you restructure the code here you could do:

if (isa<llvm::ScalableVectorType>(SrcTy) || isa<llvm::ScalableVectorType>(DstTy) ||

SrcSize.getFixedSize() <= DstSize.getFixedSize())

since you know that the scalable types have been eliminated by the time we do the "<=" comparison.

1360

Given the if statement above has eliminated scalable vector types I think it's safe to use DstSize.getFixedSize() here.

c-rhodes updated this revision to Diff 285642.Aug 14 2020, 6:54 AM

More getFixedSize fixes.

c-rhodes marked 2 inline comments as done.Aug 14 2020, 6:55 AM
efriedma accepted this revision.Aug 14 2020, 2:54 PM

LGTM

Like I mentioned on the review for the prototype, I still think we should try to implement a scheme that makes CK_BItCast between fixed and scalable types trivial. Doing coercion this way is going to have a significant performance cost. But there isn't any user-visible effect, so I'm fine with leaving that for a followup.

clang/lib/AST/ItaniumMangle.cpp
3330

Makes sense.

This revision is now accepted and ready to land.Aug 14 2020, 2:54 PM
This revision was automatically updated to reflect the committed changes.

LGTM

Like I mentioned on the review for the prototype, I still think we should try to implement a scheme that makes CK_BItCast between fixed and scalable types trivial. Doing coercion this way is going to have a significant performance cost. But there isn't any user-visible effect, so I'm fine with leaving that for a followup.

I agree the bitcast scheme certainly isn't optimal but it's a start at least and something we intend to address going forward. Thanks for reviewing!

Hi! The attr-arm-sve-vector-bits-call.c test seems to be failing on our clang builders:

FAIL: Clang :: CodeGen/attr-arm-sve-vector-bits-call.c (3020 of 25924)
******************** TEST 'Clang :: CodeGen/attr-arm-sve-vector-bits-call.c' FAILED ********************
Script:
--
: 'RUN: at line 3';   /b/s/w/ir/k/staging/llvm_build/bin/clang -cc1 -internal-isystem /b/s/w/ir/k/staging/llvm_build/lib/clang/12.0.0/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -msve-vector-bits=512 -fallow-half-arguments-and-returns -S -O1 -emit-llvm -o - /b/s/w/ir/k/llvm-project/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c | /b/s/w/ir/k/staging/llvm_build/bin/FileCheck /b/s/w/ir/k/llvm-project/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c
--
Exit Code: 1

Command Output (stderr):
--
warning: Compiler has made implicit assumption that TypeSize is not scalable. This may or may not lead to broken code.
warning: Compiler has made implicit assumption that TypeSize is not scalable. This may or may not lead to broken code.
/b/s/w/ir/k/llvm-project/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c:67:16: error: CHECK-NEXT: is not on the line after the previous match
// CHECK-NEXT: [[X_ADDR:%.*]] = alloca <vscale x 4 x i32>, align 16
               ^
<stdin>:52:2: note: 'next' match was here
 %retval.coerce.i = alloca <vscale x 4 x i32>, align 16
 ^
<stdin>:50:7: note: previous match ended here
entry:
      ^
<stdin>:51:1: note: non-matching line after previous match is here
 %x.i = alloca <16 x i32>, align 16
^

Input file: <stdin>
Check file: /b/s/w/ir/k/llvm-project/clang/test/CodeGen/attr-arm-sve-vector-bits-call.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
         .
         .
         .
        47: 
        48: ; Function Attrs: nounwind readnone
        49: define <vscale x 4 x i32> @sizeless_caller(<vscale x 4 x i32> %x) local_unnamed_addr #1 {
        50: entry:
        51:  %x.i = alloca <16 x i32>, align 16
        52:  %retval.coerce.i = alloca <vscale x 4 x i32>, align 16
next:67      !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: match on wrong line
        53:  %x.addr = alloca <vscale x 4 x i32>, align 16
        54:  %coerce.coerce = alloca <vscale x 4 x i32>, align 16
        55:  %coerce1 = alloca <16 x i32>, align 16
        56:  %saved-call-rvalue = alloca <16 x i32>, align 64
        57:  store <vscale x 4 x i32> %x, <vscale x 4 x i32>* %x.addr, align 16, !tbaa !5
         .
         .
         .
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  Clang :: CodeGen/attr-arm-sve-vector-bits-call.c

Could you take a look? Thanks.

Builder: https://luci-milo.appspot.com/p/fuchsia/builders/ci/clang-linux-x64/b8870800848452818112?

Hi! The attr-arm-sve-vector-bits-call.c test seems to be failing on our clang builders:

Could you take a look? Thanks.

Builder: https://luci-milo.appspot.com/p/fuchsia/builders/ci/clang-linux-x64/b8870800848452818112?

Sorry about that, I've reverted it (commit 2e7041f) whilst I investigate. Thanks for raising.

Hi! The attr-arm-sve-vector-bits-call.c test seems to be failing on our clang builders:

Could you take a look? Thanks.

Builder: https://luci-milo.appspot.com/p/fuchsia/builders/ci/clang-linux-x64/b8870800848452818112?

Sorry about that, I've reverted it (commit 2e7041f) whilst I investigate. Thanks for raising.

The IR differences were caused by the new pass manager which is on by default for the Fuchsia builder. I've re-landed the patch with a fix for CodeGen/attr-arm-sve-vector-bits-call.c to use the legacy pm with -fno-experimental-new-pass-manager.

The IR differences were caused by the new pass manager which is on by default for the Fuchsia builder. I've re-landed the patch with a fix for CodeGen/attr-arm-sve-vector-bits-call.c to use the legacy pm with -fno-experimental-new-pass-manager.

Thanks for the update! We do have the new PM on by default, but I'm surprised that this wouldn't appear on clang-x86_64-debian-new-pass-manager-fast which also tests the new PM.

The IR differences were caused by the new pass manager which is on by default for the Fuchsia builder. I've re-landed the patch with a fix for CodeGen/attr-arm-sve-vector-bits-call.c to use the legacy pm with -fno-experimental-new-pass-manager.

Thanks for the update! We do have the new PM on by default, but I'm surprised that this wouldn't appear on clang-x86_64-debian-new-pass-manager-fast which also tests the new PM.

No problem, I checked and it did fail for that builder [1] but for some reason I didn't receive an email.

[1] http://lab.llvm.org:8011/builders/clang-x86_64-debian-new-pass-manager-fast/builds/14050