This is an archive of the discontinued LLVM Phabricator instance.

[Prototype][SVE] Support arm_sve_vector_bits attribute
AbandonedPublic

Authored by c-rhodes on Aug 3 2020, 5:41 AM.

Details

Summary

This patch is a prototype demonstrating an alternative approach to D83553, which
turned out to be unviable.

In that approach, vector-length-specific types (VLSTs) defined by the attribute
were by default represented as scalable vectors, except in constructs where
scalable vectors aren't supported in IR, such as globals and structs, where they
were represented as fixed-length arrays. When loading from a VLST into a
vector-length-agnostic type (VLAT), or when storing a VLAT to a VLST, the
address was bitcast, e.g.

bitcast [N x i8]* %addr.ptr to <vscale x 16 x i8>*

The issue with that approach was that VLSTs were represented as AttributedType in
the AST and were not part of the canonical type. This was problematic in places
such as CodeGenTypes, which look at the canonical type, because special handling
was required for types such as ConstantArrayType that needed to be lowered to
fixed-length arrays. See that patch for more information on the issues with the
approach.

In this implementation VLSTs are represented as VectorType in the AST and as
fixed-length vectors in the IR everywhere except in function arguments/return
values. Predicates are represented with i8 elements, as they were in D83553, to
avoid layout issues in structs. For example, given the following C code:

#if __ARM_FEATURE_SVE_BITS==512
typedef svbool_t fixed_bool_t __attribute__((arm_sve_vector_bits(512)));
#endif

fixed_bool_t becomes <8 x i8> in the IR.
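
As an additional, purely illustrative data-vector example (the typedef name
fixed_int32_t is chosen here for illustration), the following would be expected
to lower to <16 x i32> in the IR under this approach:

#if __ARM_FEATURE_SVE_BITS==512
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));
#endif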

In function arguments/return values VLSTs are coerced from fixed to scalable
vectors. This is implemented through the AArch64 ABI in TargetInfo. Since
BuiltinType::SveBool and BuiltinType::SveUint8 are both represented as a
VectorType with element type BuiltinType::UChar, two new vector kinds were
required in the ABI to distinguish between predicates and data vectors.
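
As a rough sketch of what this means at the source level (illustrative code,
not taken from the patch):

#include <arm_sve.h>

#if __ARM_FEATURE_SVE_BITS==512
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

/* Inside the function body the value is the fixed-length vector <16 x i32>,
   but at the function boundary it would be coerced to/from the scalable
   <vscale x 4 x i32>, matching the AAPCS64 treatment of svint32_t. */
fixed_int32_t passthrough(fixed_int32_t v) {
  return v;
}
#endif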

Casting between VLATs and VLSTs is handled by the CK_BitCast operation, which has
been extended in CodeGen to support the new vector kinds; the cast is
implemented through memory rather than as a bitcast, which is unsupported.
Implementing this as a normal bitcast would require relaxing checks in LLVM to
allow bitcasting between scalable and fixed types. Another option was adding
target-specific intrinsics, although codegen support would need to be added for
those intrinsics. Given this, casting through memory seemed like the best
approach, as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
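
For illustration, such a conversion might look like the following at the source
level (hypothetical example; under this prototype the CK_BitCast would be
lowered by storing the source value to a stack slot and loading it back with the
destination type, since an IR bitcast between fixed and scalable vectors is not
allowed):

#include <arm_sve.h>

#if __ARM_FEATURE_SVE_BITS==512
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

fixed_int32_t to_fixed(svint32_t v) {
  return v;  /* implicit VLAT -> VLST conversion, lowered through memory */
}

svint32_t to_sizeless(fixed_int32_t v) {
  return v;  /* implicit VLST -> VLAT conversion, lowered through memory */
}
#endif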

The semantics implemented in D83551 change now that the AttributedType has been
replaced by VectorType in the AST. When VLSTs were represented as sizeless
types, only minimal changes in Sema were necessary to permit them in places such
as structs, and no changes were required for implicit casting, as the canonical
types were the same.

The AArch64 ACLE states that VLSTs defined by the attribute map to the same
AAPCS64 types as their sizeless variants. In the previous approach mangling
wasn't necessary, as the canonical types were the same. The mangling scheme is
defined in the appendices to the Procedure Call Standard for the Arm
Architecture, see

https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

For more information on the attribute see:

https://developer.arm.com/documentation/100987/latest
NOTE: This patch is intended as a prototype to demonstrate the approach. It is quite a large patch containing a number of changes; if the approach is considered valid, the plan is to break it up into separate patches.

Diff Detail

Unit Tests: Failed

Event Timeline

c-rhodes created this revision.Aug 3 2020, 5:41 AM
Herald added projects: Restricted Project, Restricted Project.
c-rhodes requested review of this revision.Aug 3 2020, 5:41 AM

Stupid questions.

  • Is it for convenience? You get arrays, global variables, structs, ... . Vectorization becomes easier ...
  • Are there any potential performance benefits over scalable vectors?
  • Is it compatible with GCC?

Not going to write detailed review comments, but this looks like the right approach in general.

One high-level thing to consider: we could still decide that in IR generation, we want to represent VLST registers using scalable vector types, like the original patch did. This would allow avoiding the awkward "bitcast" implementation. That interacts with a relatively narrow slice of clang CodeGen, though; we could easily change it later without impacting the rest of the changes.

Stupid questions.

  • Is it for convenience? You get arrays, global variables, structs, ... . Vectorization becomes easier ...

Yes, this allows the definition of types that can be used in constructs where sizeless types cannot. In earlier revisions of the ACLE there was the concept of sizeless structs, defined by a __sizeless_struct keyword, that could have members of sizeless type in addition to members of sized type, but there was pushback on the idea of sizeless aggregates in general and that idea was dropped. If you're interested in the background there's more information here [1][2].

[1] https://gcc.gnu.org/legacy-ml/gcc/2019-11/msg00088.html
[2] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg00868.html

  • Are there any potential performance benefits over scalable vectors?

If VLSTs are represented as fixed-length vectors in LLVM, as they are in this prototype, then hopefully we can take advantage of existing optimisations, although I think there is work to be done there, especially around the interaction with scalable vectors and supporting those in existing passes. This patch required a few changes to existing passes to bail out for scalable vectors, so we're already hitting parts of the codebase we hadn't previously exercised that would be good candidates for optimisation work. This also ties into the fixed-length code generation work @paulwalker-arm has been doing, which is still in its early days. I'm not sure if that answers your question, but ultimately the compiler has more information about these types since the vector size is explicit, so it should be able to do a better job at optimisation.

  • Is it compatible with GCC?

Support for this attribute landed in GCC 10 and it's more complete than what this patch implements. We've yet to implement the behaviour guarded by the __ARM_FEATURE_SVE_VECTOR_OPERATORS and __ARM_FEATURE_SVE_PREDICATE_OPERATORS feature macros, so the GNU __attribute__((vector_size)) extension is not available and operators such as binary '+' are not supported for VLSTs. Support for this is intended to be addressed by later patches.
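
For example, something like the following (hypothetical code) relies on the
behaviour guarded by __ARM_FEATURE_SVE_VECTOR_OPERATORS and would not yet be
accepted with this patch:

#include <arm_sve.h>

#if __ARM_FEATURE_SVE_BITS==512 && __ARM_FEATURE_SVE_VECTOR_OPERATORS
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

/* GNU vector_size type of the same size (512 bits = 64 bytes). */
typedef int32_t gnu_int32_t __attribute__((vector_size(64)));

/* Binary '+' on VLSTs is part of the guarded behaviour. */
fixed_int32_t add(fixed_int32_t a, fixed_int32_t b) {
  return a + b;
}
#endif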

Not going to write detailed review comments, but this looks like the right approach in general.

Thanks for taking a look! I'll split this up into separate patches soon.

One high-level thing to consider: we could still decide that in IR generation, we want to represent VLST registers using scalable vector types, like the original patch did. This would allow avoiding the awkward "bitcast" implementation. That interacts with a relatively narrow slice of clang CodeGen, though; we could easily change it later without impacting the rest of the changes.

Yeah, now that the VLST is part of the canonical type with the new vector kinds, we have more information if we were to go the CodeGenTypes route, if that's what you're referring to as the narrow slice of CodeGen. That would still require converting between VLATs and VLSTs; I quite like this approach as it gives me more confidence we're not missing bitcasts when doing it as part of a cast operation. I guess with what you're suggesting the bitcast could still be emitted there, but the cast operations could be limited in Sema to cases where ConvertType would ultimately return a type that requires bitcasting, or are you saying that could be avoided completely?

  • Is it compatible with GCC?

Support for this attribute landed in GCC 10 and it's more complete than what this patch implements. We've yet to implement the behaviour guarded by the __ARM_FEATURE_SVE_VECTOR_OPERATORS and __ARM_FEATURE_SVE_PREDICATE_OPERATORS feature macros, so the GNU __attribute__((vector_size)) extension is not available and operators such as binary '+' are not supported for VLSTs. Support for this is intended to be addressed by later patches.

Just to clarify, GCC doesn't have support for vectors of booleans or operations on them (__ARM_FEATURE_SVE_PREDICATE_OPERATORS) yet either, but it does support the behaviour indicated by the other macro.

Sorry. I meant ABI. Can you link GCC .o files with Clang .o files using the attributes?

I guess with what you're suggesting the bitcast could still be emitted there but the cast operations could be limited in Sema to cases where ultimately ConvertType would return a type that requires bitcasting, or are you saying that could be avoided completely?

The bitcast operation would exist in Sema either way; it's necessary for the types to stay consistent. My suggestion is just that bitcasting between a VLAT and the corresponding VLST would be a no-op in CodeGen.

My suggestion is similar to the way the "bool" type works: in memory, it's an i8, but when you load it, it's truncated to i1.

Sorry. I meant ABI. Can you link GCC .o files with Clang .o files using the attributes?

Yes, they should be compatible. The machine-level ABI distinguishes four types of SVE vector [1]:

  • VG×64-bit vector of 8-bit elements
  • VG×64-bit vector of 16-bit elements
  • VG×64-bit vector of 32-bit elements
  • VG×64-bit vector of 64-bit elements

The VLS and VLA types are defined by the ACLE to map to the same machine-level SVE vectors (in both compilers). Hence the mangling changes in this patch.

[1] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#fundamental-data-types

c-rhodes abandoned this revision.Sep 2 2020, 4:36 AM

Closing this now that the prototype has been split into separate patches, which have landed.