This is an archive of the discontinued LLVM Phabricator instance.

[PATCH 3/4][Sema][AArch64] Add codegen for arm_sve_vector_bits attribute
AbandonedPublic

Authored by c-rhodes on Jul 10 2020, 6:09 AM.

Details

Summary

This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define fixed-length (VLST) versions
of existing sizeless types (VLAT).

Implemented in this patch is the lowering of VLSTs to valid types.
VLSTs (unlike VLATs) can be used in globals, members of structs
and unions, and arrays. To support this in this patch we lower VLSTs to
arrays. For example, in the following C code:

#if __ARM_FEATURE_SVE_BITS==512
typedef svint32_t fixed_svint32_t __attribute__((arm_sve_vector_bits(512)));
struct struct_int32 {
  fixed_int32_t x;
} struct_int32;
#endif

the struct is lowered to:

%struct.struct_int32 = type { [16 x i32] }

where the member 'x' is a fixed-length variant of 'svint32_t' that
contains exactly 512 bits.

When loading from a VLST to a VLAT, or when storing a VLAT to a VLST,
the address is bitcasted, e.g.

bitcast [N x i8]* %addr.ptr to <vscale x 16 x i8>*

Patch contains changes by Cullen Rhodes and Sander de Smalen.

[1] https://developer.arm.com/documentation/100987/latest

Diff Detail

Event Timeline

c-rhodes created this revision.Jul 10 2020, 6:09 AM
c-rhodes updated this revision to Diff 277043.Jul 10 2020, 8:05 AM

Changes:

  • Use fixed-length instead of fixed-width in naming.

What's the tradeoff of representing these in IR as vscale'ed vector types, as opposed to fixed-wdith vector types?

What's the tradeoff of representing these in IR as vscale'ed vector types, as opposed to fixed-wdith vector types?

If you mean alloca's for single vectors, then that's partly to do with better test coverage of the stackframe layout with scalable vectors until we can start testing that with auto-vectorized code. Also, currently LLVM only implements the VL-scaled addressing modes for the scalable IR type and would otherwise always use base addressing mode if the type is fixed-width (basereg = sp/fp + byteoffset; ld1 dstreg, [basereg, #0 mul VL]), so until we add those smarts, code quality will probably be better.

clang/lib/CodeGen/CGRecordLayoutBuilder.cpp
135–137

Can you add comments for the false and true parameters, e.g. /*ForBitField*/ false, /*EnforceFixedLengthSVEAttribute*/ true

clang/lib/CodeGen/CodeGenModule.cpp
3731–3732

same here.

clang/lib/CodeGen/CodeGenTypes.cpp
81

nit: s/getFixedSVETypeForMemory/getFixedLengthSVETypeForMemory/

94

Can you add a comment explaining why SveBool gets an i8 element type for it's memory type?

clang/lib/CodeGen/CodeGenTypes.h
137–140

Can you add a comment here to explain what EnforceFixedLengthSVEAttribute does?

If you mean alloca's for single vectors

I was really referring to the IR values themselves, not the memory representation. Since the width of the vectors is known, you could emit IR without any mention of scalable types at all (assuming the backend was extended to handle the intrinsics).

The choice of vscale'ed types for variables is also interesting, though. Thanks for the explanation.

If you mean alloca's for single vectors

I was really referring to the IR values themselves, not the memory representation. Since the width of the vectors is known, you could emit IR without any mention of scalable types at all (assuming the backend was extended to handle the intrinsics).

That's right, the reason is because codegen of the intrinsics currently only works on scalable types. By casting the pointer to a vscale-pointer, all IR values are always scalable so we don't need to worry about doing things like reinterpet_cast from a scalable to fixed-width vector, or vice versa.

If you mean alloca's for single vectors

I was really referring to the IR values themselves, not the memory representation. Since the width of the vectors is known, you could emit IR without any mention of scalable types at all (assuming the backend was extended to handle the intrinsics).

That's right, the reason is because codegen of the intrinsics currently only works on scalable types. By casting the pointer to a vscale-pointer, all IR values are always scalable so we don't need to worry about doing things like reinterpet_cast from a scalable to fixed-width vector, or vice versa.

I guess that's reasonable. I suspect we're eventually going to end up with that functionality anyway, but maybe not right now.

clang/lib/CodeGen/CodeGenTypes.h
138

The default for EnforceFixedLengthSVEAttribute seems backwards; I would expect that almost everywhere that calls ConvertTypeForMem actually wants the fixed-length type. The scalable type only exists in registers.

c-rhodes updated this revision to Diff 278466.Jul 16 2020, 7:15 AM

Changes:

  • Rebased.
  • Added comments for args in calls to ConvertTypeForMem when EnforceFixedLengthSVEAttribute is set and documented EnforceFixedLengthSVEAttribute.
  • s/getFixedSVETypeForMemory/getFixedLengthSVETypeForMemory/
  • Documented memory representation for fixed-length predicates.
c-rhodes marked 5 inline comments as done.Jul 16 2020, 7:21 AM
c-rhodes added inline comments.
clang/lib/CodeGen/CodeGenTypes.h
138

The default for EnforceFixedLengthSVEAttribute seems backwards; I would expect that almost everywhere that calls ConvertTypeForMem actually wants the fixed-length type. The scalable type only exists in registers.

It has no effect unless T->isVLST() so I think it makes sense.

efriedma added inline comments.Jul 16 2020, 1:47 PM
clang/lib/CodeGen/CodeGenTypes.h
138

My question is "why is the current default for EnforceFixedLengthSVEAttribute correct?" You answer for that is "because VLST types are rare"? I'm not sure how that's related.

Essentially, the issue is that ConvertTypeForMem means "I'm allocating something in memory; what is its type?". Except for a few places where we've specifically added handling to make it work, the code assumes scalable types don't exist. So in most places, we want the fixed version. With the current default, I'm afraid we're going to end up with weird failures with various constructs you haven't tested.

I guess if there's some large number of places where the current default is actually beneficial, the current patch wouldn't make it obvious, but my intuition is that are few places like that.

c-rhodes updated this revision to Diff 279265.Jul 20 2020, 8:45 AM

Change the default for EnforceFixedLengthSVEAttribute.

c-rhodes marked an inline comment as done.Jul 20 2020, 8:56 AM
c-rhodes added inline comments.
clang/lib/CodeGen/CodeGenTypes.h
138

My question is "why is the current default for EnforceFixedLengthSVEAttribute correct?" You answer for that is "because VLST types are rare"? I'm not sure how that's related.

Essentially, the issue is that ConvertTypeForMem means "I'm allocating something in memory; what is its type?". Except for a few places where we've specifically added handling to make it work, the code assumes scalable types don't exist. So in most places, we want the fixed version. With the current default, I'm afraid we're going to end up with weird failures with various constructs you haven't tested.

Sorry I misunderstood what you meant. I think you're right that does make sense, I guess the benefit of defaulting to false is (hopefully) those failures would have come to our attention and we could explicitly add test cases for those, although I suspect the same applies with your suggestion with the added benefit of us supporting constructs we haven't explicitly tested as you say. Anyhow, I've made the change, cheers!

efriedma added inline comments.Jul 20 2020, 12:45 PM
clang/lib/CodeGen/CGExpr.cpp
153

Do we need to bitcast the result of CreateTempAlloca to a pointer to the array type? I'm concerned that we might miss a bitcast if the source code uses the address of the variable.

clang/lib/CodeGen/CodeGenModule.cpp
3986

EmitNullConstant should just do the right thing, I think, now that we've changed the default behavior of ConvertTypeForMem.

clang/lib/CodeGen/CodeGenTypes.cpp
151

I think the default handling for constant arrays should do the right thing, now that we've changed the default behavior of ConvertTypeForMem.

c-rhodes marked an inline comment as done.Jul 23 2020, 10:12 AM
c-rhodes added inline comments.
clang/lib/CodeGen/CGExpr.cpp
153

Do we need to bitcast the result of CreateTempAlloca to a pointer to the array type? I'm concerned that we might miss a bitcast if the source code uses the address of the variable.

You were right, I've spent some time investigating this. The current implementation crashes on:

fixed_int32_t global;
fixed_int32_t address_of_global() {
  fixed_int32_t *global_ptr;
  global_ptr = &global;
  return *global_ptr;
}

the reason being global is represented as an ArrayType whereas the pointer global_ptr is scalable:

@global = global [4 x i32] zeroinitializer, align 16
%global_ptr = alloca <vscale x 4 x i32>*, align 8

so when storing the address of global to global_ptr the store it tries to create causes a crash:

store [4 x i32]* @global, <vscale x 4 x i32>** %global_ptr, align 8

I tried your suggestion to bitcast to alloca to the array type in CreateMemTemp but found for that example it isn't called, it's created by a call to CreateTempAlloca in CGDecl.cpp (EmitAutoVarAlloca). CreateTempAlloca takes an llvm::Type *Ty so it's not as straightforward as doing a bitcast there, although I found it could be done in EmitAutoVarAlloca but it means having to handle this is two places I'm aware of and potentially others I haven't hit. In this case as well it also required looking through the pointer to see if the pointee was a VLST then doing a bitcast.

I've also experimented with representing allocas as fixed-length arrays to see if that will make it any easier and it does simplify the patch a little. It does require handling PointerType in ConvertTypeForMem however as we do for ConstantArray, same issue I mentioned in response to your other comment about removing that.

I planning to update the patch with that implementation but I've just found another issue:

fixed_int32_t arr[3];
fixed_int32_t *z() {
  fixed_int32_t *array_ptr;
  array_ptr = &arr[0];
  return array_ptr;
}

trying to create a store:
store [4 x i32]* %0, <vscale x 4 x i32>** %retval, align 8

although this is done in CGStmt.cpp as it's for a retval so it looks like a bitcast could also be required there.

clang/lib/CodeGen/CodeGenModule.cpp
3986

EmitNullConstant should just do the right thing, I think, now that we've changed the default behavior of ConvertTypeForMem.

Good spot, these changes can be removed

clang/lib/CodeGen/CodeGenTypes.cpp
151

I think the default handling for constant arrays should do the right thing, now that we've changed the default behavior of ConvertTypeForMem.

ConvertType looks at the canonical type so the type attribute is lost.

efriedma added inline comments.Jul 23 2020, 3:26 PM
clang/lib/CodeGen/CGExpr.cpp
153

I think a fixed_int32_t * needs to be converted to [4 x i32]*, for the sake of consistency... but see also my other comment.

clang/lib/CodeGen/CodeGenTypes.cpp
151

That sounds like a bug in the AST: since isVLST() affects the semantics of the type, it needs to be part of the canonical type. Otherwise you're going to be finding bugs all over in both Sema and CodeGen.

c-rhodes abandoned this revision.Aug 3 2020, 5:47 AM

I've posted a prototype D85128 with an alternative implementation, given it's quite different to this patch I've posted it as a separate patch and am abandoning this one. See new patch for more details, cheers