This is an archive of the discontinued LLVM Phabricator instance.

[Transforms][SROA] Promote allocas with mem2reg for scalable types
ClosedPublic

Authored by c-rhodes on Mar 24 2020, 11:42 AM.

Details

Summary

Aggregate types containing scalable vectors aren't supported and as far
as I can tell this pass is mostly concerned with optimisations on
aggregate types, so the majority of this pass isn't very useful for
scalable vectors.

This patch modifies SROA such that mem2reg is run on allocas with
scalable types that are promotable, but nothing else such as slicing is
done.

The use of TypeSize in this pass has also been updated to be explicitly
fixed size. When invoking the following methods in DataLayout:

  • getTypeSizeInBits
  • getTypeStoreSize
  • getTypeStoreSizeInBits
  • getTypeAllocSize

we now called getFixedSize on the resultant TypeSize. This is quite an
extensive change with around 50 calls to these functions, and also the
first change of this kind (being explicit about fixed vs scalable
size) as far as I'm aware, so feedback welcome.

A test is included containing IR with scalable vectors that this pass is
able to optimise.

Diff Detail

Event Timeline

c-rhodes created this revision.Mar 24 2020, 11:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2020, 11:42 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

I agree we don't need to care about slicing etc.; we probably can't slice an alloca of unknown size in most cases.

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

mem2reg isn't run as a separate pass? We have a clang ACLE test for brka intrinsic downstream that was hitting a lot of "Request for a fixed size on a scalable object" asserts in SROA, with this patch applied it gets optimised by mem2reg:

./build/bin/clang -cc1 -internal-isystem ./build/lib/clang/9.0.1/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o -  tools/clang/test/CodeGen/AArch64/acle/acle_sve_brka.c -mllvm -print-before-pass=mem2reg -mllvm -print-after-pass=mem2reg
*** IR Dump Before Promote Memory to Register ***
; Function Attrs: nounwind
define <n x 16 x i1> @test_svbrka_b_z(<n x 16 x i1> %pg, <n x 16 x i1> %op) local_unnamed_addr #0 {
entry:
  %pg.addr = alloca <n x 16 x i1>, align 1
  %op.addr = alloca <n x 16 x i1>, align 1
  store <n x 16 x i1> %pg, <n x 16 x i1>* %pg.addr, align 1, !tbaa !2
  store <n x 16 x i1> %op, <n x 16 x i1>* %op.addr, align 1, !tbaa !2
  %0 = load <n x 16 x i1>, <n x 16 x i1>* %pg.addr, align 1, !tbaa !2
  %1 = call <n x 16 x i1> @llvm.aarch64.sve.brka.z.nxv16i1(<n x 16 x i1> %0, <n x 16 x i1> %op)
  ret <n x 16 x i1> %1
}
*** IR Dump After Promote Memory to Register ***
; Function Attrs: nounwind
define <n x 16 x i1> @test_svbrka_b_z(<n x 16 x i1> %pg, <n x 16 x i1> %op) local_unnamed_addr #0 {
entry:
  %0 = call <n x 16 x i1> @llvm.aarch64.sve.brka.z.nxv16i1(<n x 16 x i1> %pg, <n x 16 x i1> %op)
  ret <n x 16 x i1> %0
}

I ran our downstream unit tests with SROA disabled and there's no asm differences.

We want SROA to at least run mem2reg on scalable vectors, since we don't run mem2reg separately. This is important for C code using SVE intrinsics.

I've tested this on Sander's Clang patch (D76238) adding contiguous loads/stores upstream and mem2reg runs:

$ /home/culrho01/llvm-project/build/bin/clang -cc1 -internal-isystem /home/culrho01/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -emit-llvm -o - /home/culrho01/llvm-project/clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_ld1.c -D__ARM_FEATURE_SVE -mllvm -print-before-all -mllvm -print-after-all

*** IR Dump Before Promote Memory to Register ***
; Function Attrs: nounwind
define <vscale x 16 x i8> @test_svld1_s8(<vscale x 16 x i1> %pg, i8* %base) local_unnamed_addr #0 {
entry:
  %pg.addr = alloca <vscale x 16 x i1>, align 2
  store <vscale x 16 x i1> %pg, <vscale x 16 x i1>* %pg.addr, align 2, !tbaa !2
  %0 = bitcast i8* %base to <vscale x 16 x i8>*
  %1 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %0, i32 1, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)
  ret <vscale x 16 x i8> %1
}
*** IR Dump After Promote Memory to Register ***
; Function Attrs: nounwind
define <vscale x 16 x i8> @test_svld1_s8(<vscale x 16 x i1> %pg, i8* %base) local_unnamed_addr #0 {
entry:
  %0 = bitcast i8* %base to <vscale x 16 x i8>*
  %1 = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0nxv16i8(<vscale x 16 x i8>* %0, i32 1, <vscale x 16 x i1> %pg, <vscale x 16 x i8> zeroinitializer)
  ret <vscale x 16 x i8> %1
}``

Hmm, looking more closely, I guess there's technically one run of mem2reg in the standard pass pipeline. But really, I'm surprised that never got killed off; we should not be relying on it.

c-rhodes updated this revision to Diff 253080.Mar 27 2020, 4:25 AM
c-rhodes retitled this revision from [Transforms][SROA] Disable pass for scalable vectors to [Transforms][SROA] Promote allocas with mem2reg for scalable types.
c-rhodes edited the summary of this revision. (Show Details)
  • Run mem2reg for scalable types from SROA.
  • Fix warning warning: inline function 'llvm::Type::getVectorIsScalable' is not defined reported by Harbor by moving implementation to Type.cpp.
sdesmalen added inline comments.Mar 27 2020, 5:22 AM
llvm/lib/IR/Type.cpp
161

Can this be inlined in llvm/include/llvm/IR/Type.h ?

llvm/lib/Transforms/Scalar/SROA.cpp
4475

If you base your patch on D76748, you can use DL.getTypeAllocSize(AI.getAllocatedType()).isZero().

c-rhodes added inline comments.Mar 27 2020, 7:31 AM
llvm/lib/IR/Type.cpp
161

Hm I'm not sure, is there a reason other methods like isVectorTy aren't defined with inline in Type.h?

llvm/lib/Transforms/Scalar/SROA.cpp
4475

Ah nice, thanks for pointing that out I'll update this.

ctetreau added inline comments.Mar 27 2020, 9:33 AM
llvm/include/llvm/IR/Type.h
233

Personally, I think functions like this are a code smell. Why have a type hierarchy at all if we're just going to do everything through the base type?

However, given the fact that the VectorType hierarchy is going to become more complicated soon, I'd really prefer that this function not be added. I'll propose alternatives inline below, but if you want this function, I'd prefer that:

  1. be a static function in SROA.cpp.
  2. be implemented in terms of isa<VectorType> instead of isVectorTy.
llvm/lib/Transforms/Scalar/SROA.cpp
4474

This can be rewritten:

{
  auto *AT = AI.getAllocatedType();
  if (AI.isArrayAllocation() || !AT->isSized() ||
      (isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable()) ||
      DL.getTypeAllocSize(AT).getFixedSize() == 0)
    return false;
}

AI.getAllocatedType is used 3 times, might as well give it a name. An while isa<VectorType>(AT) && cast<VectorType>(AT)->isScalable() is a little longer than AT.isScalableVectorTy, it's not that bad. on the positive side, it's more explicit as to what it's doing, and it's also less misleading because there's no such thing as a ScalableVectorTy.

4596
if (isa<VectorType>(AI->getAllocatedType()) &&
    cast<VectorType>(AI->getAllocatedType())->isScalable() &&
    isAllocaPromotable(AI))

Same as above.

c-rhodes updated this revision to Diff 253168.Mar 27 2020, 10:51 AM
  • Removed isScalableVectorTy and replaced uses in SROA with isa<VectorType> and cast<VectorType>(VecTy)->isScalable().
efriedma added inline comments.Mar 27 2020, 10:54 AM
llvm/include/llvm/IR/Type.h
233

I think for code that specifically cares about the property "would getTypeAllocSize() return a scalable TypeSize", it's not unreasonable to have a function on Type to query that. The key here being that the code doesn't really care whether the type is a vector.

It might make sense to leave it out for now, though, and try to refactor later when we have a better idea what code ends up doing in practice. I think most code that would want to do that calls getTypeAllocSize() somewhere nearby anyway.

c-rhodes marked 3 inline comments as done.Mar 27 2020, 10:54 AM

Thanks for the comments @ctetreau! I've updated the patch.

llvm/include/llvm/IR/Type.h
233

Thanks for the suggestion, I've removed this. I don't think it's particularly useful as a function in SROA given it isn't (and is unlikely) to be used a great deal.

efriedma added inline comments.Mar 27 2020, 10:58 AM
llvm/lib/Transforms/Scalar/SROA.cpp
4598

This looks weird; did you mean to write something like this?

if (AllocaInst *AI = dyn_cast<AllocaInst>(I)) {
  if (isa<VectorType>(AI->getAllocatedType()) &&
      cast<VectorType>(AI->getAllocatedType())->isScalable()) {
    if (isAllocaPromotable(AI))
      PromotableAllocas.push_back(AI);
  } else {
    Worklist.insert(AI);
  }
}
c-rhodes added inline comments.Mar 27 2020, 1:17 PM
llvm/lib/Transforms/Scalar/SROA.cpp
4598

Oops, yes I did! Good spot, we don't want allocas with scalable types that aren't promotable added to the worklist as runOnAlloca will blow up. Cheers, I'll fix this.

c-rhodes updated this revision to Diff 253206.Mar 27 2020, 1:30 PM

Fix bug in branch where allocas with scalable types that aren’t promotable could still be added to worklist and subsequently passed to runOnAlloca where asserts would be triggered as this is now fixed-width. Thanks for spotting @efriedma.

c-rhodes marked an inline comment as done.Mar 27 2020, 1:31 PM
ctetreau added inline comments.Mar 27 2020, 1:52 PM
llvm/include/llvm/IR/Type.h
233

Thanks for the suggestion, I've removed this. I don't think it's particularly useful as a function in SROA given it isn't (and is unlikely) to be used a great deal.

Appreciate it.

I think for code that specifically cares about the property "would getTypeAllocSize() return a scalable TypeSize", it's not unreasonable to have a function on Type to query that. The key here being that the code doesn't really care whether the type is a vector.

I'd feel better about this function, but I'd like to see it become a more common issue before we add it; Type is cluttered enough as it is.

I'd like to see better test coverage here. At least, a case that can't be promoted, and a case where a scalable alloca lands on the worklist despite the check in SROA::runImpl.

c-rhodes updated this revision to Diff 253416.Mar 29 2020, 6:22 AM
  • Add test for unpromotable scalable alloca.
  • Rename %pg -> %vec in <vscale x 16 x i8> test.

I'd like to see better test coverage here. At least, a case that can't be promoted, and a case where a scalable alloca lands on the worklist despite the check in SROA::runImpl.

I've added a test for a scalable alloca that can't be promoted. I'm not really sure how test the latter, unless you're thinking about the bug you spotted, in which case the test I've added should cover that?

We shouldn't hit it but we do skip scalable allocas in runOnAlloca.

This revision is now accepted and ready to land.Mar 30 2020, 12:41 PM
This revision was automatically updated to reflect the committed changes.