This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
ClosedPublic

Authored by david-arm on Sep 9 2020, 6:51 AM.

Details

Summary

In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

  1. The optimisation does not work for cases where the cast type

is scalable and the allocated type is not. This because we need to
know how many times the cast type fits into the allocated type.

  1. If we pass all the various checks for the case when the allocated

type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.

  1. For the remaining case when both the alloca and cast types are

scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Diff Detail

Event Timeline

david-arm created this revision.Sep 9 2020, 6:51 AM
Herald added a reviewer: efriedma. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
david-arm requested review of this revision.Sep 9 2020, 6:51 AM
sdesmalen added inline comments.Sep 9 2020, 9:37 AM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
103

This also seems to prevent the optimisation on bitcasts between scalable-vector-type T1 and an alloca with scalable-vector-type T2. You can make the expression use != instead of ||, but then the code below needs some more changes to work on TypeSize.

david-arm added inline comments.Sep 9 2020, 10:00 AM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
103

Yes, this was deliberate. I think I discussed this in the commit message. I was trying to avoid making needless code changes that's all because casting alloca to scalable still causes us to fail the alignment checks below. I simply couldn't write a test case that proved the additional work needed was correct. I thought it better to avoid changing the code path and then being unable to defend it. I still have the original patch that converted everything to using TypeSize, which I could make use of here if we are able to come up with a test that showed a before and after difference.

sdesmalen added inline comments.Sep 14 2020, 3:09 AM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
103

Okay, thanks for explaining! I guess an extra argument is that because scalable vectors are not allowed in arrays, there is little benefit of making the code below work for the scalable/scalable case even if the ABI alignment would match.

If it is not too much effort, it would be nice if this function could handle:

%tmp = alloca <vscale x 16 x i8>, align 16
%cast = bitcast <vscale x 16 x i8>* %tmp to <vscale x 2 x i64>*
store volatile <vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64>* %cast, align 16
%reload = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %cast, align 16
store <vscale x 2 x i64> %reload, <vscale x 2 x i64>* %out, align 16ret void

->

%tmp = alloca <vscale x 2 x i64>, align 16
store volatile <vscale x 2 x i64> zeroinitializer, <vscale x 2 x i64>* %tmp, align 16
%reload = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %tmp, align 16
store <vscale x 2 x i64> %reload, <vscale x 2 x i64>* %out, align 16
ret void

Even if that is only handled as a special case before bailing out. This case is currently supported albeit with the necessary warnings being emitted.

llvm/test/Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll
31

Not something to be fixed in this patch, but InstCombine changing the alignment to 64 seems wrong for scalable vectors.

47

nit: This test does not actually test your change, because it bails out on the first alignment check. The same holds for the two functions below (scalable16i32_to_fixed16i32 and scalable32i32_to_scalable16i32).

I think it's fine to leave the tests in, because at least it guards against possible regressions in case the ABI alignment or checks ever change.

david-arm updated this revision to Diff 292139.Sep 16 2020, 1:42 AM
david-arm edited the summary of this revision. (Show Details)
  • Added support for 'promotion' of alloca type to the cast type when both types are scalable, have the same size and alignment.
david-arm marked an inline comment as done.Sep 16 2020, 1:42 AM
sdesmalen accepted this revision.Sep 22 2020, 6:07 AM

LGTM, thanks for the changes @david-arm!

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
142

If arrays of scalable types are not supported, this should be an assert.

This revision is now accepted and ready to land.Sep 22 2020, 6:07 AM