This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
ClosedPublic

Authored by david-arm on Aug 2 2021, 8:13 AM.

Details

Summary

This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

Transforms/LoopVectorize/scalable-assume.ll
Transforms/LoopVectorize/scalable-lifetime.ll
Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Diff Detail

Event Timeline

david-arm created this revision.Aug 2 2021, 8:13 AM
david-arm requested review of this revision.Aug 2 2021, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 8:13 AM

All the scalable-xyz tests were just copied from the original fixed-width xyz tests and adapted for scalable vectors.

sdesmalen added inline comments.Aug 3 2021, 4:03 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8951

Rather than having a switch within a switch, would it make sense to just bail out with false if I is not a CallInst? Or one step further, specialize the function to isUniformIntrinsicCall, which takes a IntrinsicInst* instead of Instruction*?

8954

nit: The switch is missing a default case, which may result in build warnings.

8961

It would be nice if there would be a better way to determine if the operands are uniform (e.g. if the pointer has a bitcast in the loop of a value that's otherwise loop-invariant, that could also be considered uniform), other than testing it like you do here. Maybe you could additionally check if the operands are either uniform or scalar, but I'm not sure if we care (enough), since in practically all cases the operands will be loop-invariant.

nit: how about writing this expression as:

return IsScalable || OrigLoop->hasLoopInvariantOperands(I);
8970–8972

I guess the assume could be widened and the lanes could be llvm.reduce.or'ed together, but I doubt that LLVM will be able to use that knowledge in practice. I think dropping information about the other lanes is acceptable, and still an improvement of not vectorizing at all or having the compiler crash while trying to scalarize :)

8975–8976

as we don't support vectorizing allocas for scalable vectors.

I don't think this is the argument for why we assume the value to be uniform. The alloca could be outside the loop, and it could only be the pointers that are not loop-invariant.

It's more that any non-uniform pointers probably won't be recognized by other passes, since ValueTracking's findAllocaForValue only looks through simple bitcasts/inttoptr/ptrtoint/phi instructions, the intrinsic is ignored for anything more complicated when it can't determine the alloca.

Matt added a subscriber: Matt.Aug 3 2021, 6:55 AM
david-arm updated this revision to Diff 363732.Aug 3 2021, 7:21 AM
  • Addressed some review comments.
david-arm marked 3 inline comments as done.Aug 3 2021, 7:22 AM
david-arm updated this revision to Diff 363986.Aug 4 2021, 12:41 AM
  • Addressed review comment by renaming function to isUniformIntrinsicCall
fhahn added inline comments.Aug 4 2021, 12:57 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8961

can you add a comment here saying why we need to do another check?

8968

may here makes it sound like it is not clear if it is better. The patch/code decides it is better, perhaps make that clear in the wording?

8973

Can you keep the reasoning along here in line with LangRef? There's no reference to alloca there, just stack objects. Also, if it is not a stack object, the intrinsic still has an effect: poisoning the object. This should still allow to remove the call.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
176 ↗(On Diff #363986)

this only supports intrinsics, can it take IntrinsicInst?

llvm/test/Transforms/LoopVectorize/scalable-assume.ll
90

is this needed?(same for some other tests)

93

is this needed_

97

is this needed?

david-arm added inline comments.Aug 4 2021, 1:00 AM
llvm/test/Transforms/LoopVectorize/scalable-assume.ll
90

Maybe not? I didn't write these tests, but copied them from assume.ll. I can clean them up though.

david-arm updated this revision to Diff 364063.Aug 4 2021, 5:10 AM
  • Addressed review comments - tidied up tests, renamed new function, improved comments to be more inline with the LangRef.
david-arm marked 7 inline comments as done.Aug 4 2021, 5:10 AM
sdesmalen added inline comments.Aug 4 2021, 6:14 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8952–8953

nit: I think this can simply be:

switch (CI->getIntrinsicID()) {
8975

Perhaps not something to change in this patch, but given the change I've made in D107286, I wonder if it makes sense to mark any operation that has only loop-invariant operands as 'uniform' in collectLoopUniforms, e.g.

if (all_of(I.operands(), [&](Value *V) { return isOutOfScope(V); }))
  addToWorklistIfAllowed(&I);

@fhahn any thoughts on this?

As a smaller first step, you could do the above suggestion only for this limited set of intrinsics (and then override IsUniform in handleReplication for this set of intrinsics if the VF is scalable).

That said, I don't really want to hold up this patch too much given that it fixes the issue sufficiently, and it's the last piece of the puzzle to build LNT and Clang with scalable vectorization enabled. So maybe such refactoring can be done in a follow-up patch?

david-arm updated this revision to Diff 364151.Aug 4 2021, 9:10 AM
david-arm edited the summary of this revision. (Show Details)
  • Changed the patch to add some intrinsics to the Uniforms list where possible and inlined parts of isUniformIntrinsicCall into handleReplication instead.
sdesmalen accepted this revision.Aug 4 2021, 9:15 AM

Thanks @david-arm. LGTM with nits addressed!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5431–5432

nit: this can be:

switch (II->getIntrinsicID()) {

There is no need for getVectorIntrinsicIDForCall

8964–8965

nit: same comment about using cast<IntrinsicInst>(I)->getIntrinsicID()

8984

nit: maybe move this condition to the if-statement, and set IsUniform = true directly?

This revision is now accepted and ready to land.Aug 4 2021, 9:15 AM
fhahn added inline comments.Aug 4 2021, 9:54 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8963

Should this directly be guarded by isScalable? Would also be good to mention that this inly applies to scalable vectors and why it is needed up front here.

david-arm updated this revision to Diff 364381.Aug 5 2021, 1:38 AM
  • Addressed review comments.
david-arm marked 8 inline comments as done.Aug 5 2021, 1:38 AM
This revision was landed with ongoing or failed builds.Aug 5 2021, 7:17 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Aug 5 2021, 7:18 AM

LGTM, thanks. Please address the pre-merge lint issue and also make sure the title + description of the patch match the latest version of the code; at the moment it is not in sync with the code.

fhahn added a comment.Aug 5 2021, 7:25 AM

LGTM, thanks. Please address the pre-merge lint issue and also make sure the title + description of the patch match the latest version of the code; at the moment it is not in sync with the code.

To be more specific, the title is not accurate because it implies more instructions are replicated, when in fact it does the opposite; it does not require replicating certain calls because it treats them as uniform. It also is not limited to scalable vectors.

david-arm retitled this revision from [LoopVectorize] Add support for replication of more intrinsics with scalable vectors to [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform.Aug 6 2021, 2:14 AM