This is an archive of the discontinued LLVM Phabricator instance.

[LV] Generate predicate in a proper VPBasicBlock
Needs ReviewPublic

Authored by nikolaypanchenko on Jul 24 2023, 7:52 AM.

Details

Reviewers
fhahn
ABataev
Summary

Current implementation generates masks on demand by calling to
createBlockInMask function. The function is calling VPBuilder which
uses previously set insertion point. As a result, invalid non-flattened
VPlan is constructed as EdgeMask can be built in destination
VPBasicBlock.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 7:52 AM
nikolaypanchenko requested review of this revision.Jul 24 2023, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 7:52 AM
nikolaypanchenko added a comment.EditedJul 24 2023, 7:58 AM

The problem was found in our downstream compiler with masked-ordered reduction enabled, where non-flattened invalid VPlan looks like before adjustRecipesForReductions:

  WIDEN ir<%add.i> = fadd ir<%add.i2426>, ir<%conv.i>
Successor(s): if.else.i.i.i.i.i15

if.else.i.i.i.i.i15:
  EMIT vp<%12> = or vp<%5> ir<%1>
  EMIT vp<%13> = not ir<%0>
  EMIT vp<%14> = select vp<%12> vp<%13> ir<false>
  WIDEN ir<%3> = load ir<%this>, vp<%14>

and after

  REDUCE ir<%add.i> = ir<%add.i2426> + reduce.fadd (ir<%conv.i>, vp<%12>)
Successor(s): if.else.i.i.i.i.i15

if.else.i.i.i.i.i15:
  EMIT vp<%12> = or vp<%5> ir<%1>
  EMIT vp<%13> = not ir<%0>
  EMIT vp<%14> = select vp<%12> vp<%13> ir<false>
  WIDEN ir<%3> = load ir<%this>, vp<%14>

Hoisted out InsertPointGuard

fhahn added a comment.Aug 4 2023, 3:37 AM

Interesting issue! Would it be possible to construct a test case showing the issue on current main? Would it be possible to only change the insertion point if need to avoid the issue to avoid many of the test changes?

Interesting issue! Would it be possible to construct a test case showing the issue on current main?

Unfortunately, I was not able to come up with a test for main. I even cannot come up with some nice changeset (clean and robust) to allow this to happen. There're 2 main blockers: masked ordered reduction is not supported by IVDescriptor; tail-folding does not support masked reductions.

Would it be possible to only change the insertion point if need to avoid the issue to avoid many of the test changes?

Similarly, I could not find something simpler. One possible option is to "predict" such transformation by adjustRecipesForReductions during masks construction, but that is going to add boilerplate into createBlockInMask and createEdgeMask functions.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll