Current implementation generates masks on demand by calling to
createBlockInMask function. The function is calling VPBuilder which
uses previously set insertion point. As a result, invalid non-flattened
VPlan is constructed as EdgeMask can be built in destination
VPBasicBlock.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The problem was found in our downstream compiler with masked-ordered reduction enabled, where non-flattened invalid VPlan looks like before adjustRecipesForReductions:
WIDEN ir<%add.i> = fadd ir<%add.i2426>, ir<%conv.i> Successor(s): if.else.i.i.i.i.i15 if.else.i.i.i.i.i15: EMIT vp<%12> = or vp<%5> ir<%1> EMIT vp<%13> = not ir<%0> EMIT vp<%14> = select vp<%12> vp<%13> ir<false> WIDEN ir<%3> = load ir<%this>, vp<%14>
and after
REDUCE ir<%add.i> = ir<%add.i2426> + reduce.fadd (ir<%conv.i>, vp<%12>) Successor(s): if.else.i.i.i.i.i15 if.else.i.i.i.i.i15: EMIT vp<%12> = or vp<%5> ir<%1> EMIT vp<%13> = not ir<%0> EMIT vp<%14> = select vp<%12> vp<%13> ir<false> WIDEN ir<%3> = load ir<%this>, vp<%14>
Interesting issue! Would it be possible to construct a test case showing the issue on current main? Would it be possible to only change the insertion point if need to avoid the issue to avoid many of the test changes?
Unfortunately, I was not able to come up with a test for main. I even cannot come up with some nice changeset (clean and robust) to allow this to happen. There're 2 main blockers: masked ordered reduction is not supported by IVDescriptor; tail-folding does not support masked reductions.
Would it be possible to only change the insertion point if need to avoid the issue to avoid many of the test changes?
Similarly, I could not find something simpler. One possible option is to "predict" such transformation by adjustRecipesForReductions during masks construction, but that is going to add boilerplate into createBlockInMask and createEdgeMask functions.