This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Iterate over phi recipes to detect reductions to fix.
ClosedPublic

Authored by fhahn on Apr 8 2021, 6:30 AM.

Details

Summary

After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Diff Detail

Event Timeline

fhahn created this revision.Apr 8 2021, 6:30 AM
fhahn requested review of this revision.Apr 8 2021, 6:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2021, 6:30 AM
Herald added a subscriber: vkmr. · View Herald Transcript
Ayal added a comment.Apr 27 2021, 6:20 AM

Reducing the coupling with the cost model by using information directly in VPlan is great. Have a few comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

number of operands must be 2, right?

this may be fragile, if some (cast?) vpinstruction is introduced between the VPReductionRecipe and the header phi it feeds? Since in-loop-reduction phi's require special code-gen handling, perhaps their recipes should be marked as such?

4722

ditto

9170–9171

Perhaps RecipeBuilder should allow checking if it has any (in-loop) reduction phi's, and iterate over them?

fhahn updated this revision to Diff 340869.Apr 27 2021, 9:05 AM

Remove unnecessary check on number of operands, add assertion to catch if we miss an in-loop reduction in VPlan.

fhahn added inline comments.Apr 27 2021, 9:09 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

number of operands must be 2, right?

It's not needed here, I removed it.

this may be fragile, if some (cast?) vpinstruction is introduced between the VPReductionRecipe and the header phi it feeds? Since in-loop-reduction phi's require special code-gen handling, perhaps their recipes should be marked as such?

Agreed, but not sure if we need to fix this issue before landing the patch?

I added an assertion in the code below to catch the case where we fail to identify an in-loop reduction in VPlan.

4722

here PhiR could also be a first order recurrence, so for now I think we need to at least make sure it's a reduction.

Ayal added inline comments.Apr 28 2021, 2:29 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

wouldn't it be simpler, clearer and more robust if adjustRecipesForInLoopReductions() would turn on an "IsInLoop" bit inside VPWidenPhiRecipe, when converting its associated VPWiden[Select]Recipes into VPReductionRecipes?

fhahn added inline comments.Apr 29 2021, 12:56 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

It would be simpler, but I'm not sure if it is worth adding this extra coupling by adding a new InLoopReduction field to the general VPWidenPHIRecipe that needs to be maintained at this point? We could also just traverse the whole cycle if needed I think as in practice it should be only a very small number of steps. WDYT?

Ayal added inline comments.May 3 2021, 12:44 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

Agreed - the overhead of checking the cycle/chain to figure out if the reduction is in-loop or not, should be tolerable. It's about having recipes clearly model the code they represent up-front, during VPlan construction and/or planning, than whether to cache the results of this "isInLoop" query eagerly versus (re)running it lazily on demand during VPlan execution. A phi recipe representing an in-loop reduction is arguably conceptually distinct from a "regular" VPWidenPHIRecipe, with respect to the scalar-versus-vector data each operates on, and the epilogue each feeds. If the "cached result" needs to be updated, so may the cost of the recipe and of dependent recipes. Sounds reasonable?

4722

Agreed; "ditto" referred to asking here too directly if PhiR->isInLoopReduction().

9177

ditto ;-)

9180

This is to avoid redundant creation of the header block in-mask when the loop contains no non-in-loop reductions?
Worth doing in a separate patch?

9188

#ifndef NDEBUG?

perhaps an overkill - would CM.getInLoopReductionChains() eventually be discarded?

fhahn updated this revision to Diff 342975.May 5 2021, 3:00 AM

Thanks for the latest comments!

This is just a rebase, will address comments separately!

fhahn updated this revision to Diff 343169.May 5 2021, 1:37 PM

Add isInLoopReduction helper & use it instead isa<VPReductionRecipe>

fhahn marked 2 inline comments as done.May 5 2021, 1:41 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

I went ahead and added a VPWidenPHIRecipe::isInLoopReduction helper.

I'm not 100% sure this is in line with your suggestions, please let me know if I misunderstood and we should add a flag as well to start with :)

9188

Yes I think CM.getInLoopReductionChains is going away at some point. But until then it might be good to try to ensure they do not diverge. But I'd be more than happy to drop it as well, it might just be a bit too paranoid

fhahn updated this revision to Diff 343183.May 5 2021, 1:55 PM
fhahn marked an inline comment as done.

Clang-format.

Ayal added inline comments.May 10 2021, 3:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4307

This API looks good to me, thanks!
Adding a flag to implement it, as argued above, would also be good.
As stated elsewhere, there are actually 3 types of reductions, including two types of in-loop reductions: ordered and unordered.

9177

nit: suffice to check if (!PhiR || PhiR->isInLoopReduction()) - the latter implies PhiR has a recurrence descriptor.

9180

There's a subtle change here: BlockInMask may currently be created redundantly here if there are reductions but no in-loop reductions; this modification eliminates this redundancy.
Probably insignificant, given foldTailByMasking?

9188

I'd vote for cleaner, paranoia free code in this case.

One could verify the integrity of in-loop reduction chains (or VPlan in general?), say before code-gen; i.e., that in-loop header phi recipes are consistent with reduction recipes.

9257

Wouldn't it be better, if in addition to converting widen recipes of in-loop-reduction chains into reduction recipes here, we also convert their header phi upfront:

auto *PhiR = cast<VPWidenPHIRecipe>(RecipeBuilder.getRecipe(Phi));
PhiR->setIsInLoopReduction();

?

(Making sure to record the recipe of Phi)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1076

(being 'const' it can't even cache the result for repeated queries...)

fhahn updated this revision to Diff 363301.Jul 31 2021, 8:49 AM

rebased after the other phi handling refactoring patches landed. The patch now just updates the tail folding handling for reduction values to iterate over the phis in the VPlan and use VPReductionPHIRecipe directly. This reduces some late callbacks to Legal and the cost model, as well as a call to getOrAddVPValue(), which could become out-of-date after VPlan transformations.

ping :)

fhahn retitled this revision from [VPlan] Use incoming VPValue to detect in-loop reductions (NFC). to [VPlan] Iterate over phi recipes to detect reductions to fix..Jul 31 2021, 8:53 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added a comment.Aug 1 2021, 1:20 PM

This does look pretty simpler now :-). Adding some more thoughts, could be addressed separately.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9166–9167

Perhaps wrap the following insert-select-for-non-in-loop-reduction-under-fold-tail along with the above, under adjustRecipesForReductions()? buildVPlanWithVPRecipes() has grown out of propotion.

9187–9190

Checking if (!Cond) seems redundant; if we want to avoid creating BlockInMask for the header block here unless there is at-least one reduction (although it will probably be created anyhow, especially under foldTail), may as well call createBlockInMask() repeatedly - it provides internal caching. I.e., should be renamed more accurately getOrCreateBlockInMask().

9190

VPValue *Red = PhiR->getBackedgeValue ();?

Would be good to note that (and remember how ;-) ILV::fixReduction() later hooks up to this select, as it seems pretty useless here.

fhahn updated this revision to Diff 363567.Aug 2 2021, 2:04 PM

Address Ayal's comments, thanks!

fhahn marked 3 inline comments as done.Aug 2 2021, 2:07 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9187–9190

Checking if (!Cond) seems redundant; if we want to avoid creating BlockInMask for the header block here unless there is at-least one reduction (although it will probably be created anyhow, especially under foldTail), may as well call createBlockInMask() repeatedly - it provides internal caching. I.e., should be renamed more accurately getOrCreateBlockInMask().

Right, I just updated the code to unconditionally create the condition, relying on the caching.

Perhaps wrap the following insert-select-for-non-in-loop-reduction-under-fold-tail along with the above, under adjustRecipesForReductions()? buildVPlanWithVPRecipes() has grown out of propotion.

Done!

9190

I updated the code as suggested, but is the loop exit value always the backedge value? I added a note that the select here is only to model the def-use chain in VPlan (AFAIU)

Ayal added inline comments.Aug 3 2021, 8:05 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9190

is the loop exit value always the backedge value?

It appears so, taken from IVDescriptors.cpp/RecurrenceDescriptor::AddReductionVar():

// The instruction used by an outside user must be the last instruction
// before we feed back to the reduction phi. Otherwise, we loose VF-1
// operations on the value.
if (!is_contained(Phi->operands(), Cur))
  return false;
  
ExitInstruction = Cur;
continue;

Otherwise the generated the select between exit-instruction and phi may be erroneous.
It would be clearer if getLoopExitInstr() would return Phi's operand across the backedge, instead of recording ExitInstruction.

9197

Thanks! Rename adjustRecipesForInLoopReductions() to adjustRecipesForReductions()?

Would be good to iterate over all header (reduction) phi's once, checking for inLoop or not. Note that the select handles only non-inLoop reductions, under foldTail. Deserves a separate patch?

9314

Ah, ILV::fixReduction() actually looks for this select and hooks up to it, under foldTail:

Instruction *LoopExitInst = RdxDesc.getLoopExitInstr();
...
VPValue *LoopExitInstDef = State.Plan->getVPValue(LoopExitInst);
...
if (Cost->foldTailByMasking() && !PhiR->isInLoop()) {
  for (unsigned Part = 0; Part < UF; ++Part) {
    Value *VecLoopExitInst = State.get(LoopExitInstDef, Part);
    Value *Sel = nullptr;
    for (User *U : VecLoopExitInst->users()) {
      if (isa<SelectInst>(U)) {
        assert(!Sel && "Reduction exit feeding two selects");
        Sel = U;
      } else
        assert(isa<PHINode>(U) && "Reduction exit must feed Phi's or select");
    }
    assert(Sel && "Reduction exit feeds no select");
    State.reset(LoopExitInstDef, Sel, Part);
  
    // If the target can create a predicated operator for the reduction at no
    // extra cost in the loop (for example a predicated vadd), it can be
    // cheaper for the select to remain in the loop than be sunk out of it,
    // and so use the select value for the phi instead of the old
    // LoopExitValue.
    if (PreferPredicatedReductionSelect ||
        TTI->preferPredicatedReductionSelect(
            RdxDesc.getOpcode(), PhiTy,
            TargetTransformInfo::ReductionFlags())) {
      auto *VecRdxPhi =
          cast<PHINode>(State.get(PhiR->getVPSingleValue(), Part));
      VecRdxPhi->setIncomingValueForBlock(
          LI->getLoopFor(LoopVectorBody)->getLoopLatch(), Sel);
    }
  }
}

Simply take PhiR->getBackedgeValue() instead there as well?

PreferPredicatedReductionSelect should also hack the phi recipe and be taken care of by VPlan execute? (Here or in a separate patch.)

fhahn updated this revision to Diff 364535.Aug 5 2021, 10:21 AM
fhahn marked 3 inline comments as done.

Address comments: fix name, use getBackedgeValue at an additional place.

fhahn marked 2 inline comments as done.Aug 5 2021, 10:26 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9190

Ok great!

9197

Thanks! Rename adjustRecipesForInLoopReductions() to adjustRecipesForReductions()?

Ah thanks, I thought I renamed it, but was wrong. should be updated now.

Would be good to iterate over all header (reduction) phi's once, checking for inLoop or not. Note that the select handles only non-inLoop reductions, under foldTail. Deserves a separate patch?

Can update as separate patch to keep the diff cleaner.

9314

Simply take PhiR->getBackedgeValue() instead there as well?

Should be updated there, thanks!

PreferPredicatedReductionSelect should also hack the phi recipe and be taken care of by VPlan execute? (Here or in a separate patch.)

I can do that in a separate one.

Ayal accepted this revision.Aug 5 2021, 1:09 PM

Thanks! This looks good to me, with a minor comment comment.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9313

Update comment? As noted below, fixReduction does rely on the select generated by this recipe, rather than introduce another one.

nit: extra space at the beginning of each line?

This revision is now accepted and ready to land.Aug 5 2021, 1:09 PM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.