Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.
ClosedPublic

Authored by fhahn on Nov 4 2021, 3:55 PM.

Details

Summary

At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVRecipe to represent the
primary IV in VPlan and InductionIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

This is still WIP as there are a couple of test failures left to fix,
but I'd like to share it so the general direction can be reviewed.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn updated this revision to Diff 396405.Dec 28 2021, 9:13 AM

Update & rebase after landing D116320 & D116304, move CanonicalIVStartValue handling to prepareToExecute.

Ayal added inline comments.Dec 29 2021, 12:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3003–3004

If we are sure no latch exists yet as we're just creating this loop, update comment and assert(!L->getLoopLatch())?

3608–3609

Perhaps place the call to createLatchTerminator(Lp) above the setting of OldInduction, as the latter is not used by the former, whereas createInductionResumeValues() below does use OldInduction. Perhaps OldInduction can be eliminated altogether, following [New]Induction?

4508

Plan is used only here, right? So can fold into
auto *IVR = PhiR->getParent()->getPlan()->getCanonicalIV();

8931

Ahh, on second thought, this method adds two recipes (one being a VPInstruction, but that may change..), one placed first in the header and the other last in the latch. How about "addCanonicalIVRecipes()"?

9685

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.cpp
726

If Step is fixed to be VF*UF rather than support arbitrary inductions, a more accurate name would be CanonicalIVIncrement[NUW]?

Potentially related to https://reviews.llvm.org/D116123#inline-1111668

945

If the ICmpEQ is to be generated here/now connected to InductionIncrement and branch, suggest to do so separately rather than inside a loop taking care of rewiring header phis. The InductionIncrement can be located directly at the end of the Exit, analogous to getCanonicalIV(), conceptually as part of a terminating inc-cmp-br idiom.

1356

State.get per-part 0?

llvm/lib/Transforms/Vectorize/VPlan.h
2183

Hold VPValue VectorTripCount; as an instance rather than a pointer?

2255

Can return a reference, is never null.

2361

can drop phis().

fhahn updated this revision to Diff 396615.Dec 29 2021, 11:05 PM
fhahn marked 12 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3003–3004

I removed the variable, added the assert and also renamed the function

3608–3609

Removed.

4508

updated, thanks

8931

Renamed and extended the comment to explicitly mention the increment VPInstruction.

9685

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
726

Updated to the more descriptive CanonicalIVIncrement

945

Moved outside the loop.

llvm/lib/Transforms/Vectorize/VPlan.h
2183

That makes things a bit simpler, updated!

2255

Adjusted!

2361

Removed!

Ayal added inline comments.Dec 30 2021, 1:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
578–581

More accurately, something like:
/// Introduce a conditional branch (on true, condition to be set later) at the end of the header=latch connecting it to itself (across the backedge) and to the exit block of \p L.

8931

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes

... to the header block.

The phi is added to the header block, the increment is added to the latch block.

llvm/lib/Transforms/Vectorize/VPlan.cpp
781

How about printing something like " VF*UF + " instead?

928

if (isa_and_nonnull<VPWidenPHIRecipe>(PhiR)) ?

Use PhiR instead of &R below?

949

call getCanonicalIV(), or call getCanonicalIVIncrement() which should take care of looking up the backedge value of the canonical IV - or retrieving the last recipe in the latch directly?

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178

Verify that the last recipe in Exit is a CanonicalIVIncrement VPInstruction?

fhahn updated this revision to Diff 396772.Dec 31 2021, 4:02 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
578–581

Adjusted wording, thanks!

8931

Adjusted wording ,thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
781

That is much better, updated!

928

There are some phi-like recipes that do not inherit from VPHeaderPHIRecipe, like VPWidenIntOrFpInductionRecipe. I think it would be better to adjust this separately, once they *only* model the phi.

I updated the code the explicitly skip unwanted recipes and then use cast<VPHeaderPHIRecipe>.

Use PhiR instead of &R below?

Done!

949

updated to use getCanonicalIV.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178

I am not sure this is something we should enforce in the verifier, as it is easy to retrieve from the canonical IV.

Ayal added inline comments.Jan 1 2022, 1:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8932

addCanonicalIVPHIRecipes >> addCanonicalIVRecipes (one is a PHI, the other is not)

llvm/lib/Transforms/Vectorize/VPlan.cpp
717

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

928
if (isa<VPWidenIntOrFpInductionRecipe>(&R) || isa<VPWidenPHIRecipe>(&R) ||
   isa<VPWidenCanonicalIVRecipe>(&R))
  continue;

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).
VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

932

"// For first-order recurrences [, canonical IV] and in-order reduction phis ..."

"Otherwise" - for non ordered reductions,

959

It should be possible to obtain any vector, stored per-part; any uniform scalar, stored per-part; and any non-uniform scalar, stored per-lane.

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178

That's fine. Conceptually, the phi of the canonical IV can be placed anywhere among the header phis, although it's good to canonicalize its position to be first. OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

fhahn updated this revision to Diff 396852.Jan 1 2022, 4:18 AM
fhahn marked 3 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Jan 1 2022, 4:20 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8932

updated, thanks

llvm/lib/Transforms/Vectorize/VPlan.cpp
928

Reasoning behind this early-exit:
createVectorIntOrFpInductionPHI()/widenPHIInstruction() take care of generating LastInduction/InductionGEP and rewiring them back to the phi during VPWidenIntOrFpInductionRecipe/VPWidenPHIRecipe::execute, respectively. Once these values are recipe'd, they too c/should be rewired here instead (TODO).

Exactly, I added. a comment.

VPWidenCanonicalIVRecipe depends on the phi of VPCanonicalIVRecipe (which is rewired below) rather than having its own phi. It represents an original IR phi that turns into a non-IR instruction when vectorized, similar to VPBlendRecipe, which should appear right after "real" phi recipes. Should they be placed outside VPBB->phis()? Should both be supported, e.g., VPBB->real-phis() and VPBB->aliased-phis()?

VPWidenCanonicalIVRecipe should probably be placed outside the phis section, as it doesn't expand to phi instructions. I put up D116473. I don't think there's a strong need for more involved accessors like real-phis and aliased-phis.

932

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
178

OTOH, the inc-cmp-br of the canonical IV must be placed at the end of the latch, so it would be more robust to rely on it and its user(s) instead.

Sounds good, we can adjust it once this is modeled with recipes.

fhahn marked an inline comment as done.Jan 1 2022, 12:43 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.cpp
945

I put up D116479 to add a cmp-and-branch VPInstruction opcode as a follow-up.

fhahn updated this revision to Diff 396924.Jan 2 2022, 4:53 AM

Rebase after landing b1a333f0feb8.

Ayal added inline comments.Jan 2 2022, 9:34 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2528

Slightly more consistent to check ScalarIV->getType() != IV->getType(), given the cast from one to the other below.

May be good to assign auto NeededType = IV->getType() ?

8937

"CanonicalIV" >> "CanonicalIVPhi", to complement "CanonicalIVIncrement".

9687

getVPValue(0) >> getVPSingleValue()?

llvm/lib/Transforms/Vectorize/VPlan.cpp
973

VectorTripCount is always used by the compare of branch-on-count, and by it only. So its VPValue is always generated but will have a user only once the compare (possibly w/ its increment and branch) is also represented as a recipe/VPInstruction. For now can assert it has no users; code introduced otherwise is dead.

fhahn updated this revision to Diff 396937.Jan 2 2022, 10:46 AM

Address latest comments, thanks!

fhahn marked 4 inline comments as done.Jan 2 2022, 10:49 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2528

I added the new variable, thanks!

8937

Adjusted!

9687

There's no need to use a getVP*Value accessor, this can passed directly. Updated!

llvm/lib/Transforms/Vectorize/VPlan.cpp
973

Good point, I replaced the printing with assert, will move the printing to the patch that uses it explicitly.

Ayal accepted this revision.Jan 4 2022, 2:23 PM

This looks fine, thanks!!
Clarifying a previous suggestion.

llvm/lib/Transforms/Vectorize/VPlan.cpp
717

Ah, now it looks more logical to first handle Part 0; and possibly fuse the last State.set & break.

Above thought was something like:

case VPInstruction::CanonicalIVIncrement:
case VPInstruction::CanonicalIVIncrementNUW: {
  Value *Next = nullptr;
  if (Part == 0) {
    bool IsNUW = getOpcode() == VPInstruction::CanonicalIVIncrementNUW;
    auto *Phi = State.get(getOperand(0), 0);
    // The loop step is equal to the vectorization factor (num of SIMD elements)
    // times the unroll factor (num of SIMD instructions).
    Value *Step = createStepForVF(Builder, Phi->getType(), State.VF, State.UF);
    Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
  } else
    Next = State.get(this, 0);

  State.set(this, Next, Part);
  break;
}
This revision is now accepted and ready to land.Jan 4 2022, 2:23 PM
fhahn updated this revision to Diff 397488.Jan 5 2022, 2:36 AM
fhahn marked 4 inline comments as done.

Thanks for all the comments! Rebased and applied suggestion. I will land this soon.

This revision was landed with ongoing or failed builds.Jan 5 2022, 2:46 AM
This revision was automatically updated to reflect the committed changes.