This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
ClosedPublic

Authored by fhahn on Apr 10 2023, 12:46 PM.

Details

Summary

This patch adds a new entry block the VPlan to place early SCEV
expansions like the trip count.

It updates createVectorizedLoopSkeleton to take a VPlan & State as
arguments, which can then be used to access the trip count. This is a
first step towards modeling skeleton construction in VPlan and a start
of gradually moving parts of createVectorizedLoopSkeleton towards VPlan.

At the moment, the new preheader is executed before
createVectorizedLoopSkeleton is called, and the rest afterwards.

I will share a follow-up patch that also expands step expressions early
to fix #58811.

Still WIP as some tests need to be updated.

Diff Detail

Event Timeline

fhahn created this revision.Apr 10 2023, 12:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 12:46 PM
fhahn requested review of this revision.Apr 10 2023, 12:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2023, 12:46 PM
Ayal added a comment.Apr 17 2023, 2:30 PM

Raises some thoughts...
These ExpandSCEV recipes, which must expand into IR that can/must be placed in the original pre-header, before making any changes to the CFG, are conceptually Live-In's for which code needs to be generated, i.e., have a recipe, albeit a rather unique one, which can reside in a VPBB, albeit a rather unique one.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
500

Worth mentioning \p Plan and State?

615–616
615–616

Can be made const?

2876–2877

Passing InsertBlock is now redundant?

3259

Worth adding old preheader to the drawing, now that SCEVs are expanded there?

7708

Better placed under "// 0. Generate SCEV-dependent code into the preheader, including TripCount, before making any changes to the CFG." ?

7709

Easier to set here ILV.TripCount = State.get(BestVPlan.getTripCount(), 0); than pass Plan and State to createVectorizedLoopSkeleton() just for that?

8894–8895

Above comment should also include the new (old) pre-header.

8904

Should Preheader be detached from VecPreheader (and any other block), given that they represent disparate blocks and also get executed separately? Retaining the latter as Plan's entryBlock, keeping the former as Plan's (currently) isolated block for this and all other ExpandSCEV recipes.
Perhaps worth verifying that all ExpandSCEV recipes reside in this unique Preheader VPBB.

9109

nit: can make Plan here after creating TripCount and Preheader as when building with recipes instead of setTripCount() and setEntry(). HCFGBuilder will then take care of filling (Old)Entry, independently.

9119

nit: empty line intentional?

llvm/lib/Transforms/Vectorize/VPlan.h
2315–2316

Is this code dead (and method can indeed drop its OrCreate and become const) - provided either a non-null argument was provided when constructing the VPlan with recipes or non-null TripCount was set when constructing w/o? Otherwise is TripCount deleted?

fhahn updated this revision to Diff 516101.Apr 22 2023, 1:24 PM
fhahn marked 11 inline comments as done.

Address comments, thanks!

I updated the patch to have a freestanding preheader block. It's not printed yet, but I'll add printing once we are happy with the overall state.

fhahn added inline comments.Apr 22 2023, 1:38 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
615–616

Thanks, I adjusted the comment, made it const and moved the trivial definition here as well.

2876–2877

Yep, argument has been removed.

3259

They'll get expanded in the same block that has the loop iteration check. I updated the comment.

7708

Updated, thanks!

7709

Adjusted, thanks! Unfortunately this requires removing the recipes in the preheader block before executing the plan for the vector epilogue; it has to re-use the expanded values from the main vector loop.

8894–8895

Extended the comment, thanks!

8904

I updated the code to have the preheader separate (pre-header before skeleton creation). I *think* at the moment the skeleton creation code leaves the CFG in a valid state, so for now SCEV expansions only needed by the vector body can remain where they are at the moment.

9109

moved the code to add the trip count right after plan construction.

9119

Removed, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
2315–2316

replaced with an assert

Ayal added inline comments.Apr 24 2023, 1:17 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
621

nit: can do w/o setTripCount()? TripCount should conceptually be fixed before/when building VPlan and remain invariant when transforming it.

3260–3262

nit: add its name "(old) preheader"? Note another block called "new preheader" below.

7705

nit: State should be defined before 0, its PrevBB and insertion point set after checking if preheader is not empty right before executing it.

8740–8741

nit: VecPreheader?

8894–8902
8904

Better place all SCEV expansions at preheader to ensure their correct generation before any CFG change, and potential reuse by epilog vector loop? It may incur an overhead when the vector loop is bypassed - but some subsequent pass may sink the expanded code into VecPreheader - if not reused by epilog vector loop. Otherwise how to set the insertion block of ExpandSCEV recipes needs to be explained.

8908

nit: perhaps Preheader should appear before VecPreheader, and maybe TripCount before both?

nit: would be good to outline this low-level part if reasonable as method is exceeding 250 LOC.

9091

nit (independent of this patch): assert should have an error message.

9094

nit: first create Trip Count, then Preheader, then define Plan given both - which should ideally be set at VPlan construction time w/o being (re)set later. HCFGBuilder will then set VecPreheader.
Can share outlined code with tryToBuildVPlanWithVPRecipes()?

10406

This erasure empties the preheader thereby preventing re-execution of its recipes, and is probably responsible for checking if preheader is empty upon execution.

How do the users of the erased TripCount ExpandSCEV recipe get to re-use the (already) expanded values from the main vector loop, now that the recipe-VPValue feeding them is being deleted?

llvm/lib/Transforms/Vectorize/VPlan.cpp
596–597

Where is TripCount deleted?
Where is Preheader deleted?

llvm/lib/Transforms/Vectorize/VPlan.h
2220–2225

/// ...

2267

/// ... \p Entry ..

2269

nits: Entry >> VecPreheader? Should Preheader and TripCount drop their defaults and asserted to be non-null?

2314

nit: can be dropped? Is Preheader expected to change?

2315–2316

nit: can add a const version, is it useful now?

2320

nit: can be dropped? Is TripCount expected to change?

2775–2776

/// <How does one decide what \p Block should be>
In all cases, it is set to Plan.getEntry(), because ExpandSCEV for TripCount (placed in Preheader) is created directly rather through this API.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
34 ↗(On Diff #516101)

Should this still be (Plan->getEntry())?

400 ↗(On Diff #516101)

Is this additional check needed - can PredVPBB be Plan.getEntry()?

llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
161 ↗(On Diff #516101)

nit: are these OFFSET_IDX >> INDEX# changes needed?

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
866 ↗(On Diff #516101)

nit: are these UGLY >> SCEV changes needed?

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
900

Nice LSR-type phi redundancy elimination as a by-product?

Ayal added inline comments.Apr 24 2023, 2:15 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
621

Ah, this is needed for setting the IR Value TripCount of ILV, for main loop only, during LVP::executePlan() after ILV's construction.
Deserves documentation.

fhahn updated this revision to Diff 518334.Apr 30 2023, 8:50 AM
fhahn marked 24 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3260–3262

Done, thanks!

fhahn added inline comments.Apr 30 2023, 8:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
500

The latest version has been updated to not pass them.

621

Yes, added a comment, thanks!

7705

Adjusted, thanks!

8740–8741

Renamed in 4583d7ef7c33, thanks!

8894–8902

Thanks, updated!

8904

I've retained to placement of the expansion only needed in the vector loop region in the vector preheader for now, to avoid extra test changes.

8908

nit: perhaps Preheader should appear before VecPreheader, and maybe TripCount before both?

Adjusted, thanks!

nit: would be good to outline this low-level part if reasonable as method is exceeding 250 LOC.

Agreed, but I think it would be good to do this as follow-up and also trying to unify this part with the native path.

9094

moved to a helper, thanks!

10406

How do the users of the erased TripCount ExpandSCEV recipe get to re-use the (already) expanded values from the main vector loop, now that the recipe-VPValue feeding them is being deleted?

The only users should be through ILV.TripCount. Updated the comment and add assert that the removed values don't have any users.

Down the road we really need to find a way to gradually transition the epilogue handling into a single VPlan, so we can avoid all the workarounds here...

llvm/lib/Transforms/Vectorize/VPlan.cpp
596–597

If they are defined by a recipe, they will be deleted as part of recipe deletion; if they are live-in VPValues, they will be cleaned up as part of that cleanup. No special handling should be necessary.

llvm/lib/Transforms/Vectorize/VPlan.h
2220–2225

Added comment, thanks

2267

Added comment, thanks!

2269

Updated to require all arguments. I also added a version that doesn't take any arguments as this is used across different unit tests.

2314

it's not needed in the latest version, removed!

2315–2316

A const version alone is sufficient, added const, thanks!

2320

Removed in the latest version, thanks!

2775–2776

In the latest version, it is either the vector pre-header (VPlan.getEntry()) or the prehader (getPreheader()). This avoids causing test changes due to different expansion points.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
34 ↗(On Diff #516101)

Yep, updated.

400 ↗(On Diff #516101)

Restored original code, it's not needed with the latest version.

llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
161 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
90 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
866 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
900

Yes it looks like it!

fhahn updated this revision to Diff 518893.May 2 2023, 3:55 PM

Add VPlan printing of preheader block and update most printing tests. I also updated the initial VPlan construction to use getOrExpandSCEVRecipe to avoid introducing unncessary VPExpandSCEVRecipes. As a consequence some places needed to be updated to specifically requests Part & Lane 0 to handle live-ins properly in VPTransformState::get.

I also updated the code to always place VPExpandSCEVRecipes in the preheader. The only workaround needed for the epilogue vector loop is re-using the trip count expanded for the main loop, because otherwise dominance is broken when used during skeleton creation/fixup.

Ayal added inline comments.May 3 2023, 1:24 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
571

nit: also worth clarifying that the same trip count - that of the original loop - is (generated once and) set for both ILV's of main and epilog loops?

3261

nit: "SCEVs in Plans [preheader] are expanded here" >> "all SCEV expansions needed by vectorized loops"?

3361–3362

Curious about this needed change in handling invariant "live-ins"?

As a consequence some places needed to be updated to specifically requests Part & Lane 0 to handle live-ins properly in VPTransformState::get.

7695

How/do we prevent expanding trip count SCEV again needlessly when vectorizing epilog loop?

Other SCEV expands in preheader are connected directly to distinct VPUsers so need be expanded even if common to both main and epilog loop, hopefully cse'd later?

7698

nit: worth a comment that the 'else' case refers to vectorizing the epilog loop, whose ILV borrows the trip count from that of the main loop's ILV (the latter being set here)?

8853

nit: there is a bit of chicken-and-egg when defining TripCount and Plan. Can let Plan's constructor create its own TripCount - it should be immutable/unsettable.

9818–9819

nits: "... and VPExpandSCEVRecipes". "can" refers only to WidenIntOrFpInductions?

10409

nit: worth commenting about setting trip count explicitly here/now.

llvm/lib/Transforms/Vectorize/VPlan.cpp
1113

nit: have assignSlots(VPBB)

1136–1138

nit: where were the caching get/addSCEVExpansion() added? (Independent of this patch).

1144–1145

nit: can cast instead of getting defined recipe.

llvm/lib/Transforms/Vectorize/VPlan.h
2222

nit: "... during skeleton creation and (rest of) VPlan execution."

2269

Oh, would this empty constructor prevent the creation of TripCount at construction rather than setting it later?

2272

nit: can add that (currently) they are disconnected because bypass blocks between them are not (yet) modelled in VPlan.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
17

Is vp<%0> used, does it match Live-in VEC_TC noted above?

fhahn updated this revision to Diff 519027.May 3 2023, 3:51 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
571

added, thanks!

3261

Added an extra sentence that eventually all SCEV expansion should happen there, as this is not the case yet.

3361–3362

VPTransformState::get with part only for a live-in will try to create a vector broadcast of the live-in because the part-only get in general produces vectors as result. Here we need the scalar value, and requesting a part 0/lane 0 allows us to do that with the current API.

7695

How/do we prevent expanding trip count SCEV again needlessly when vectorizing epilog loop?

Updated the patch to remove the trip count expansion recipe in the epilogue plan together with an assertion that it's not used.

Other SCEV expands in preheader are connected directly to distinct VPUsers so need be expanded even if common to both main and epilog loop, hopefully cse'd later?

Yes. We could transfer the values for the SCECV expansions from the state of the main vector loop to the state of the epilogue vector loop to re-use them without expanding them again. But if we decide to do that it would probably be better to do that separately.

7698

Added an assert to the else case, thanks!

8853

I updated the patch to make createInitialVPlan a static function of VPlan and removed the setter.

9818–9819

Added, thanks!

10409

Added a comment here together with removing the expand-recipe if there's one to avoid expanding the SCEV again. + Assert that it is not used in the plan.

llvm/lib/Transforms/Vectorize/VPlan.cpp
1113

Updated, thanks!

1136–1138

The caching has been added in this pending patch: D147963. But if the current patch gets approved first, it will be easy to apply it without those changes.

Will address those separate.y

1144–1145

Will address separately.

llvm/lib/Transforms/Vectorize/VPlan.h
2222

Extended, thanks!

2272

added + assert that preheader is disconnected.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
17

It should match the scalar trip count. Previously there have been no users so the live-in wasn't printed. I updated ::print() to always print the trip count as it is now always (implicitly) used during skeleton creation.

Ayal accepted this revision.May 3 2023, 7:43 AM

Looks good to me, ship it!
Summary should be updated a bit.

llvm/lib/Transforms/Vectorize/VPlan.cpp
603

nit: redundant empty line

606–608
1118

nit: auto? (Independent of this patch)

1144–1145

All ExpandSCEVRecipes are placed in preheader, rather than vector preheader / entry, right?

llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll
27

Note the hoist of sext from vector.ph to entry/ph, hopefully to be sunk later.

This revision is now accepted and ready to land.May 3 2023, 7:43 AM
fhahn updated this revision to Diff 519453.May 4 2023, 5:37 AM
fhahn marked 5 inline comments as done.

Address remaining comments and rebase before landing.

llvm/lib/Transforms/Vectorize/VPlan.cpp
603

Removed, thanks!

606–608

Simplified in the committed version, thanks!

1144–1145

Yep!

llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll
27

Yes, the backend at the latest should be able to sink it.

This revision was landed with ongoing or failed builds.May 4 2023, 6:00 AM
This revision was automatically updated to reflect the committed changes.
Ayal added inline comments.May 4 2023, 6:43 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3261

(post-commit nit): so have we reached this "Eventually" situation?

llvm/lib/Transforms/Vectorize/VPlan.cpp
1144–1145

(post-commit nit): worth updating documented figure, see above.

fhahn marked an inline comment as done.May 4 2023, 6:51 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3261

There are some places still remaining that do manual SCEV expansion, e.g. when creating induction resume values. This should be addressed by D147965.

Just a heads up that this change most likely breaks Chrome when building with thinlto. The error happens at linking so it is hard for me to get a reproducer.
Error message if it can provide a hint:

Instruction does not dominate all uses!
  %726 = sub i32 0, %559, !dbg !531
  %766 = insertelement <8 x i32> poison, i32 %726, i64 0, !dbg !531
Instruction does not dominate all uses!
  %726 = sub i32 0, %559, !dbg !531
  %770 = mul i32 %726, 8, !dbg !531
ld.lld: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10495: bool llvm::LoopVectorizePass::processLoop(Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

The changelog from last passing to failing is:

$ git log --oneline a91cb9ce39dc42e6a7a2c4fe97580e51eb1c2961..30af2fb33ed2f610abfa50e53df9712887b2bd25 -- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
e3afe0b89de5 [VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
b85a402dd899 [VPlan] Introduce new entry block to VPlan for early SCEV expansion.
79692750d250 [LV] Use VPValue for SCEV expansion in fixupIVUsers.
5362a0d859d8 [LCSSA] Remove unused ScalarEvolution argument (NFC)
6303fa369c85 [VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)

Here is the crash repro. Because of many dependent changes, I am not able to do a working revert so please try to fix this asap:

clang -cc1 -target-cpu skylake -target-feature -ggnu-pubnames -O3 -ftrivial-auto-var-init=zero -vectorize-loops -emit-llvm -x c crash_vector.c

typedef short a;
typedef int b;
typedef a c;
typedef b d;
typedef c e;
typedef d f;
typedef struct {
  e g[2]
} h;
void i(h *j, e k[], int l) {
  int m, n;
  f o, p;
  n = -j->g[0];
  for (; m < l; m++) {
    p += n;
    o = p;
    k[m] = o;
  }
}

We also see the same error introduced by this patch. A reproducer:

$> opt -passes='loop-vectorize' -epilogue-vectorization-force-VF=4 test.ll

$> cat test.ll
target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define i32 @test(i32 %0) {
entry:
  br label %loop

loop:                                      ; preds = %loop, %entry
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %t = phi i32 [ 0, %entry ], [ %sub, %loop ]
  %sub = sub i32 %t, %0
  store i32 %t, ptr null, align 4
  %iv.next = add i64 %iv, 1
  %cond = icmp eq i64 %iv.next, 0
  br i1 %cond, label %end, label %loop

end:                                       ; preds = %loop
  ret i32 0
}
fhahn marked an inline comment as done.May 11 2023, 2:03 PM

Thanks for the reproducers. This was an issue with epilogue vectorization and I pushed a workaround for now: 3d4eed01338d. I'll work on migrating the code to generate induction resume values based on VPlan ASAP so we can remove the workaround.

Thanks for the reproducers. This was an issue with epilogue vectorization and I pushed a workaround for now: 3d4eed01338d. I'll work on migrating the code to generate induction resume values based on VPlan ASAP so we can remove the workaround.

Thanks @fhahn for the quick response!