Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
ClosedPublic

Authored by fhahn on Apr 10 2023, 12:46 PM.

Details

Summary

This patch adds a new entry block the VPlan to place early SCEV
expansions like the trip count.

It updates createVectorizedLoopSkeleton to take a VPlan & State as
arguments, which can then be used to access the trip count. This is a
first step towards modeling skeleton construction in VPlan and a start
of gradually moving parts of createVectorizedLoopSkeleton towards VPlan.

At the moment, the new preheader is executed before
createVectorizedLoopSkeleton is called, and the rest afterwards.

I will share a follow-up patch that also expands step expressions early
to fix #58811.

Still WIP as some tests need to be updated.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added inline comments.Apr 30 2023, 8:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
500

The latest version has been updated to not pass them.

621

Yes, added a comment, thanks!

7705

Adjusted, thanks!

8740–8741

Renamed in 4583d7ef7c33, thanks!

8894–8902

Thanks, updated!

8904

I've retained to placement of the expansion only needed in the vector loop region in the vector preheader for now, to avoid extra test changes.

8908

nit: perhaps Preheader should appear before VecPreheader, and maybe TripCount before both?

Adjusted, thanks!

nit: would be good to outline this low-level part if reasonable as method is exceeding 250 LOC.

Agreed, but I think it would be good to do this as follow-up and also trying to unify this part with the native path.

9094

moved to a helper, thanks!

10406

How do the users of the erased TripCount ExpandSCEV recipe get to re-use the (already) expanded values from the main vector loop, now that the recipe-VPValue feeding them is being deleted?

The only users should be through ILV.TripCount. Updated the comment and add assert that the removed values don't have any users.

Down the road we really need to find a way to gradually transition the epilogue handling into a single VPlan, so we can avoid all the workarounds here...

llvm/lib/Transforms/Vectorize/VPlan.cpp
596–597

If they are defined by a recipe, they will be deleted as part of recipe deletion; if they are live-in VPValues, they will be cleaned up as part of that cleanup. No special handling should be necessary.

llvm/lib/Transforms/Vectorize/VPlan.h
2220–2225

Added comment, thanks

2267

Added comment, thanks!

2269

Updated to require all arguments. I also added a version that doesn't take any arguments as this is used across different unit tests.

2314

it's not needed in the latest version, removed!

2315–2316

A const version alone is sufficient, added const, thanks!

2320

Removed in the latest version, thanks!

2775–2776

In the latest version, it is either the vector pre-header (VPlan.getEntry()) or the prehader (getPreheader()). This avoids causing test changes due to different expansion points.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
34 ↗(On Diff #516101)

Yep, updated.

400 ↗(On Diff #516101)

Restored original code, it's not needed with the latest version.

llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
161 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll
90 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
866 ↗(On Diff #516101)

Not needed, undone!

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
900

Yes it looks like it!

fhahn updated this revision to Diff 518893.May 2 2023, 3:55 PM

Add VPlan printing of preheader block and update most printing tests. I also updated the initial VPlan construction to use getOrExpandSCEVRecipe to avoid introducing unncessary VPExpandSCEVRecipes. As a consequence some places needed to be updated to specifically requests Part & Lane 0 to handle live-ins properly in VPTransformState::get.

I also updated the code to always place VPExpandSCEVRecipes in the preheader. The only workaround needed for the epilogue vector loop is re-using the trip count expanded for the main loop, because otherwise dominance is broken when used during skeleton creation/fixup.

Ayal added inline comments.May 3 2023, 1:24 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
571

nit: also worth clarifying that the same trip count - that of the original loop - is (generated once and) set for both ILV's of main and epilog loops?

3261

nit: "SCEVs in Plans [preheader] are expanded here" >> "all SCEV expansions needed by vectorized loops"?

3361–3362

Curious about this needed change in handling invariant "live-ins"?

As a consequence some places needed to be updated to specifically requests Part & Lane 0 to handle live-ins properly in VPTransformState::get.

7695

How/do we prevent expanding trip count SCEV again needlessly when vectorizing epilog loop?

Other SCEV expands in preheader are connected directly to distinct VPUsers so need be expanded even if common to both main and epilog loop, hopefully cse'd later?

7698

nit: worth a comment that the 'else' case refers to vectorizing the epilog loop, whose ILV borrows the trip count from that of the main loop's ILV (the latter being set here)?

8853

nit: there is a bit of chicken-and-egg when defining TripCount and Plan. Can let Plan's constructor create its own TripCount - it should be immutable/unsettable.

9818–9819

nits: "... and VPExpandSCEVRecipes". "can" refers only to WidenIntOrFpInductions?

10409

nit: worth commenting about setting trip count explicitly here/now.

llvm/lib/Transforms/Vectorize/VPlan.cpp
1113

nit: have assignSlots(VPBB)

1136–1138

nit: where were the caching get/addSCEVExpansion() added? (Independent of this patch).

1144–1145

nit: can cast instead of getting defined recipe.

llvm/lib/Transforms/Vectorize/VPlan.h
2222

nit: "... during skeleton creation and (rest of) VPlan execution."

2269

Oh, would this empty constructor prevent the creation of TripCount at construction rather than setting it later?

2272

nit: can add that (currently) they are disconnected because bypass blocks between them are not (yet) modelled in VPlan.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
17

Is vp<%0> used, does it match Live-in VEC_TC noted above?

fhahn updated this revision to Diff 519027.May 3 2023, 3:51 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
571

added, thanks!

3261

Added an extra sentence that eventually all SCEV expansion should happen there, as this is not the case yet.

3361–3362

VPTransformState::get with part only for a live-in will try to create a vector broadcast of the live-in because the part-only get in general produces vectors as result. Here we need the scalar value, and requesting a part 0/lane 0 allows us to do that with the current API.

7695

How/do we prevent expanding trip count SCEV again needlessly when vectorizing epilog loop?

Updated the patch to remove the trip count expansion recipe in the epilogue plan together with an assertion that it's not used.

Other SCEV expands in preheader are connected directly to distinct VPUsers so need be expanded even if common to both main and epilog loop, hopefully cse'd later?

Yes. We could transfer the values for the SCECV expansions from the state of the main vector loop to the state of the epilogue vector loop to re-use them without expanding them again. But if we decide to do that it would probably be better to do that separately.

7698

Added an assert to the else case, thanks!

8853

I updated the patch to make createInitialVPlan a static function of VPlan and removed the setter.

9818–9819

Added, thanks!

10409

Added a comment here together with removing the expand-recipe if there's one to avoid expanding the SCEV again. + Assert that it is not used in the plan.

llvm/lib/Transforms/Vectorize/VPlan.cpp
1113

Updated, thanks!

1136–1138

The caching has been added in this pending patch: D147963. But if the current patch gets approved first, it will be easy to apply it without those changes.

Will address those separate.y

1144–1145

Will address separately.

llvm/lib/Transforms/Vectorize/VPlan.h
2222

Extended, thanks!

2272

added + assert that preheader is disconnected.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
17

It should match the scalar trip count. Previously there have been no users so the live-in wasn't printed. I updated ::print() to always print the trip count as it is now always (implicitly) used during skeleton creation.

Ayal accepted this revision.May 3 2023, 7:43 AM

Looks good to me, ship it!
Summary should be updated a bit.

llvm/lib/Transforms/Vectorize/VPlan.cpp
603

nit: redundant empty line

606–608
1118

nit: auto? (Independent of this patch)

1144–1145

All ExpandSCEVRecipes are placed in preheader, rather than vector preheader / entry, right?

llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll
27

Note the hoist of sext from vector.ph to entry/ph, hopefully to be sunk later.

This revision is now accepted and ready to land.May 3 2023, 7:43 AM
fhahn updated this revision to Diff 519453.May 4 2023, 5:37 AM
fhahn marked 5 inline comments as done.

Address remaining comments and rebase before landing.

llvm/lib/Transforms/Vectorize/VPlan.cpp
603

Removed, thanks!

606–608

Simplified in the committed version, thanks!

1144–1145

Yep!

llvm/test/Transforms/LoopVectorize/pointer-induction-unroll.ll
27

Yes, the backend at the latest should be able to sink it.

This revision was landed with ongoing or failed builds.May 4 2023, 6:00 AM
This revision was automatically updated to reflect the committed changes.
Ayal added inline comments.May 4 2023, 6:43 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3261

(post-commit nit): so have we reached this "Eventually" situation?

llvm/lib/Transforms/Vectorize/VPlan.cpp
1144–1145

(post-commit nit): worth updating documented figure, see above.

fhahn marked an inline comment as done.May 4 2023, 6:51 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3261

There are some places still remaining that do manual SCEV expansion, e.g. when creating induction resume values. This should be addressed by D147965.

Just a heads up that this change most likely breaks Chrome when building with thinlto. The error happens at linking so it is hard for me to get a reproducer.
Error message if it can provide a hint:

Instruction does not dominate all uses!
  %726 = sub i32 0, %559, !dbg !531
  %766 = insertelement <8 x i32> poison, i32 %726, i64 0, !dbg !531
Instruction does not dominate all uses!
  %726 = sub i32 0, %559, !dbg !531
  %770 = mul i32 %726, 8, !dbg !531
ld.lld: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10495: bool llvm::LoopVectorizePass::processLoop(Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

The changelog from last passing to failing is:

$ git log --oneline a91cb9ce39dc42e6a7a2c4fe97580e51eb1c2961..30af2fb33ed2f610abfa50e53df9712887b2bd25 -- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
e3afe0b89de5 [VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
b85a402dd899 [VPlan] Introduce new entry block to VPlan for early SCEV expansion.
79692750d250 [LV] Use VPValue for SCEV expansion in fixupIVUsers.
5362a0d859d8 [LCSSA] Remove unused ScalarEvolution argument (NFC)
6303fa369c85 [VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)

Here is the crash repro. Because of many dependent changes, I am not able to do a working revert so please try to fix this asap:

clang -cc1 -target-cpu skylake -target-feature -ggnu-pubnames -O3 -ftrivial-auto-var-init=zero -vectorize-loops -emit-llvm -x c crash_vector.c

typedef short a;
typedef int b;
typedef a c;
typedef b d;
typedef c e;
typedef d f;
typedef struct {
  e g[2]
} h;
void i(h *j, e k[], int l) {
  int m, n;
  f o, p;
  n = -j->g[0];
  for (; m < l; m++) {
    p += n;
    o = p;
    k[m] = o;
  }
}

We also see the same error introduced by this patch. A reproducer:

$> opt -passes='loop-vectorize' -epilogue-vectorization-force-VF=4 test.ll

$> cat test.ll
target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define i32 @test(i32 %0) {
entry:
  br label %loop

loop:                                      ; preds = %loop, %entry
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %t = phi i32 [ 0, %entry ], [ %sub, %loop ]
  %sub = sub i32 %t, %0
  store i32 %t, ptr null, align 4
  %iv.next = add i64 %iv, 1
  %cond = icmp eq i64 %iv.next, 0
  br i1 %cond, label %end, label %loop

end:                                       ; preds = %loop
  ret i32 0
}
fhahn marked an inline comment as done.May 11 2023, 2:03 PM

Thanks for the reproducers. This was an issue with epilogue vectorization and I pushed a workaround for now: 3d4eed01338d. I'll work on migrating the code to generate induction resume values based on VPlan ASAP so we can remove the workaround.

Thanks for the reproducers. This was an issue with epilogue vectorization and I pushed a workaround for now: 3d4eed01338d. I'll work on migrating the code to generate induction resume values based on VPlan ASAP so we can remove the workaround.

Thanks @fhahn for the quick response!