This is an archive of the discontinued LLVM Phabricator instance.

[LV] Try to sink and hoist inside candidate loops for vectorization.
AbandonedPublic

Authored by fhahn on Apr 26 2021, 5:50 AM.

Details

Summary

Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

  1. Executing memory accesses unconditionally.
  2. Reducing the number of instructions that need predication.

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tires to recover the regression by running hoisting/sinking
inside each inner loop before vectorization. This is not ideal, because
we also hoist/sink in loops that won't be vectorized. But LV already
does similar transformations for all inner loops (e.g LoopSimplify and
LCSSA construction). Alternatively we could run a separate
loop-sink-hoist pass, but I am not sure that's worth the effort.

In the long term, the sinking/hoisting could and should be done in
VPlan, but it requires at least handling parts of legaltiy and
cost-modeling in VPlan as well.

Details about the impact on compile-time can be found here:
http://llvm-compile-time-tracker.com/compare.php?from=3a71d0de397e3a15c943ca59a00243ba8b7154da&to=c4efd69f4733b46e5de8fc2fa6e4c2495750d339&stat=instructions

NewPM-O3: geoman +0.18%
NewPM-ReleaseThinLTO: geoman +0.17%
NewPM-ReleaseLTO-g: geoman +0.18%

In terms of number of loops vectorized, we have the following changes
across MultiSource/SPEC2000/SPEC2006 on X86 with LTO

test-suite...000/186.crafty/186.crafty.test 20.00 22.00 10.0%
test-suite...006/450.soplex/450.soplex.test 85.00 86.00 1.2%
test-suite.../CINT2006/403.gcc/403.gcc.test 209.00 211.00 1.0%
test-suite...6/464.h264ref/464.h264ref.test 156.00 157.00 0.6%
test-suite...ications/JM/lencod/lencod.test 215.00 216.00 0.5%

And +0.5% more loops are vectorized in Geekbench on AArch64.

Diff Detail

Event Timeline

fhahn created this revision.Apr 26 2021, 5:50 AM
fhahn requested review of this revision.Apr 26 2021, 5:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2021, 5:50 AM
Herald added a subscriber: vkmr. · View Herald Transcript
fhahn retitled this revision from [lV] Try to sink and hoist inside candidate loops for vectorization. to [LV] Try to sink and hoist inside candidate loops for vectorization..
fhahn updated this revision to Diff 340555.Apr 26 2021, 9:19 AM

Avoid using applyUpdatesPermissive, by directly handling the case with duplicated successors.

fhahn updated this revision to Diff 340557.Apr 26 2021, 9:21 AM

remove accidental SimplifyCFG changes, they are now in D101289

Harbormaster completed remote builds in B100960: Diff 340557.

As far as i can tell, this doesn't reintroduce the problem that was fixed by disabling early hoisting in simplifycfg, so LG.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9989

FWIW these sink/hoist helpers are fine with lazy updates.

10006–10016

I think it would be fine to sink that into hoistThenElseCodeToIf().

10025–10031

I think it would be fine to sink that into sinkCommonCodeFromPredecessors().

fhahn updated this revision to Diff 340850.Apr 27 2021, 8:01 AM

Rebased on top of D101368, break out removing blocks that be come dead as separate cleanup in the loop.

fhahn added inline comments.Apr 27 2021, 8:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9989

I'm not sure what the issue is exactly, but I think there are some cases where combining sinking & hoisting could result in some duplicated/outdated updates which would require applyUpdatesPermissive

10006–10016

Unfortunately I think that would be incompatible with the use in SimplifyCFG, which does not seem to handle removing BBs other than the current one well.

But I adjusted the loop to remove any blocks that became dead at the top-level, which hopefully makes things a bit clearer.

10025–10031

Thanks, that's a great idea. I put up D101368 to optionally preserve LI in sinkCommonCodeFromPredecessors

fhahn updated this revision to Diff 340881.Apr 27 2021, 9:33 AM

Pass DTU to DeleteDeadBlock, use lazy update strategy.

lebedev.ri accepted this revision.EditedApr 27 2021, 10:28 AM

Cautious LG, you likely want one more review.
Thanks.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9998

pred_empty(BB)

10006–10016

Hmm, true. Disregard then.

10008–10011
10015
This revision is now accepted and ready to land.Apr 27 2021, 10:28 AM
nikic added a subscriber: nikic.Apr 27 2021, 12:42 PM

I find it pretty odd that LoopVectorize would have this kind of side effect on code that's not vectorized. I don't think that's really the same as construction of LoopSimplify/LCSSA form, which is pretty much par the course for any loop pass. D84108 mentions that:

I'm proposing to harmonize this, and disable common code hoisting until late
in pipeline. Definition of late may vary, currently i've picked the same one
as for code sinking, but i suppose we could enable it as soon as right after
loop rotation happens.

I think that doing that (enabling hoisting in some post-LoopRotate SimplifyCFG run) would be cleaner than having it happen as a side-effect of LoopVectorize.

fhahn added a comment.Apr 28 2021, 9:23 AM

I find it pretty odd that LoopVectorize would have this kind of side effect on code that's not vectorized. I don't think that's really the same as construction of LoopSimplify/LCSSA form, which is pretty much par the course for any loop pass. D84108 mentions that:

I'm proposing to harmonize this, and disable common code hoisting until late
in pipeline. Definition of late may vary, currently i've picked the same one
as for code sinking, but i suppose we could enable it as soon as right after
loop rotation happens.

I think that doing that (enabling hoisting in some post-LoopRotate SimplifyCFG run) would be cleaner than having it happen as a side-effect of LoopVectorize.

That's a fair point. Originally I did not adjust the SimplifyCFG runs, because we do a run of LoopRotate just before LoopVectorize in the legacy pass manager and the sinking/hoisting before may pessimize LoopRotate there. But it may not be a big issue (@lebedev.ri WDYT?), because sinking/hoisting before the second LoopRotate run should outweigh the risks of not rotating.

In general it is a bit unfortunate that LV is so sensitive to slight changes in the IR and very dependent on the specific position in the pipeline. As such, making it a bit more independent is desirable goal in general. In the future, we should be able to do the hoisting/sinking in VPlan, but that's still some way off. Unfortunately we cannot really perform the hoisting on loop that we can vectorize, because it would mess up runtime check SCEVs and a bunch of other things. Doing it in LV should disturb other passes less, but sinking/hoisting earlier is probably beneficial for a bunch of other passes as well.

I put up D101468 as an alternative, but should fix that particular vectorization issues when using O2/O3.

lebedev.ri requested changes to this revision.Apr 29 2021, 4:36 AM

Superseded by D101468

This revision now requires changes to proceed.Apr 29 2021, 4:36 AM

The alternative solution landed some time ago.
Abandon this?

fhahn abandoned this revision.Jun 24 2021, 7:08 AM

The alternative solution landed some time ago.
Abandon this?

Yes, this is not needed for now. I'll also abandon the related patches.