This is an archive of the discontinued LLVM Phabricator instance.

[SpeculateAroundPHIs] Avoid speculation on loop back edges
AbandonedPublic

Authored by bjope on Feb 1 2021, 7:58 AM.

Details

Summary

In https://bugs.llvm.org/show_bug.cgi?id=48821 there are some
discussions around problems with SpeculateAroundPHIs and back-edges
in loops.

One problem observed is that when splitting a critical edge that
is a loop latch, the pass isn't transferring loop metadata to the
new latch (nor removing the old loop metadata). So if we want to
allow speculation for such scenarios we need to fix that.

There have also been observations that (downstream) backends have
had problems with identifying hwloops when allowing speculation
involving back-edges. That might be a target specific problem
that potentially could be resolved by adding a TTI hook, or possibly
by keeping the loop form by also rotating loops after the pass (need
to decide if opt should canonicalize the loop in such manner or if
a backend need to rotate loops in the codegen pipeline).

This also avoid some differences between LegacyPM and NewPM when
it comes to how we optimize loops. So this could make the transition
to the new pass manager more clean, avoiding to change both pass
manager and potentially introducing new phase ordering issues at the
same time.

Diff Detail

Event Timeline

bjope created this revision.Feb 1 2021, 7:58 AM
bjope requested review of this revision.Feb 1 2021, 7:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2021, 7:58 AM

I think the problem with loop metadata not being handled properly is a bug (that also could impact performance negatively) , so this is a quick fix (but I guess it could be possible to handle transition of metadata to the new latch with some effort if the transform is beneficial).

Then there is the question about loop form and what can be expected at the end of the opt pipeline. Should the loop be in simplified/rotated form (I think legacy PM ends with SimplifyCFG, but new PM is running this pass that might unrotate(?) the loop). At least our downstream benchmarks suffered significantly since it failed to detect hwloops when having speculated around a phi involving a latch. If there is no canonical form, then maybe we need to improve codegen to either undo the transform (rotate loops) or improve the hwloop detection somehow.

Maybe I should add a TTI hook to simplify for specific targets to use this workaround (although, the loop metadata problem could possible be messing up for any target)?

Meinersbur added a comment.EditedFeb 19 2021, 10:30 AM

Some passes passes might not handle critical edges, in addition to SpeculateAroundPHIs . Wouldn't it be simpler to fix llvm::SplitCriticalEdge to make it put the loop metadata to the correct branch?

bjope added a comment.Mar 4 2021, 12:39 PM

Some passes passes might not handle critical edges, in addition to SpeculateAroundPHIs . Wouldn't it be simpler to fix llvm::SplitCriticalEdge to make it put the loop metadata to the correct branch?

That could perhaps be a long-term solution, although I'm not familiar with all uses of SplitCriticalEdge to know if it is feasible (I figure that sometimes the metadata should be updated as well, not only being moved).
Other passes, e.g. LoopRotate, is handling loop metadata "manually". But maybe there are other passes using SplitCriticalEdge that also suffer from not updating loop metadata.

In this patch I put focused on getting rid of the "bug" (having loop metadata on the wrong branch should ideally be caught be the verifier but as I recall it isn't that easy to verify).

IIUC, this patch inhibits optimizations opportunities because of a bug elsewhere? Don't we want to fix lost loop metadata anywhere else too?

bjope added a comment.Mar 7 2021, 8:51 AM

IIUC, this patch inhibits optimizations opportunities because of a bug elsewhere?

I haven't understood if the rewrite really is an optimization or not when the critical edge split occur for a backedge. All I see for my target are performance penalties (either by no longer getting an "hwloop", or due to that the rewrite introduces a conditional branch inside the loop body).
This patch both restores performance loss introduced by changing from legacy PM to new PM, and avoids the problem with incorrect loop metadata after such non-profitable rewrites. If someone could motivate the transform (why it should be considered as beneficial), then I agree that the loop metadata problem should be solved instead. And then maybe we also need some TTI hook or something to allow targets to skip the transform if it isn't wanted. Those rewrites at least seem to be devastating for our OOT benchmark suite when switching from legacy PM to new PM.

I will need to take another look, but i also have seen significant perf regressions due to SpeculateAroundPHIs.
I think a better/less controversial fix would be to move SpeculateAroundPHIs to the end of IR portion of codegen pipeline,
after last extension point.

thopre added a subscriber: thopre.Jun 4 2021, 2:40 AM
bjope added a comment.Jun 4 2021, 3:55 AM

I will need to take another look, but i also have seen significant perf regressions due to SpeculateAroundPHIs.
I think a better/less controversial fix would be to move SpeculateAroundPHIs to the end of IR portion of codegen pipeline,
after last extension point.

Moving the pass still doesn't solve the problem that the loop metadata can be messed up (I think that having stale loop metadata on the wrong branch is bad, a couple of transformations later it might end up being on a latch belonging to a totally different loop).
That could ofcourse also be solved by another patch.

I think it was pretty annoying that the switch of pass manager introduce these problems as a side effect. And the idea here is simply to avoid the bugs/regressions seen.
Then someone could try to optimize these patterns again (hopefully in a correct way), or move things around in the pipeline etc.

I find it worrying that the authors of the pass (i.e. google) still haven't found time to reply to the post-commit feedback here / in the original review.

nikic added a subscriber: nikic.EditedJun 11 2021, 1:54 AM

It seems like the addition of this pass has been somewhat botched by adding it only to the NewPM pipeline -- for no clear reason I can discern. This was at a time where barely anyone (outside Google) was using the NewPM, which means that this received very little evaluation from other parties. Now people are hitting this as part of the LegacyPM -> NewPM migration instead, which really shouldn't be the case.

It might make sense to remove this pass from the NewPM default pipeline again (or put it behind an option) and then restart the process of enabling it by default, this time with due diligence.

It seems like the addition of this pass has been somewhat botched by adding it only to the NewPM pipeline -- for no clear reason I can discern. This was at a time where barely anyone (outside Google) was using the NewPM, which means that this received very little evaluation from other parties. Now people are hitting this as part of the LegacyPM -> NewPM migration instead, which really shouldn't be the case.

It might make sense to remove this pass from the NewPM default pipeline again (or put it behind an option) and then restart the process of enabling it by default, this time with due diligence.

Aha, so i'm not the only one with that sentiment. D104099 it is then.

The pass has now been reverted by rGe52364532afb2748c324f360bc1cc12605d314f3.
Abandon this?

bjope abandoned this revision.Jun 16 2021, 12:22 AM

The pass has been removed from the code base, so this can be abandonded.

(Hopefully someone will remember about https://bugs.llvm.org/show_bug.cgi?id=48821 if trying to revive the pass in the future.)