This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnrollRuntime] Allow simple transition to deopt non-latch exit blocks
ClosedPublic

Authored by yrouban on Jun 10 2023, 9:15 PM.

Details

Summary

Relax condition on runtime trip count unrolling loops with 1 non-latch exit that leads to a deop block.

There are cases when the deopt blocks are common exits for different loops. LoopSimplify pass splits such edges to the common deopting blocks to make sure that all exit nodes of the loop only have predecessors that are inside of the loop (See simplifyOneLoop()). This breaks the current condition for unrolling.

This patch calls getPostdominatingDeoptimizeCall() instead of getTerminatingDeoptimizeCall().

Here is previous heuristic change https://reviews.llvm.org/D35380.

Diff Detail

Event Timeline

yrouban created this revision.Jun 10 2023, 9:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2023, 9:15 PM
yrouban requested review of this revision.Jun 10 2023, 9:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2023, 9:15 PM
yrouban edited the summary of this revision. (Show Details)Jun 10 2023, 10:04 PM
yrouban added reviewers: reames, apilipenko.
anna added a comment.EditedJun 12 2023, 1:52 PM

If we are traversing through unique successors until we reach deopt, we're guaranteed it is a cold path. Did we get any compile time difference if we choose this heuristic versus directly using getPostdominatingDeoptimizeCall()?
When I originally had the heuristic, there wasn't any particular reason for checking immediate deopt block (except that those were the workloads we cared about).

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
457

Nit: GetTerminatingDeoptimizeCall

yrouban updated this revision to Diff 530762.Jun 12 2023, 9:14 PM
yrouban marked an inline comment as done.

fixed a typo

If we are traversing through unique successors until we reach deopt, we're guaranteed it is a cold path. Did we get any compile time difference if we choose this heuristic versus directly using getPostdominatingDeoptimizeCall()?
When I originally had the heuristic, there wasn't any particular reason for checking immediate deopt block (except that those were the workloads we cared about).

I agree that relaxing the unroll condition we might end up with a longer compile time (CT), but this change has nothing to do with CT (e.g. the longer chain of transitive blocks does not contribute to a bigger IR after unrolling). If we will have a longer CT with this change it could be so with the original code. I believe that if we want to limit impact of LoopUnroll on CT we have to impose a cost model to estimate the effect of unrolling - that is what said in the TODO right after the return statement. So, I will if there will be found any CT increase.

anna added a comment.Jun 13 2023, 8:41 AM

If we are traversing through unique successors until we reach deopt, we're guaranteed it is a cold path. Did we get any compile time difference if we choose this heuristic versus directly using getPostdominatingDeoptimizeCall()?
When I originally had the heuristic, there wasn't any particular reason for checking immediate deopt block (except that those were the workloads we cared about).

I agree that relaxing the unroll condition we might end up with a longer compile time (CT), but this change has nothing to do with CT (e.g. the longer chain of transitive blocks does not contribute to a bigger IR after unrolling). If we will have a longer CT with this change it could be so with the original code. I believe that if we want to limit impact of LoopUnroll on CT we have to impose a cost model to estimate the effect of unrolling - that is what said in the TODO right after the return statement. So, I will if there will be found any CT increase.

I was actually talking about making it even more permissive, which will cause an even more increase in CT compared to what you have in the current patch (because getPostdominatingDeoptimizeCall does not consider just phi nodes, it considers all blocks). It would unroll more cases, but potential increase in CT (again only for code that uses deopt).

yrouban updated this revision to Diff 531270.Jun 14 2023, 5:14 AM
yrouban edited the summary of this revision. (Show Details)

Relaxed the condition further by calling getPostdominatingDeoptimizeCall().

anna accepted this revision.Jun 14 2023, 6:04 AM

LGTM. Pls wait for a day or so if anyone else has comments.

This revision is now accepted and ready to land.Jun 14 2023, 6:04 AM
This revision was landed with ongoing or failed builds.Jun 18 2023, 9:11 PM
This revision was automatically updated to reflect the committed changes.