This is an archive of the discontinued LLVM Phabricator instance.

[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.
ClosedPublic

Authored by fhahn on Aug 16 2021, 2:09 AM.

Details

Summary

Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

int sum(std::vector<int> *A, std::vector<int> *B, int N) {
  int cost = 0;
  for (int i = 0; i < N; ++i)
    cost += A->at(i) + B->at(i);
  return cost;
}

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Diff Detail

Event Timeline

fhahn created this revision.Aug 16 2021, 2:09 AM
fhahn requested review of this revision.Aug 16 2021, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 2:09 AM

Can you please also at least add a test with

unreachable.exit.foo: ; preds = <...>
  call void @foo()
  br label %unreachable.exit.end

unreachable.exit.bar: ; preds = <...>
  call void @bar()
  br label %unreachable.exit.end

unreachable.exit.end:
  call void @baz()
  unreachable
nikic added a reviewer: nikic.Aug 16 2021, 2:45 AM
fhahn added a comment.Aug 16 2021, 3:56 AM

Can you please also at least add a test with

unreachable.exit.foo: ; preds = <...>
  call void @foo()
  br label %unreachable.exit.end

unreachable.exit.bar: ; preds = <...>
  call void @bar()
  br label %unreachable.exit.end

unreachable.exit.end:
  call void @baz()
  unreachable

Thanks, should be added in 38c3cebd7d5a. Note that the current version doesn't support that case at the moment, as this would probably require a bit more complex checking.

lebedev.ri added inline comments.Aug 16 2021, 2:24 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
103–109

(As per brief disscussion in IRC) since this is about loop reentry, how about something like this then?

reames accepted this revision.Aug 17 2021, 9:13 AM

LGTM w/comment suggested.

llvm/lib/Transforms/Utils/LoopPeel.cpp
105

Can you add something to this comment to indicate that a) this is a proxy for a strong profile prediction of untaken, and b) this is a profitability check not a legality check?

This revision is now accepted and ready to land.Aug 17 2021, 9:13 AM
nikic added inline comments.Aug 17 2021, 9:47 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
105

I would really appreciate an explanation of why we want this heuristic... it's not really obvious to me why this is only beneficial if the other exits are likely not taken.

reames added inline comments.Aug 17 2021, 9:53 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
105

I don't know this is Florian's answer, but mine would be: incrementalism and minimizing blast radius. We may eventually want to generalize further, but one step at a time.

skatkov added inline comments.Aug 22 2021, 9:15 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
105

I guess the problem is that LoopPeeling cannot update branch weights for other branches than latch.

If we can teach to LoopPeeling to update branch weights for non latch branches - we may remove this restriction however it does not look easy task.

Branch weights to deopt and unreachable is overwritten to be smallest one, so its update is simple (no update).

This is how I remember a problem.

fhahn added inline comments.Aug 23 2021, 8:06 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
103–109

@lebedev.ri I agree that this relaxation would be fine/good for the existing cost heuristics, but as @skatov mentioned, the problem with that case would be updating the branch weights. Unless you have additional concerns, I am planning on going with unreachable-terminated exits for now to avoid messing up branch weights. WDYT?

105

Thanks for taking a look! My main motivation was as Philip suggested mostly to keep the potential fallout limited and additionally allowing exit blocks terminated by unreachable should not negatively impact the existing heuristics.

@skatov thanks for providing the extra context with respect to branch weights, I'll include that in the updated comment.

I see. Thanks, SGTM, looking forward to this, and further improvements!

fhahn updated this revision to Diff 368600.Aug 25 2021, 3:58 AM

rebased and comment added. I'll land this shortly.

This revision was landed with ongoing or failed builds.Aug 25 2021, 5:27 AM
This revision was automatically updated to reflect the committed changes.