This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when it's profitable
ClosedPublic

Authored by mkazantsev on Oct 19 2022, 1:19 AM.

Details

Summary

Based on https://discourse.llvm.org/t/rfc-inject-invariant-conditions-to-loops-to-enable-unswitching-and-constraint-elimination

This transform attempts to handle the following loop:

for (...) {
  x = <some variant>
  if (x <u C1) {} else break;
  if (x <u C2) {} else break;
}

Here x is some loop-variant value, and C1 and C2 are loop invariants.
As we see, this loop has no invariant checks we can unswitch on. However, there is an
invariant condition that can make the second check redundant. Specifically, it is C1 <=u C2.
We can modify this code in the following way:

for (...) {
  x = <some variant>
  if (x <u C1) {} else break;
  if (C1 <=u C2) {
  /* no check is required */
  }
  else {
    // do the check normally
    if (x <u C2) {} else break;
  }
}

Now we have an invariant condition C1 <=u C2 and can unswitch on it.

This patch introduces the basic version of this transform, with some limitations,
all of them seem liftable (but needs more work & testing):

  • All checks are ult condition;
  • All branches in question stay in loop if the said condition is true and leave it otherwise;
  • All in-loop branches are hot enough;

There is also a room for improvement cost model. So far we evalutate the cost of
unswitching this newly injected invariant branch the same as if we would unswitch
on 2nd condition, which is not exactly precise (but also not grossly wrong).

Diff Detail

Event Timeline

mkazantsev created this revision.Oct 19 2022, 1:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 1:19 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
mkazantsev requested review of this revision.Oct 19 2022, 1:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 1:19 AM
mkazantsev retitled this revision from [WIP] Virtual loop unswitching to [SimpleLoopUnswitch] Insert loop-invariant conditions and unswitch them when it's profitable.
mkazantsev edited the summary of this revision. (Show Details)

Alive2 proof of correctness (on test_04): https://alive2.llvm.org/ce/z/S9sX5h

Put MSSA verification under flag.

A couple of high-level comments.

How does this type of unswitching interact with poison?

I'm not fond of the term "virtual" here as it is very non-descriptive. I don't have a great suggestion, but maybe "unswitch by injected invariants" or something like this?

Looks like you are doing some canonicalization of conditions in this patch: replacing signed with unsigned, handling zexts. I suggest splitting all of this into separate patches so as to make the initial review smaller. Leave the bare minimum transform in the initial patch and then layer complexity with follow up reviews.

How does this type of unswitching interact with poison?

We used to have something like:

  %cmp1 = icmp ult i32 %x, %A
  br i1 cmp1, label %taken, label %exit

taken:
  %cmp2 = icmp ult i32 %x, %B
  br i1 cmp2, label %taken2, label %exit2

We want to insert a new check A <=u B right before br i1 cmp2. Note that, in the original program, when we execute this branch, and either A or B was poison, then the original program has undefined behavior. So inserting a new comparison and branch by A <=u B doesn't change this fact: it's still UB if either of them is a poison. So the answer to your question is "interaction with poison remains unchanged".

I'm not fond of the term "virtual" here as it is very non-descriptive. I don't have a great suggestion, but maybe "unswitch by injected invariants" or something like this?

Agreed, but let's have more opinions on what name is more suitable here.

Looks like you are doing some canonicalization of conditions in this patch: replacing signed with unsigned, handling zexts. I suggest splitting all of this into separate patches so as to make the initial review smaller.

Makes sense, will do.

mkazantsev retitled this revision from [SimpleLoopUnswitch] Insert loop-invariant conditions and unswitch them when it's profitable to [SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when it's profitable.
mkazantsev edited the summary of this revision. (Show Details)

Updates according to Artur's comments:

  • Got rid of "virtual" terminology
  • Split off handling of different types
apilipenko added inline comments.Dec 7 2022, 6:04 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983

The injected condition doesn't necessarily have the same profile as the original.

I also don't think that we should preserve MD_make_implicit.

3058

Stray change.

3073–3086

Since you are not handling zexts in this patch, this part of the comment can be dropped.

3106–3107

Why do you limit the transform to conditions that dominate the latch?

3116–3118

Why do you need to remember this fact and disable unswitching later?

3136–3137

Can this be done from the loop above?

3340–3342
mkazantsev added inline comments.Dec 8 2022, 9:30 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983

The injected condition doesn't necessarily have the same profile as the original.

If the number of iterations in the loop is the same, then it does. Because this condition is invariant, the frequency ratio shows how often it is true or false.

If the number of iterations in this loop is different from call to call - well, formally, it might not be the same after unswitching. However, it also means that the profile we report here is something collected from cumulative multiple different runs, and is misleading by itself. It could be 0:1 in one run, 1000:0 in another run, and 1000:1 on average. I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense?

I also don't think that we should preserve MD_make_implicit.

Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact.

3073–3086

Will move to follow-up patch.

mkazantsev added inline comments.Dec 8 2022, 9:30 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107

Because I prove implications, and the implication can only be proven from something that must execute before the implied condition.

3116–3118

If the invariant condition is false, the initial loop-variant condition cannot be proven true or false. Example:

for (int i = 0; i < N; i++) {
  x = arr[i];
  if (x <0 || x >= 128)
    deopt();
  if (x < 0 || x >= len)
    deopt();
  ...
}

The transform will insert loop-invariant condition 128 <= len to get rid of loop-variant check if (x < 0 || x >= len) in one of unswitched copies.

If we have proven that 128 <= len, then we have proven that 0 <= x < 128 <= len, and therefore in the unswitched version 2nd check can be removed. But if we haven't proven that, we do not automatically know anything about 2nd condition. Example: len = 100 but x is still less than 99.

It means that in the unswitched copy, we should preserve the initial loop-variant condition. And we need to prevent from the optimization go crazy and infinitely unswitch on it.

3136–3137

Yes, but this loop is big already... I wanted to split up the logic to keep code more or less comprehendable.

3340–3342

Yup, thanks for pointing out.

mkazantsev marked 5 inline comments as done.

Rebase & addressed comments

mkazantsev marked 2 inline comments as done.Dec 19 2022, 3:32 AM
apilipenko added inline comments.Jan 9 2023, 9:05 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983

I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense?

I would prefer to drop the profile in this case. I suggest removing the preservation of branch weights metadata for now. Not having branch weights is always valid. We can return to this discussion later, once we have the main logic of the optimization checked in.

Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact.

The branch needs to be very rare, *and* there should be a way to heal from this optimization.
https://llvm.org/docs/FaultMaps.html#make-implicit-metadata

This is way implicit null check optimization is not driven by profile only.

3106–3107

But what you need in fact is dominance between the two branches, not between the branches and the latch.

BTW, do you need to worry about implicit control flow/must execute property here?

3116–3118

The fact that you are relying on metadata for this is a bit concerning. There might be a pass that doesn't know anything about this metadata and drops it, leading to an undesired unswitching.

You can probably check for the invariant condition you are about to insert in the dominating conditions. If there such a condition is known to be false -> don't unswitch. But this won't be foolproof either, as the condition can be rewritten in a way that we can't infer it anymore.

mkazantsev added inline comments.Jan 9 2023, 9:12 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983

If the branch was very rare before unswitching, it is still very rare after it. I don't see a reason to drop it. Any specific example?

As a middle-ground I can split it off into a different patch, but it still should be there, otherwise performance may suffer.

3106–3107

All branches that dominate the latch also dominate each other (if traversed up-down), because they all are in the same path in dom tree (specifically from latch to header).

3116–3118

The metadata will reliably protect us from single unswitching run go wild and create infinite number of loops. If the condition was known false between two different unswitchings, why it wasn't optimized away?

mkazantsev added inline comments.Jan 9 2023, 9:26 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3116–3118

I think we can use SCEV for this, but it's potentially CT-costly. How much are you concerned?

mkazantsev added inline comments.Jan 9 2023, 9:37 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107

BTW, do you need to worry about implicit control flow/must execute property here?

No, all that matters is that we work with branches that all dominate the latch (and therfore must execute if backedge is taken). Implicit control flow such as experimental.guard is not supported (and hopefully we can get rid of it).

apilipenko added inline comments.Jan 10 2023, 9:10 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107

Can you add a comment explicitly stating that you need one condition to dominate another but you are looking for a stronger property - all conditions dominate the latch.

3116–3118

If this is mainly to prevent infinite loop within one unswitch invocation, please add a comment explicitly stating this.

mkazantsev marked 2 inline comments as done.

Addressed comments:

  • No preservation of implicit null check & range metadata
  • More verbose comments

It occurred to me that this is an optimization only if the injected condition is likely. Then we would likely take the version with the loop with the removed loop variant condition. Otherwise, we spend code size and add an extra check before executing the loop with both loop variant conditions. I suggest using the profiling info on the branch to be eliminated to decide whether this is profitable.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923

Can be separated to a follow up patch.

3044
3108

canonicalizePredicate is probably a better name, as you match the predicate just below.

This comment was removed by mkazantsev.
mkazantsev updated this revision to Diff 492709.EditedJan 27 2023, 4:59 AM

Split off canonicalization of predicates. Patch will be put on review after this is landed.

Added hot edge check as Artur has proposed.

Added test file (lost on last update)

mkazantsev planned changes to this revision.Jan 30 2023, 4:45 AM
mkazantsev updated this revision to Diff 493280.EditedJan 30 2023, 5:11 AM

Rolled back to last working version (effectively undone changes from last request).

  • Removal of canonicalization cripples the transform to a point where it makes no sense. Function renamed into canonicalizePredicate. Besides, it is functionally incorrect, because the code below relies on canonical form (e.g. loop-invariant RHS).
  • Hot edge check cripples the transform to doing nothing. Trying to understand why.
mkazantsev added inline comments.Jan 30 2023, 5:12 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3044

What's the difference besides being longer?

mkazantsev added inline comments.Jan 30 2023, 5:20 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923

It makes the scope so narrow that what remains makes no sense.

Seems that BFI is not available by default, and require<block-freq> doesn't force it. We can safely assume that in real circumstances we won't have it. Let's not rely on it.

mkazantsev added inline comments.Jan 30 2023, 8:48 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923

No, because this is functionally incorrect. The code below relies on fact that predicate is canonicalized, e.g. RHS is invariant.

Besides, even if it was correct (i.e. checks are duplicated outside this method), it makes the scope of this transform very narrow, so we just won't benefit from it.

mkazantsev planned changes to this revision.Feb 1 2023, 10:59 PM
mkazantsev updated this revision to Diff 494270.Feb 2 2023, 5:35 AM
mkazantsev edited the summary of this revision. (Show Details)

Addressed comments:

  • Canonicalization split off;
  • Checks are factored out into a separate function;
  • Also separate checker for metadata, it also checks frequencies. I was unable to force BFI be non-null there, so the solution is somewhat clumsy.
mkazantsev updated this revision to Diff 494272.Feb 2 2023, 5:37 AM
mkazantsev updated this revision to Diff 494276.Feb 2 2023, 5:48 AM

Fixed metadata in test_03

skatkov added a subscriber: skatkov.Feb 7 2023, 1:41 AM

Mostly looks good, Some comments with nits, please consider.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2891

shouldTryInjectInvariantCondition?

2912

May we just document the real behavior?
That we do not want to unswitch on branches with probability less than some very likely threshold?

2913

shouldTryInjectBasingOnMetadata

2922

why not just use
if (!extractBranchWeights(const Instruction &I, SmallVectorImpl<uint32_t> &Weights)

return false;
2925

An Option instead of hardcoded probability?

3020

May be do this verification only for expensive debug?

3047

What if they exist? do you expect that it is already simplified?

3335

collectUnswitchCandidates has only this use. you probably want to make it returning void now. you can land it as a separate NFC patch.

mkazantsev added inline comments.Feb 7 2023, 1:49 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3047

"might not exist".

Metadata prevents us from doing this twice, but someone else could insert them by other means. This will be handled by GVN.

skatkov added inline comments.Feb 7 2023, 1:53 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3047

It might be present in the original code since the beginning. but ok.

skatkov added inline comments.Feb 7 2023, 1:55 AM
llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3

can you please explicitly add -simple-loop-unswitch-inject-invariant-conditions=true here.
May be even makes sense to land it with off by default and switch it on as a separate commit.

mkazantsev added inline comments.Feb 7 2023, 2:32 AM
llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3

I don't see any value in merging something that doesn't work. It won't be tested.

skatkov added inline comments.Feb 7 2023, 2:36 AM
llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3

I did not say "Don't switch it off" I said switch it on a separate commit to avoid possible big reverts.

3

I meant "Don't switch it on"

mkazantsev updated this revision to Diff 495451.Feb 7 2023, 3:25 AM
mkazantsev marked 9 inline comments as done.

Addressed comments.

mkazantsev marked an inline comment as done.Feb 7 2023, 3:26 AM
mkazantsev added inline comments.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3335

Can be done as follow-up.

mkazantsev updated this revision to Diff 495458.Feb 7 2023, 3:42 AM
mkazantsev marked an inline comment as done.

Added options to the test

mkazantsev planned changes to this revision.Feb 7 2023, 3:54 AM

Internal testing revealed a failure with

Assertion `!isa<Constant>(Invariant) && "Should not be replacing constant values!"' failed.

Need to investigate what happened.

mkazantsev updated this revision to Diff 495478.Feb 7 2023, 4:59 AM

Fixed 2 minor bugs:

  • Degenerate profile {0, 0} is now rejected;
  • Do not use Builder.CreateICmp for injected condition creation, as it may generate a constant and break further logic. Leave this simplification for the further passes.

Looks good to me.
Please land it off by default and then switch it on as a separate commit.

Please wait for 1-2 days for others to react and then land it.

skatkov accepted this revision.Feb 8 2023, 2:22 AM
This revision is now accepted and ready to land.Feb 8 2023, 2:22 AM
This revision was landed with ongoing or failed builds.Feb 9 2023, 9:49 PM
This revision was automatically updated to reflect the committed changes.