This is an archive of the discontinued LLVM Phabricator instance.


Differential D136233

[SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when it's profitable
ClosedPublic

Authored by mkazantsev on Oct 19 2022, 1:19 AM.


Details

Reviewers

apilipenko
fhahn
nikic
hyeongyukim
aeubanks
jaykang10
skatkov

Commits

rG5d10753314ed: [SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when…

Summary

Based on https://discourse.llvm.org/t/rfc-inject-invariant-conditions-to-loops-to-enable-unswitching-and-constraint-elimination

This transform attempts to handle the following loop:

for (...) {
  x = <some variant>
  if (x <u C1) {} else break;
  if (x <u C2) {} else break;
}

Here x is some loop-variant value, and C1 and C2 are loop invariants.
As we see, this loop has no invariant checks we can unswitch on. However, there is an
invariant condition that can make the second check redundant. Specifically, it is C1 <=u C2.
We can modify this code in the following way:

for (...) {
  x = <some variant>
  if (x <u C1) {} else break;
  if (C1 <=u C2) {
  /* no check is required */
  }
  else {
    // do the check normally
    if (x <u C2) {} else break;
  }
}

Now we have an invariant condition C1 <=u C2 and can unswitch on it.
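The effect of the injection followed by unswitching can be sketched in plain C++. This is only an illustration under assumptions: `runOriginal` and `runUnswitched` are hypothetical names, the loop body is reduced to counting completed iterations, and the real transform works on LLVM IR rather than on source code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Original loop: two loop-variant range checks per iteration.
// Returns the number of iterations completed before a check failed.
std::size_t runOriginal(const std::vector<uint32_t> &xs, uint32_t c1,
                        uint32_t c2) {
  std::size_t done = 0;
  for (uint32_t x : xs) {
    if (!(x < c1)) break;
    if (!(x < c2)) break;
    ++done;
  }
  return done;
}

// The same loop after injecting the invariant condition c1 <= c2 and
// unswitching on it (hypothetical hand-written equivalent).
std::size_t runUnswitched(const std::vector<uint32_t> &xs, uint32_t c1,
                          uint32_t c2) {
  std::size_t done = 0;
  if (c1 <= c2) {
    // x < c1 together with c1 <= c2 implies x < c2: the second check is gone.
    for (uint32_t x : xs) {
      if (!(x < c1)) break;
      ++done;
    }
  } else {
    // Nothing is known about the second check, so this copy keeps both.
    for (uint32_t x : xs) {
      if (!(x < c1)) break;
      if (!(x < c2)) break;
      ++done;
    }
  }
  return done;
}
```

Both functions should return the same value for any input; only the copy taken when `c1 <= c2` holds gets cheaper.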

This patch introduces the basic version of this transform, with some limitations.
All of them seem liftable (but that needs more work and testing):

All checks are ult conditions;
All branches in question stay in loop if the said condition is true and leave it otherwise;
All in-loop branches are hot enough;

There is also room for improvement in the cost model. So far we evaluate the cost of
unswitching this newly injected invariant branch the same way as if we were unswitching
on the 2nd condition, which is not exactly precise (but also not grossly wrong).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkazantsev created this revision.Oct 19 2022, 1:19 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 1:19 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

mkazantsev requested review of this revision.Oct 19 2022, 1:19 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2022, 1:19 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B192932: Diff 468811.Oct 19 2022, 2:00 AM

Alive2 proof of correctness (on test_04): https://alive2.llvm.org/ce/z/S9sX5h

Harbormaster completed remote builds in B195194: Diff 471903.Oct 31 2022, 12:31 AM

Put MSSA verification under flag.

Harbormaster completed remote builds in B195215: Diff 471934.Oct 31 2022, 3:58 AM

Ping

A couple of high-level comments.

How does this type of unswitching interact with poison?

I'm not fond of the term "virtual" here as it is very non-descriptive. I don't have a great suggestion, but maybe "unswitch by injected invariants" or something like this?

Looks like you are doing some canonicalization of conditions in this patch: replacing signed with unsigned, handling zexts. I suggest splitting all of this into separate patches so as to make the initial review smaller. Leave the bare minimum transform in the initial patch and then layer complexity with follow up reviews.

How does this type of unswitching interact with poison?

We used to have something like:

  %cmp1 = icmp ult i32 %x, %A
  br i1 %cmp1, label %taken, label %exit

taken:
  %cmp2 = icmp ult i32 %x, %B
  br i1 %cmp2, label %taken2, label %exit2

We want to insert a new check A <=u B right before br i1 %cmp2. Note that if either A or B is poison when the original program executes this branch, the original program already has undefined behavior. Inserting a new comparison and a branch on A <=u B doesn't change this fact: it's still UB if either of them is poison. So the answer to your question is "interaction with poison remains unchanged".

I'm not fond of the term "virtual" here as it is very non-descriptive. I don't have a great suggestion, but maybe "unswitch by injected invariants" or something like this?

Agreed, but let's have more opinions on what name is more suitable here.

Looks like you are doing some canonicalization of conditions in this patch: replacing signed with unsigned, handling zexts. I suggest splitting all of this into separate patches so as to make the initial review smaller.

Makes sense, will do.

Updates according to Artur's comments:

Got rid of "virtual" terminology
Split off handling of different types

mkazantsev added a child revision: D138015: [SimpleLoopUnswtich] Support zext when injecting invariant conditions.Nov 15 2022, 2:09 AM

Harbormaster completed remote builds in B197705: Diff 475381.Nov 15 2022, 2:44 AM

Ping

apilipenko added inline comments.Dec 7 2022, 6:04 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983	The injected condition doesn't necessarily have the same profile as the original. I also don't think that we should preserve MD_make_implicit.
3058	Stray change.
3073–3086	Since you are not handling zexts in this patch, this part of the comment can be dropped.
3106–3107	Why do you limit the transform to conditions that dominate the latch?
3116–3118	Why do you need to remember this fact and disable unswitching later?
3136–3137	Can this be done from the loop above?
3340–3342

mkazantsev added inline comments.Dec 8 2022, 9:30 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983	The injected condition doesn't necessarily have the same profile as the original. If the number of iterations in the loop is the same, then it does. Because this condition is invariant, the frequency ratio shows how often it is true or false. If the number of iterations in this loop is different from call to call - well, formally, it might not be the same after unswitching. However, it also means that the profile we report here is something collected from cumulative multiple different runs, and is misleading by itself. It could be 0:1 in one run, 1000:0 in another run, and 1000:1 on average. I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense? I also don't think that we should preserve MD_make_implicit. Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact.
3073–3086	Will move to follow-up patch.

mkazantsev added inline comments.Dec 8 2022, 9:30 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107	Because I prove implications, and the implication can only be proven from something that must execute before the implied condition.
3116–3118	If the invariant condition is false, the initial loop-variant condition cannot be proven true or false. Example: for (int i = 0; i < N; i++) { x = arr[i]; if (x < 0 || x >= 128) deopt(); if (x < 0 || x >= len) deopt(); ... } The transform will insert the loop-invariant condition `128 <= len` to get rid of the loop-variant check `if (x < 0 || x >= len)` in one of the unswitched copies. If we have proven that `128 <= len`, then we have proven that `0 <= x < 128 <= len`, and therefore in the unswitched version the 2nd check can be removed. But if we haven't proven that, we do not automatically know anything about the 2nd condition. Example: len = 100 but x is still less than 99. It means that in the unswitched copy we should preserve the initial loop-variant condition, and we need to prevent the optimization from going crazy and infinitely unswitching on it.
3136–3137	Yes, but this loop is big already... I wanted to split up the logic to keep the code more or less comprehensible.
3340–3342	Yup, thanks for pointing out.
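Why the copy that keeps the loop-variant check needs a "do not inject again" marker can be sketched with a toy worklist model. Everything here is hypothetical: `ToyLoop` and `countInjections` are illustration-only names and not LLVM data structures, and `MaxSteps` exists only so that the non-terminating case can be observed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model of a loop, not an LLVM data structure.
struct ToyLoop {
  bool HasVariantCheck;   // still contains the loop-variant check
  bool InjectionDisabled; // models the "injection disabled" metadata marker
};

// Counts how many injections a worklist-driven pass performs. It gives up
// after MaxSteps injections so that a runaway case remains observable.
std::size_t countInjections(ToyLoop Start, bool UseMarker,
                            std::size_t MaxSteps) {
  std::vector<ToyLoop> Worklist{Start};
  std::size_t Injections = 0;
  while (!Worklist.empty() && Injections < MaxSteps) {
    ToyLoop L = Worklist.back();
    Worklist.pop_back();
    if (!L.HasVariantCheck || L.InjectionDisabled)
      continue; // nothing to do for this loop
    ++Injections;
    // Copy taken when the injected condition is true: the check is removed.
    Worklist.push_back({false, false});
    // Copy taken when it is false: the check stays, so without a marker this
    // loop looks exactly like the original one.
    Worklist.push_back({true, UseMarker});
  }
  return Injections;
}
```

In this model, one injection happens when the marker is used, while without it the count runs up to `MaxSteps`.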

Rebase & addressed comments

mkazantsev marked 2 inline comments as done.Dec 19 2022, 3:32 AM

Harbormaster completed remote builds in B203863: Diff 483895.Dec 19 2022, 4:32 AM

Ping

Herald added a subscriber: StephenFan. · View Herald TranscriptJan 8 2023, 10:39 PM

apilipenko added inline comments.Jan 9 2023, 9:05 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983	I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense? I would prefer to drop the profile in this case. I suggest removing the preservation of branch weights metadata for now. Not having branch weights is always valid. We can return to this discussion later, once we have the main logic of the optimization checked in. Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact. The branch needs to be very rare, and there should be a way to heal from this optimization. https://llvm.org/docs/FaultMaps.html#make-implicit-metadata This is why implicit null check optimization is not driven by profile only.
3106–3107	But what you need in fact is dominance between the two branches, not between the branches and the latch. BTW, do you need to worry about implicit control flow/must execute property here?
3116–3118	The fact that you are relying on metadata for this is a bit concerning. There might be a pass that doesn't know anything about this metadata and drops it, leading to an undesired unswitching. You can probably check for the invariant condition you are about to insert in the dominating conditions. If such a condition is known to be false -> don't unswitch. But this won't be foolproof either, as the condition can be rewritten in a way that we can't infer it anymore.

mkazantsev added inline comments.Jan 9 2023, 9:12 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2980–2983	If the branch was very rare before unswitching, it is still very rare after it. I don't see a reason to drop it. Any specific example? As a middle-ground I can split it off into a different patch, but it still should be there, otherwise performance may suffer.
3106–3107	All branches that dominate the latch also dominate each other (if traversed up-down), because they all are in the same path in dom tree (specifically from latch to header).
3116–3118	The metadata will reliably protect us from a single unswitching run going wild and creating an infinite number of loops. If the condition was known to be false between two different unswitchings, why wasn't it optimized away?
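The dominance argument above (all blocks that dominate the latch lie on one path in the dominator tree, so any two of them are ordered by dominance) can be checked on a small example. This is a sketch under assumptions: the dominator tree is given as a hypothetical immediate-dominator array in which the root is its own immediate dominator, and `dominates` and `latchDominatorsFormChain` are made-up helper names rather than LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// IDom[B] is the immediate dominator of block B; the root (loop header in
// this toy setup) is its own immediate dominator.
bool dominates(const std::vector<std::size_t> &IDom, std::size_t A,
               std::size_t B) {
  assert(A < IDom.size() && B < IDom.size());
  while (true) {
    if (A == B)
      return true;
    if (IDom[B] == B)
      return false; // reached the root without meeting A
    B = IDom[B];
  }
}

// Collects every block that dominates Latch and checks that any two of them
// are comparable by dominance, i.e. that they form a chain.
bool latchDominatorsFormChain(const std::vector<std::size_t> &IDom,
                              std::size_t Latch) {
  std::vector<std::size_t> Doms;
  for (std::size_t B = 0; B < IDom.size(); ++B)
    if (dominates(IDom, B, Latch))
      Doms.push_back(B);
  for (std::size_t X : Doms)
    for (std::size_t Y : Doms)
      if (!dominates(IDom, X, Y) && !dominates(IDom, Y, X))
        return false;
  return true;
}
```

For a loop with header 0, block 1, two diamond arms 2 and 3, and latch 4, the array would be `{0, 0, 1, 1, 1}`: the arms do not dominate the latch, while 0, 1 and 4 form a chain.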

mkazantsev added inline comments.Jan 9 2023, 9:26 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3116–3118	I think we can use SCEV for this, but it's potentially CT-costly. How much are you concerned?

mkazantsev added inline comments.Jan 9 2023, 9:37 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107	BTW, do you need to worry about implicit control flow/must execute property here? No, all that matters is that we work with branches that all dominate the latch (and therefore must execute if the backedge is taken). Implicit control flow such as experimental.guard is not supported (and hopefully we can get rid of it).

apilipenko added inline comments.Jan 10 2023, 9:10 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3106–3107	Can you add a comment explicitly stating that you need one condition to dominate another but you are looking for a stronger property - all conditions dominate the latch.
3116–3118	If this is mainly to prevent infinite loop within one unswitch invocation, please add a comment explicitly stating this.

Addressed comments:

No preservation of implicit null check & range metadata
More verbose comments

Harbormaster completed remote builds in B208019: Diff 489507.Jan 16 2023, 5:54 AM

It occurred to me that this is an optimization only if the injected condition is likely. Then we would likely take the version with the loop with the removed loop variant condition. Otherwise, we spend code size and add an extra check before executing the loop with both loop variant conditions. I suggest using the profiling info on the branch to be eliminated to decide whether this is profitable.
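A profile-based profitability check of this kind can be sketched with plain integers. This is a hedged illustration rather than the patch's code: `isTakenEdgeHotEnough` is a hypothetical helper, the weights are passed as a plain vector instead of being read from branch metadata, and exact integer cross-multiplication is used in place of LLVM's `BranchProbability` type, so rounding behaviour may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the edge at TakenIdx is taken with probability of at least
// (T - 1) / T according to the two branch weights. Hypothetical helper.
bool isTakenEdgeHotEnough(const std::vector<uint32_t> &Weights,
                          std::size_t TakenIdx, uint32_t T) {
  assert(Weights.size() == 2 && TakenIdx < 2 && "Unexpected profile data!");
  // Keep T small so that the 64-bit products below cannot overflow.
  assert(T > 0 && T <= (1u << 16) && "Threshold out of supported range!");
  uint64_t Num = Weights[TakenIdx];
  uint64_t Denom = uint64_t(Weights[0]) + Weights[1];
  // Degenerate profile {0, 0}: no information, so do not transform.
  if (Denom == 0)
    return false;
  // Num / Denom >= (T - 1) / T  <=>  Num * T >= (T - 1) * Denom
  return Num * T >= uint64_t(T - 1) * Denom;
}
```

With a threshold of 16, weights of 15:1 in favour of the in-loop successor pass the check, while 14:2 does not.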

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923	Can be separated to a follow up patch.
3044
3108	canonicalizePredicate is probably a better name, as you match the predicate just below.

mkazantsev added a comment.Jan 27 2023, 4:23 AM

This comment was removed by mkazantsev.

Split off canonicalization of predicates. Patch will be put on review after this is landed.

Added hot edge check as Artur has proposed.

Harbormaster completed remote builds in B210323: Diff 492709.Jan 27 2023, 5:48 AM

Added test file (lost on last update)

Harbormaster completed remote builds in B210649: Diff 493139.Jan 29 2023, 3:16 PM

mkazantsev planned changes to this revision.Jan 30 2023, 4:45 AM

Rolled back to last working version (effectively undone changes from last request).

Removal of canonicalization cripples the transform to a point where it makes no sense. Function renamed into canonicalizePredicate. Besides, it is functionally incorrect, because the code below relies on canonical form (e.g. loop-invariant RHS).
Hot edge check cripples the transform to doing nothing. Trying to understand why.

mkazantsev added inline comments.Jan 30 2023, 5:12 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3044	What's the difference besides being longer?

mkazantsev added inline comments.Jan 30 2023, 5:20 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923	It makes the scope so narrow that what remains makes no sense.

Seems that BFI is not available by default, and require<block-freq> doesn't force it. We can safely assume that in real circumstances we won't have it. Let's not rely on it.

Harbormaster completed remote builds in B210753: Diff 493280.Jan 30 2023, 6:52 AM

mkazantsev added inline comments.Jan 30 2023, 8:48 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2888–2923	No, because this is functionally incorrect. The code below relies on fact that predicate is canonicalized, e.g. RHS is invariant. Besides, even if it was correct (i.e. checks are duplicated outside this method), it makes the scope of this transform very narrow, so we just won't benefit from it.

mkazantsev planned changes to this revision.Feb 1 2023, 10:59 PM

Addressed comments:

Canonicalization split off;
Checks are factored out into a separate function;
Also separate checker for metadata, it also checks frequencies. I was unable to force BFI be non-null there, so the solution is somewhat clumsy.

mkazantsev updated this revision to Diff 494272.Feb 2 2023, 5:37 AM

Fixed metadata in test_03

mkazantsev added a child revision: D143175: [SimpleLoopUnswitch] Canonicalize conditions for injection of invariant condition.Feb 2 2023, 5:53 AM

mkazantsev removed a child revision: D138015: [SimpleLoopUnswtich] Support zext when injecting invariant conditions.

Harbormaster completed remote builds in B211468: Diff 494276.Feb 2 2023, 6:38 AM

Mostly looks good, Some comments with nits, please consider.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2891	shouldTryInjectInvariantCondition?
2912	May we just document the real behavior? That we do not want to unswitch on branches with probability less than some very likely threshold?
2913	shouldTryInjectBasingOnMetadata
2922	why not just use if (!extractBranchWeights(const Instruction &I, SmallVectorImpl<uint32_t> &Weights) return false;
2925	An Option instead of hardcoded probability?
3020	May be do this verification only for expensive debug?
3047	What if they exist? do you expect that it is already simplified?
3335	collectUnswitchCandidates has only this use. you probably want to make it returning void now. you can land it as a separate NFC patch.

mkazantsev added inline comments.Feb 7 2023, 1:49 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3047	"might not exist". Metadata prevents us from doing this twice, but someone else could insert them by other means. This will be handled by GVN.

skatkov added inline comments.Feb 7 2023, 1:53 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3047	It might be present in the original code since the beginning. but ok.

skatkov added inline comments.Feb 7 2023, 1:55 AM

llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3	Can you please explicitly add -simple-loop-unswitch-inject-invariant-conditions=true here? Maybe it even makes sense to land it off by default and switch it on in a separate commit.

mkazantsev added inline comments.Feb 7 2023, 2:32 AM

llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3	I don't see any value in merging something that doesn't work. It won't be tested.

skatkov added inline comments.Feb 7 2023, 2:36 AM

llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll
3	I did not say "Don't switch it off" I said switch it on a separate commit to avoid possible big reverts.
3	I meant "Don't switch it on"

Addressed comments.

mkazantsev marked an inline comment as done.Feb 7 2023, 3:26 AM

mkazantsev added inline comments.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3335	Can be done as follow-up.

Added options to the test

Internal testing revealed a failure with

Assertion `!isa<Constant>(Invariant) && "Should not be replacing constant values!"' failed.

Need to investigate what happened.

Harbormaster completed remote builds in B212333: Diff 495458.Feb 7 2023, 4:59 AM

Fixed 2 minor bugs:

Degenerate profile {0, 0} is now rejected;
Do not use Builder.CreateICmp for injected condition creation, as it may generate a constant and break further logic. Leave this simplification for the further passes.

Harbormaster completed remote builds in B212350: Diff 495478.Feb 7 2023, 6:26 AM

Looks good to me.
Please land it off by default and then switch it on as a separate commit.

Please wait for 1-2 days for others to react and then land it.

skatkov accepted this revision.Feb 8 2023, 2:22 AM

This revision is now accepted and ready to land.Feb 8 2023, 2:22 AM

This revision was landed with ongoing or failed builds.Feb 9 2023, 9:49 PM

Closed by commit rG5d10753314ed: [SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when… (authored by mkazantsev). · Explain Why

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG5d10753314ed: [SimpleLoopUnswitch] Inject loop-invariant conditions and unswitch them when….

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp	313 lines
llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll	481 lines

Diff 496329

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

Show All 36 Lines

#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/ProfDataUtils.h"
#include "llvm/IR/Use.h"
#include "llvm/IR/Value.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"

Show All 20 Lines

STATISTIC(NumBranches, "Number of branches unswitched");
STATISTIC(NumSwitches, "Number of switches unswitched");
STATISTIC(NumGuards, "Number of guards turned into branches for unswitching");
STATISTIC(NumTrivial, "Number of unswitches that are trivial");
STATISTIC(
    NumCostMultiplierSkipped,
    "Number of unswitch candidates that had their cost multiplier skipped");
STATISTIC(NumInvariantConditionsInjected,
          "Number of invariant conditions injected and unswitched");

static cl::opt<bool> EnableNonTrivialUnswitch(
    "enable-nontrivial-unswitch", cl::init(false), cl::Hidden,
    cl::desc("Forcibly enables non-trivial loop unswitching rather than "
             "following the configuration passed into the pass."));

static cl::opt<int>
    UnswitchThreshold("unswitch-threshold", cl::init(50), cl::Hidden,

Show All 24 Lines

    MSSAThreshold("simple-loop-unswitch-memoryssa-threshold",
                  cl::desc("Max number of memory uses to explore during "
                           "partial unswitching analysis"),
                  cl::init(100), cl::Hidden);

static cl::opt<bool> FreezeLoopUnswitchCond(
    "freeze-loop-unswitch-cond", cl::init(true), cl::Hidden,
    cl::desc("If enabled, the freeze instruction will be added to condition "
             "of loop unswitch to prevent miscompilation."));

static cl::opt<bool> InjectInvariantConditions(
    "simple-loop-unswitch-inject-invariant-conditions", cl::Hidden,
    cl::desc("Whether we should inject new invariants and unswitch them to "
             "eliminate some existing (non-invariant) conditions."),
    cl::init(true));

static cl::opt<unsigned> InjectInvariantConditionHotnesThreshold(
    "simple-loop-unswitch-inject-invariant-condition-hotness-threshold",
    cl::Hidden, cl::desc("Only try to inject loop invariant conditions and "
                         "unswitch on them to eliminate branches that are "
                         "not-taken 1/<this option> times or less."),
    cl::init(16));

namespace {

struct CompareDesc {
  BranchInst *Term;
  Value *Invariant;
  BasicBlock *InLoopSucc;

  CompareDesc(BranchInst *Term, Value *Invariant, BasicBlock *InLoopSucc)
      : Term(Term), Invariant(Invariant), InLoopSucc(InLoopSucc) {}
};

struct InjectedInvariant {
  ICmpInst::Predicate Pred;
  Value *LHS;
  Value *RHS;
  BasicBlock *InLoopSucc;

  InjectedInvariant(ICmpInst::Predicate Pred, Value *LHS, Value *RHS,
                    BasicBlock *InLoopSucc)
      : Pred(Pred), LHS(LHS), RHS(RHS), InLoopSucc(InLoopSucc) {}
};

struct NonTrivialUnswitchCandidate {
  Instruction *TI = nullptr;
  TinyPtrVector<Value *> Invariants;
  std::optional<InstructionCost> Cost;
  std::optional<InjectedInvariant> PendingInjection;
  NonTrivialUnswitchCandidate(
      Instruction *TI, ArrayRef<Value *> Invariants,
      std::optional<InstructionCost> Cost = std::nullopt,
      std::optional<InjectedInvariant> PendingInjection = std::nullopt)
      : TI(TI), Invariants(Invariants), Cost(Cost),
        PendingInjection(PendingInjection) {};

  bool hasPendingInjection() const { return PendingInjection.has_value(); }
};
} // end anonymous namespace.

// Helper to skip (select x, true, false), which matches both a logical AND and
// OR and can confuse code that tries to determine if \p Cond is either a
// logical AND or OR but not both.
static Value *skipTrivialSelect(Value *Cond) {
  Value *CondNext;

Show All 2,701 Lines

  if (auto Info = hasPartialIVCondition(L, MSSAThreshold, *MSSA, AA)) {
    llvm::append_range(ValsToDuplicate, Info->InstToDuplicate);
    UnswitchCandidates.push_back(
        {L.getHeader()->getTerminator(), std::move(ValsToDuplicate)});
  }
  return !UnswitchCandidates.empty();
}

/// Returns true, if predicate described by ( \p Pred, \p LHS, \p RHS )
/// succeeding into blocks ( \p IfTrue, \p IfFalse) can be optimized by
/// injecting a loop-invariant condition.
static bool shouldTryInjectInvariantCondition(
    const ICmpInst::Predicate Pred, const Value *LHS, const Value *RHS,
    const BasicBlock *IfTrue, const BasicBlock *IfFalse, const Loop &L) {
  if (L.isLoopInvariant(LHS) || !L.isLoopInvariant(RHS))
    return false;
  // TODO: Support other predicates.
  if (Pred != ICmpInst::ICMP_ULT)
    return false;
  // TODO: Support non-loop-exiting branches?
  if (!L.contains(IfTrue) || L.contains(IfFalse))
    return false;
  // FIXME: For some reason this causes problems with MSSA updates, need to
  // investigate why. So far, just don't unswitch latch.
  if (L.getHeader() == IfTrue)
    return false;
  return true;
}

/// Returns true, if metadata on \p BI allows us to optimize branching into \p
/// TakenSucc via injection of invariant conditions. The branch should be hot
/// enough and not previously unswitched, the information about this comes from
/// the metadata.
bool shouldTryInjectBasingOnMetadata(const BranchInst *BI,
                                     const BasicBlock *TakenSucc) {
  // Skip branches that have already been unswitched this way. After successful
  // unswitching of injected condition, we will still have a copy of this loop
  // which looks exactly the same as original one. To prevent the 2nd attempt
  // of unswitching it in the same pass, mark this branch as "nothing to do
  // here".
  if (BI->hasMetadata("llvm.invariant.condition.injection.disabled"))
    return false;
  SmallVector<uint32_t> Weights;
  if (!extractBranchWeights(*BI, Weights))
    return false;
  unsigned T = InjectInvariantConditionHotnesThreshold;
  BranchProbability LikelyTaken(T - 1, T);
  assert(Weights.size() == 2 && "Unexpected profile data!");
  size_t Idx = BI->getSuccessor(0) == TakenSucc ? 0 : 1;
  auto Num = Weights[Idx];
  auto Denom = Weights[0] + Weights[1];
  // Degenerate metadata.
  if (Denom == 0)
    return false;
  BranchProbability ActualTaken(Num, Denom);
  if (LikelyTaken > ActualTaken)
    return false;
  return true;
}

/// Materialize pending invariant condition of the given candidate into IR. The
/// injected loop-invariant condition implies the original loop-variant branch
/// condition, so the materialization turns
///
/// loop_block:
///   ...
///   br i1 %variant_cond, label InLoopSucc, label OutOfLoopSucc
///
/// into
///
/// preheader:
///   %invariant_cond = LHS pred RHS
/// ...
/// loop_block:
///   br i1 %invariant_cond, label InLoopSucc, label OriginalCheck
/// OriginalCheck:
///   br i1 %variant_cond, label InLoopSucc, label OutOfLoopSucc
/// ...
static NonTrivialUnswitchCandidate
injectPendingInvariantConditions(NonTrivialUnswitchCandidate Candidate, Loop &L,
                                 DominatorTree &DT, LoopInfo &LI,
                                 AssumptionCache &AC, MemorySSAUpdater *MSSAU) {
  assert(Candidate.hasPendingInjection() && "Nothing to inject!");
  BasicBlock *Preheader = L.getLoopPreheader();
  assert(Preheader && "Loop is not in simplified form?");

  auto Pred = Candidate.PendingInjection->Pred;
  auto *LHS = Candidate.PendingInjection->LHS;
  auto *RHS = Candidate.PendingInjection->RHS;
  auto *InLoopSucc = Candidate.PendingInjection->InLoopSucc;
  auto *TI = cast<BranchInst>(Candidate.TI);
  auto *BB = Candidate.TI->getParent();
  assert(InLoopSucc == TI->getSuccessor(0));
  auto *OutOfLoopSucc = TI->getSuccessor(1);
  // FIXME: Remove this once limitation on successors is lifted.
  assert(L.contains(InLoopSucc) && "Not supported yet!");
  assert(!L.contains(OutOfLoopSucc) && "Not supported yet!");
  auto &Ctx = BB->getContext();

  assert(LHS->getType() == RHS->getType() && "Type mismatch!");
  // Do not use builder here: CreateICmp may simplify this into a constant and
  // unswitching will break. Better optimize it away later.
  auto *InjectedCond =
apilipenkoUnsubmitted

Done

The injected condition doesn't necessarily have the same profile as the original.

I also don't think that we should preserve MD_make_implicit.

apilipenko: The injected condition doesn't necessarily have the same profile as the original. I also don't…

mkazantsevAuthorUnsubmitted

Done

The injected condition doesn't necessarily have the same profile as the original.

If the number of iterations in the loop is the same, then it does. Because this condition is invariant, the frequency ratio shows how often it is true or false.

If the number of iterations in this loop is different from call to call - well, formally, it might not be the same after unswitching. However, it also means that the profile we report here is something collected from cumulative multiple different runs, and is misleading by itself. It could be 0:1 in one run, 1000:0 in another run, and 1000:1 on average. I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense?

> I also don't think that we should preserve MD_make_implicit.

Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact.
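
To make the "cumulative profile" point concrete, here is a minimal sketch in plain C++ (not LLVM API; `BranchProfile` and `aggregate` are hypothetical names used only for illustration). It sums per-run branch counts the way a profile collected over several runs would, using the 0:1 and 1000:0 runs mentioned above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-run counts for one branch: how often it stayed in the loop
// (Taken) and how often it left the loop (NotTaken).
struct BranchProfile {
  std::uint64_t Taken;
  std::uint64_t NotTaken;
};

// Sum the counts of several runs into one cumulative profile.
BranchProfile aggregate(const std::vector<BranchProfile> &Runs) {
  BranchProfile Total{0, 0};
  for (const BranchProfile &R : Runs) {
    Total.Taken += R.Taken;
    Total.NotTaken += R.NotTaken;
  }
  return Total;
}
```

Aggregating a 0:1 run with a 1000:0 run gives 1000:1. That ratio describes neither run exactly, but it still says which side is more frequent overall.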

apilipenkoUnsubmitted

Not Done

> I still think that in this situation we can preserve it, just to denote which loop is more frequent, even if exact numbers won't hold. Having imprecise, but conceptually "this is more frequent than that" numbers is better than having no numbers. Does that make sense?

I would prefer to drop the profile in this case. I suggest removing the preservation of branch weights metadata for now. Not having branch weights is always valid. We can return to this discussion later, once we have the main logic of the optimization checked in.

> Why not? I think we can always have this metadata whenever we think that one of branches is super rare. And this unswitching won't change that fact.

The branch needs to be very rare, *and* there should be a way to heal from this optimization.
https://llvm.org/docs/FaultMaps.html#make-implicit-metadata

This is why the implicit null check optimization is not driven by profile only.

mkazantsevAuthorUnsubmitted

Done

If the branch was very rare before unswitching, it is still very rare after it. I don't see a reason to drop it. Any specific example?

As a middle-ground I can split it off into a different patch, but it still should be there, otherwise performance may suffer.

ICmpInst::Create(Instruction::ICmp, Pred, LHS, RHS, "injected.cond",

Preheader->getTerminator());

auto *OldCond = TI->getCondition();

BasicBlock *CheckBlock = BasicBlock::Create(Ctx, BB->getName() + ".check",

BB->getParent(), InLoopSucc);

IRBuilder<> Builder(TI);

auto *InvariantBr =

Builder.CreateCondBr(InjectedCond, InLoopSucc, CheckBlock);

Builder.SetInsertPoint(CheckBlock);

auto *NewTerm = Builder.CreateCondBr(OldCond, InLoopSucc, OutOfLoopSucc);

TI->eraseFromParent();

// Prevent infinite unswitching.

NewTerm->setMetadata("llvm.invariant.condition.injection.disabled",

MDNode::get(BB->getContext(), {}));

// Fixup phis.

for (auto &I : *InLoopSucc) {

auto *PN = dyn_cast<PHINode>(&I);

if (!PN)

break;

auto *Inc = PN->getIncomingValueForBlock(BB);

PN->addIncoming(Inc, CheckBlock);

}

OutOfLoopSucc->replacePhiUsesWith(BB, CheckBlock);

SmallVector<DominatorTree::UpdateType, 4> DTUpdates = {

{ DominatorTree::Insert, BB, CheckBlock },

{ DominatorTree::Insert, CheckBlock, InLoopSucc },

{ DominatorTree::Insert, CheckBlock, OutOfLoopSucc },

{ DominatorTree::Delete, BB, OutOfLoopSucc }

};

DT.applyUpdates(DTUpdates);

if (MSSAU)

skatkovUnsubmitted

Done

Maybe do this verification only for expensive debug?

MSSAU->applyUpdates(DTUpdates, DT);

L.addBasicBlockToLoop(CheckBlock, LI);

#ifdef EXPENSIVE_CHECKS

DT.verify();

LI.verify(DT);

if (MSSAU && VerifyMemorySSA)

MSSAU->getMemorySSA()->verifyMemorySSA();

#endif

// TODO: In fact, cost of unswitching a new invariant candidate is *slightly*

// higher because we have just inserted a new block. Need to think how to

// adjust the cost of injected candidates when it was first computed.

LLVM_DEBUG(dbgs() << "Injected a new loop-invariant branch " << *InvariantBr

<< " and considering it for unswitching.");

++NumInvariantConditionsInjected;

return NonTrivialUnswitchCandidate(InvariantBr, { InjectedCond },

Candidate.Cost);

}

/// Given chain of loop branch conditions looking like:

/// br (Variant < Invariant1)

/// br (Variant < Invariant2)

/// br (Variant < Invariant3)

apilipenkoUnsubmitted

Not Done

ICmpInst::Predicate NonStrictPred = ICmpInst::getNonStrictPredicate(Pred);

- for (auto Prev = Compares.begin(), Next = Compares.begin() + 1;

+ for (auto Prev = Compares.begin(), Next = std::next(Compares.begin(), 1);

Next != Compares.end(); ++Prev, ++Next) {

mkazantsevAuthorUnsubmitted

Done

What's the difference besides being longer?
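
One concrete difference, offered as a general C++ note rather than a claim about this patch: `begin() + 1` requires a random-access iterator, while `std::next` works with any forward iterator. `Compares` is an `ArrayRef`, whose iterators are plain pointers, so both spellings behave the same in this loop. A minimal sketch, with a hypothetical helper `countAdjacentPairs` that mirrors the shape of the Prev/Next loop:

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Counts adjacent (Prev, Next) pairs in any forward range.
template <typename Range>
std::size_t countAdjacentPairs(const Range &R) {
  auto Prev = std::begin(R);
  auto End = std::end(R);
  if (Prev == End)
    return 0;
  std::size_t Pairs = 0;
  // std::next compiles for std::forward_list; `std::begin(R) + 1` would not.
  for (auto Next = std::next(Prev); Next != End; ++Prev, ++Next)
    ++Pairs;
  return Pairs;
}
```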

/// ...

/// collect set of invariant conditions on which we want to unswitch, which

/// look like:

skatkovUnsubmitted

Done

What if they exist? Do you expect that it is already simplified?

mkazantsevAuthorUnsubmitted

Done

"might not exist".

Metadata prevents us from doing this twice, but someone else could insert them by other means. This will be handled by GVN.

skatkovUnsubmitted

Done

It might have been present in the original code from the beginning, but OK.

/// Invariant1 <= Invariant2

/// Invariant2 <= Invariant3

/// ...

/// Though they might not immediately exist in the IR, we can still inject them.

static bool insertCandidatesWithPendingInjections(

SmallVectorImpl<NonTrivialUnswitchCandidate> &UnswitchCandidates, Loop &L,

ICmpInst::Predicate Pred, ArrayRef<CompareDesc> Compares,

const DominatorTree &DT) {

assert(ICmpInst::isRelational(Pred));

assert(ICmpInst::isStrictPredicate(Pred));

if (Compares.size() < 2)

return false;

ICmpInst::Predicate NonStrictPred = ICmpInst::getNonStrictPredicate(Pred);

for (auto Prev = Compares.begin(), Next = Compares.begin() + 1;

Next != Compares.end(); ++Prev, ++Next) {

Value *LHS = Next->Invariant;

Value *RHS = Prev->Invariant;

BasicBlock *InLoopSucc = Prev->InLoopSucc;

InjectedInvariant ToInject(NonStrictPred, LHS, RHS, InLoopSucc);

NonTrivialUnswitchCandidate Candidate(Prev->Term, { LHS, RHS },

std::nullopt, std::move(ToInject));

UnswitchCandidates.push_back(std::move(Candidate));

}

return true;

}

/// Collect unswitch candidates by invariant conditions that are not immediately

/// present in the loop. However, they can be injected into the code if we

/// decide it's profitable.

/// An example of such conditions is following:

///

/// for (...) {

/// x = load ...

/// if (! x <u C1) break;

/// if (! x <u C2) break;

/// <do something>

/// }

///

apilipenkoUnsubmitted

Done

Since you are not handling zexts in this patch, this part of the comment can be dropped.

mkazantsevAuthorUnsubmitted

Done

Will move to follow-up patch.

/// We can unswitch by condition "C1 <=u C2". If that is true, then "x <u C1 <=

/// C2" automatically implies "x <u C2", so we can get rid of one of

/// loop-variant checks in unswitched loop version.

static bool collectUnswitchCandidatesWithInjections(

SmallVectorImpl<NonTrivialUnswitchCandidate> &UnswitchCandidates,

IVConditionInfo &PartialIVInfo, Instruction *&PartialIVCondBranch, Loop &L,

const DominatorTree &DT, const LoopInfo &LI, AAResults &AA,

const MemorySSAUpdater *MSSAU) {

if (!InjectInvariantConditions)

return false;

if (!DT.isReachableFromEntry(L.getHeader()))

return false;

auto *Latch = L.getLoopLatch();

// Need to have a single latch and a preheader.

if (!Latch)

return false;

assert(L.getLoopPreheader() && "Must have a preheader!");

DenseMap<Value *, SmallVector<CompareDesc, 4> > CandidatesULT;

// Traverse the conditions that dominate latch (and therefore dominate each

apilipenkoUnsubmitted

Done

Why do you limit the transform to conditions that dominate the latch?

mkazantsevAuthorUnsubmitted

Done

Because I prove implications, and the implication can only be proven from something that must execute before the implied condition.

apilipenkoUnsubmitted

Not Done

But what you need in fact is dominance between the two branches, not between the branches and the latch.

BTW, do you need to worry about implicit control flow/must execute property here?

mkazantsevAuthorUnsubmitted

Done

All branches that dominate the latch also dominate each other (if traversed up-down), because they all are in the same path in dom tree (specifically from latch to header).
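
A small sketch of that argument on a toy dominator tree (plain C++, not LLVM's `DominatorTree`; `IDomTable`, `dominates`, and `chainFromLatch` are hypothetical names). Walking immediate dominators from the latch towards the header visits only blocks that dominate the latch, and any block visited later on that walk dominates every block visited earlier.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree: IDom[B] is the immediate dominator of block B. The root
// block is its own immediate dominator.
using IDomTable = std::vector<int>;

// Returns true if block A dominates block B, by walking B's idom chain.
bool dominates(const IDomTable &IDom, int A, int B) {
  while (true) {
    if (A == B)
      return true;
    if (IDom[B] == B)
      return false; // Reached the root without meeting A.
    B = IDom[B];
  }
}

// Collects the blocks visited when walking idoms from Latch up to Header.
std::vector<int> chainFromLatch(const IDomTable &IDom, int Latch, int Header) {
  assert(dominates(IDom, Header, Latch) && "Header must dominate the latch");
  std::vector<int> Chain;
  for (int B = Latch;; B = IDom[B]) {
    Chain.push_back(B);
    if (B == Header)
      break;
  }
  return Chain;
}
```

For example, with blocks 0 (preheader), 1 (header), 2 (guarded), 3 (a side block) and 4 (latch), and `IDom = {0, 0, 1, 2, 2}`, the walk from block 4 visits 4, 2, 1. Block 3 is never visited because it does not dominate the latch.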

mkazantsevAuthorUnsubmitted

Done

> BTW, do you need to worry about implicit control flow/must execute property here?

No, all that matters is that we work with branches that all dominate the latch (and therefore must execute if backedge is taken). Implicit control flow such as experimental.guard is not supported (and hopefully we can get rid of it).

apilipenkoUnsubmitted

Done

Can you add a comment explicitly stating that you need one condition to dominate another, but you are looking for a stronger property: all conditions dominate the latch?

// other).

apilipenkoUnsubmitted

Done

canonicalizePredicate is probably a better name, as you match the predicate just below.

for (auto *DTN = DT.getNode(Latch); L.contains(DTN->getBlock());

DTN = DTN->getIDom()) {

ICmpInst::Predicate Pred;

Value *LHS = nullptr, *RHS = nullptr;

BasicBlock *IfTrue = nullptr, *IfFalse = nullptr;

auto *BB = DTN->getBlock();

auto *Term = BB->getTerminator();

if (!match(Term, m_Br(m_ICmp(Pred, m_Value(LHS), m_Value(RHS)),

m_BasicBlock(IfTrue), m_BasicBlock(IfFalse))))

continue;

apilipenkoUnsubmitted

Done

Why do you need to remember this fact and disable unswitching later?

mkazantsevAuthorUnsubmitted

Done

If the invariant condition is false, the initial loop-variant condition cannot be proven true or false. Example:

for (int i = 0; i < N; i++) {
  x = arr[i];
  if (x < 0 || x >= 128)
    deopt();
  if (x < 0 || x >= len)
    deopt();
  ...
}

The transform will insert the loop-invariant condition 128 <= len to get rid of the loop-variant check if (x < 0 || x >= len) in one of the unswitched copies.

If we have proven that 128 <= len, then we have proven that 0 <= x < 128 <= len, and therefore in the unswitched version 2nd check can be removed. But if we haven't proven that, we do not automatically know anything about 2nd condition. Example: len = 100 but x is still less than 99.

It means that in the unswitched copy, we should preserve the initial loop-variant condition. And we need to prevent the optimization from going crazy and infinitely unswitching on it.
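
The example above can be sketched in plain C++ to show what the two unswitched copies look like. This is an illustration under assumptions, not the pass output: `originalLoop` and `unswitchedLoop` are hypothetical names, the two-sided signed checks are written as single unsigned compares, and returning an index stands in for `deopt()`.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first element that fails a check, or -1 if every
// element passes. Both checks are loop-variant.
int originalLoop(const std::vector<unsigned> &Arr, unsigned Limit,
                 unsigned Len) {
  for (std::size_t I = 0; I < Arr.size(); ++I) {
    unsigned X = Arr[I];
    if (!(X < Limit))
      return static_cast<int>(I);
    if (!(X < Len))
      return static_cast<int>(I);
  }
  return -1;
}

// Same behavior, with the injected invariant condition hoisted out of the loop.
int unswitchedLoop(const std::vector<unsigned> &Arr, unsigned Limit,
                   unsigned Len) {
  if (Limit <= Len) {
    // Invariant holds: X < Limit <= Len, so the second check is redundant.
    for (std::size_t I = 0; I < Arr.size(); ++I)
      if (!(Arr[I] < Limit))
        return static_cast<int>(I);
    return -1;
  }
  // Invariant is false: nothing is known about X < Len, so this copy keeps
  // both loop-variant checks unchanged.
  return originalLoop(Arr, Limit, Len);
}
```

With `Limit = 128` and `Len = 100` the invariant is false, yet an element such as 99 still passes both checks, which is why the fallback copy cannot drop or fold the second check.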

apilipenkoUnsubmitted

Not Done

The fact that you are relying on metadata for this is a bit concerning. There might be a pass that doesn't know anything about this metadata and drops it, leading to an undesired unswitching.

You can probably check for the invariant condition you are about to insert in the dominating conditions. If such a condition is known to be false -> don't unswitch. But this won't be foolproof either, as the condition can be rewritten in a way that we can't infer it anymore.

mkazantsevAuthorUnsubmitted

Done

The metadata will reliably protect us from a single unswitching run going wild and creating an infinite number of loops. If the condition was known to be false between two different unswitchings, why wasn't it optimized away?
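
A toy model of the single-run problem, in plain C++ rather than the LLVM metadata API (`Branch`, `InjectionDisabled`, and `countInjections` are hypothetical names). Each injection leaves the original loop-variant branch behind in the fallback copy. Unless that branch is marked, it remains eligible and is picked again.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop-variant branch. The flag plays the role of
// the "llvm.invariant.condition.injection.disabled" marker.
struct Branch {
  bool InjectionDisabled;
};

// Keeps injecting on the first eligible branch until nothing is eligible or
// MaxSteps is reached. Returns how many injections were performed.
std::size_t countInjections(std::vector<Branch> Branches, bool UseMarker,
                            std::size_t MaxSteps) {
  std::size_t Steps = 0;
  bool Changed = true;
  while (Changed && Steps < MaxSteps) {
    Changed = false;
    for (Branch &B : Branches) {
      if (B.InjectionDisabled)
        continue;
      // The variant branch survives in the fallback copy of the loop. With the
      // marker it is never considered again; without it, it stays eligible.
      if (UseMarker)
        B.InjectionDisabled = true;
      ++Steps;
      Changed = true;
      break;
    }
  }
  return Steps;
}
```

With the marker, two branches yield exactly two injections. Without it, the count runs until the `MaxSteps` cap, which models the unbounded unswitching the marker is meant to stop.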

mkazantsevAuthorUnsubmitted

Done

I think we can use SCEV for this, but it's potentially costly in compile time. How concerned are you?

apilipenkoUnsubmitted

Done

If this is mainly to prevent infinite loop within one unswitch invocation, please add a comment explicitly stating this.

if (!shouldTryInjectInvariantCondition(Pred, LHS, RHS, IfTrue, IfFalse, L))

continue;

if (!shouldTryInjectBasingOnMetadata(cast<BranchInst>(Term), IfTrue))

continue;

CompareDesc Desc(cast<BranchInst>(Term), RHS, IfTrue);

CandidatesULT[LHS].push_back(Desc);

}

bool Found = false;

for (auto &It : CandidatesULT)

Found |= insertCandidatesWithPendingInjections(

UnswitchCandidates, L, ICmpInst::ICMP_ULT, It.second, DT);

return Found;

}

static bool isSafeForNoNTrivialUnswitching(Loop &L, LoopInfo &LI) {

if (!L.isSafeToClone())

return false;

for (auto *BB : L.blocks())

apilipenkoUnsubmitted

Done

Can this be done from the loop above?

mkazantsevAuthorUnsubmitted

Done

Yes, but this loop is big already... I wanted to split up the logic to keep the code more or less comprehensible.

for (auto &I : *BB) {

if (I.getType()->isTokenTy() && I.isUsedOutsideOfBlock(BB))

return false;

if (auto *CB = dyn_cast<CallBase>(&I)) {

assert(!CB->cannotDuplicate() && "Checked by L.isSafeToClone().");

if (CB->isConvergent())

return false;

}

[139 lines not shown] auto ComputeUnswitchedCost = [&](Instruction &TI,

return (LoopCost - Cost) * (SuccessorsCount - 1);

};

std::optional<NonTrivialUnswitchCandidate> Best;

for (auto &Candidate : UnswitchCandidates) {

Instruction &TI = *Candidate.TI;

ArrayRef<Value *> Invariants = Candidate.Invariants;

BranchInst *BI = dyn_cast<BranchInst>(&TI);

- InstructionCost CandidateCost = ComputeUnswitchedCost(

- TI, /*FullUnswitch*/ !BI ||

- (Invariants.size() == 1 &&

- Invariants[0] == skipTrivialSelect(BI->getCondition())));

+ bool FullUnswitch =

+ !BI || Candidate.hasPendingInjection() ||

+ (Invariants.size() == 1 &&

+ Invariants[0] == skipTrivialSelect(BI->getCondition()));

+ InstructionCost CandidateCost = ComputeUnswitchedCost(TI, FullUnswitch);

// Calculate cost multiplier which is a tool to limit potentially

// exponential behavior of loop-unswitch.

if (EnableUnswitchCostMultiplier) {

int CostMultiplier =

CalculateUnswitchCostMultiplier(TI, L, LI, DT, UnswitchCandidates);

assert(

(CostMultiplier > 0 && CostMultiplier <= UnswitchThreshold) &&

"cost multiplier needs to be in the range of 1..UnswitchThreshold");

[21 lines not shown] static bool unswitchBestCondition(

function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,

ScalarEvolution *SE, MemorySSAUpdater *MSSAU,

function_ref<void(Loop &, StringRef)> DestroyLoopCB) {

// Collect all invariant conditions within this loop (as opposed to an inner

// loop which would be handled when visiting that inner loop).

SmallVector<NonTrivialUnswitchCandidate, 4> UnswitchCandidates;

IVConditionInfo PartialIVInfo;

Instruction *PartialIVCondBranch = nullptr;

collectUnswitchCandidates(UnswitchCandidates, PartialIVInfo,

skatkovUnsubmitted

Done

collectUnswitchCandidates has only this use. You probably want to make it return void now. You can land it as a separate NFC patch.

mkazantsevAuthorUnsubmitted

Done

Can be done as follow-up.

PartialIVCondBranch, L, LI, AA, MSSAU);

collectUnswitchCandidatesWithInjections(UnswitchCandidates, PartialIVInfo,

PartialIVCondBranch, L, DT, LI, AA,

MSSAU);

// If we didn't find any candidates, we're done.

- if (!collectUnswitchCandidates(UnswitchCandidates, PartialIVInfo,

- PartialIVCondBranch, L, LI, AA, MSSAU))

+ if (UnswitchCandidates.empty())

return false;

apilipenkoUnsubmitted

Done

Instruction *PartialIVCondBranch = nullptr;

- // If we didn't find any candidates, we're done.

collectUnswitchCandidates(UnswitchCandidates, PartialIVInfo,

PartialIVCondBranch, L, LI, AA, MSSAU);

collectUnswitchCandidatesWithInjections(UnswitchCandidates, PartialIVInfo,

PartialIVCondBranch, L, DT, LI, AA,

MSSAU);

+ // If we didn't find any candidates, we're done.

if (UnswitchCandidates.empty())

return false;

LLVM_DEBUG(

mkazantsevAuthorUnsubmitted

Done

Yup, thanks for pointing out.

LLVM_DEBUG(

dbgs() << "Considering " << UnswitchCandidates.size()

<< " non-trivial loop invariant conditions for unswitching.\n");

NonTrivialUnswitchCandidate Best = findBestNonTrivialUnswitchCandidate(

UnswitchCandidates, L, DT, LI, AC, TTI, PartialIVInfo);

apilipenkoUnsubmitted

Done

Stray change.

assert(Best.TI && "Failed to find loop unswitch candidate");

assert(Best.Cost && "Failed to compute cost");

if (*Best.Cost >= UnswitchThreshold) {

LLVM_DEBUG(dbgs() << "Cannot unswitch, lowest cost found: " << *Best.Cost

<< "\n");

return false;

}

if (Best.hasPendingInjection())

Best = injectPendingInvariantConditions(Best, L, DT, LI, AC, MSSAU);

assert(!Best.hasPendingInjection() &&

"All injections should have been done by now!");

if (Best.TI != PartialIVCondBranch)

PartialIVInfo.InstToDuplicate.clear();

// If the best candidate is a guard, turn it into a branch.

if (isGuard(Best.TI))

Best.TI =

turnGuardIntoBranch(cast<IntrinsicInst>(Best.TI), L, DT, LI, MSSAU);

[282 lines not shown]

llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -S -simple-loop-unswitch-inject-invariant-conditions=true -passes="loop(simple-loop-unswitch<nontrivial>),simplifycfg" | FileCheck %s
				; RUN: opt < %s -S -simple-loop-unswitch-inject-invariant-conditions=true -passes="loop-mssa(simple-loop-unswitch<nontrivial>),simplifycfg" -verify-memoryssa | FileCheck %s
				skatkovUnsubmitted

				Not Done

				Can you please explicitly add -simple-loop-unswitch-inject-invariant-conditions=true here? Maybe it even makes sense to land it off by default and switch it on in a separate commit.

				mkazantsevAuthorUnsubmitted

				Done

				I don't see any value in merging something that doesn't work. It won't be tested.

				skatkovUnsubmitted

				Not Done

				I did not say "Don't switch it off". I said to switch it on in a separate commit to avoid possible big reverts.

				skatkovUnsubmitted

				Not Done

				I meant "Don't switch it on".

				define i32 @test_01(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_01(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: [[INJECTED_COND:%.*]] = icmp ule i32 [[LIMIT:%.*]], [[LEN]]
				; CHECK-NEXT: br i1 [[INJECTED_COND]], label [[LOOP_US:%.*]], label [[LOOP:%.*]]
				; CHECK: loop.us:
				; CHECK-NEXT: [[IV_US:%.*]] = phi i32 [ [[IV_NEXT_US:%.*]], [[GUARDED_US:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[EL_PTR_US:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV_US]]
				; CHECK-NEXT: [[EL_US:%.*]] = load i32, ptr [[EL_PTR_US]], align 4
				; CHECK-NEXT: [[BOUND_CHECK_US:%.*]] = icmp ult i32 [[EL_US]], [[LIMIT]]
				; CHECK-NEXT: br i1 [[BOUND_CHECK_US]], label [[GUARDED_US]], label [[COMMON_RET:%.*]], !prof [[PROF1:![0-9]+]]
				; CHECK: guarded.us:
				; CHECK-NEXT: [[RANGE_CHECK_US:%.*]] = icmp ult i32 [[EL_US]], [[LEN]]
				; CHECK-NEXT: [[ARR_PTR_US:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL_US]]
				; CHECK-NEXT: store i32 [[IV_US]], ptr [[ARR_PTR_US]], align 4
				; CHECK-NEXT: [[IV_NEXT_US]] = add i32 [[IV_US]], 1
				; CHECK-NEXT: [[LOOP_COND_US:%.*]] = icmp slt i32 [[IV_NEXT_US]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND_US]], label [[LOOP_US]], label [[COMMON_RET]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[ENTRY]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp ult i32 [[EL]], [[LIMIT]]
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[GUARDED:%.*]], label [[COMMON_RET]], !prof [[PROF1]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !llvm.invariant.condition.injection.disabled !0
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ 0, [[GUARDED_US]] ], [ -1, [[LOOP]] ], [ -1, [[LOOP_US]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp ult i32 %el, %limit
				br i1 %bound_check, label %guarded, label %bound_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				; Should not optimize: profile metadata is missing.
				define i32 @test_01_neg_void_profile(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_01_neg_void_profile(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp ult i32 [[EL]], [[LIMIT:%.*]]
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[GUARDED:%.*]], label [[COMMON_RET:%.*]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp ult i32 %el, %limit
				br i1 %bound_check, label %guarded, label %bound_check_failed

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				; Same as test_01, but n and limit are constants.
				define i32 @test_01_constants(ptr noundef %p, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_01_constants(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: [[INJECTED_COND:%.*]] = icmp ule i32 200, 300
				; CHECK-NEXT: br i1 [[INJECTED_COND]], label [[LOOP_US:%.*]], label [[LOOP:%.*]]
				; CHECK: loop.us:
				; CHECK-NEXT: [[IV_US:%.*]] = phi i32 [ [[IV_NEXT_US:%.*]], [[GUARDED_US:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: [[EL_PTR_US:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV_US]]
				; CHECK-NEXT: [[EL_US:%.*]] = load i32, ptr [[EL_PTR_US]], align 4
				; CHECK-NEXT: [[BOUND_CHECK_US:%.*]] = icmp ult i32 [[EL_US]], 200
				; CHECK-NEXT: br i1 [[BOUND_CHECK_US]], label [[GUARDED_US]], label [[COMMON_RET:%.*]], !prof [[PROF1]]
				; CHECK: guarded.us:
				; CHECK-NEXT: [[RANGE_CHECK_US:%.*]] = icmp ult i32 [[EL_US]], 300
				; CHECK-NEXT: [[ARR_PTR_US:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL_US]]
				; CHECK-NEXT: store i32 [[IV_US]], ptr [[ARR_PTR_US]], align 4
				; CHECK-NEXT: [[IV_NEXT_US]] = add i32 [[IV_US]], 1
				; CHECK-NEXT: [[LOOP_COND_US:%.*]] = icmp slt i32 [[IV_NEXT_US]], 1000
				; CHECK-NEXT: br i1 [[LOOP_COND_US]], label [[LOOP_US]], label [[COMMON_RET]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ 0, [[ENTRY]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp ult i32 [[EL]], 200
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF1]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ 0, [[GUARDED_US]] ], [ -1, [[LOOP]] ], [ -1, [[LOOP_US]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp ult i32 %el, 200
				br i1 %bound_check, label %guarded, label %bound_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, 300
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, 1000
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				define i32 @test_01_neg_degenerate_profile(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_01_neg_degenerate_profile(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp ult i32 [[EL]], [[LIMIT:%.*]]
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[GUARDED:%.*]], label [[COMMON_RET:%.*]], !prof [[PROF1]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF2:![0-9]+]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.]] = getelementptr i32, ptr [[ARR:%.]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.]] = icmp slt i32 [[IV_NEXT]], [[N:%.]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp ult i32 %el, %limit
				br i1 %bound_check, label %guarded, label %bound_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 0, i32 0}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				; Should not optimize: cold branch.
				define i32 @test_01_neg_cold(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_01_neg_cold(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp ult i32 [[EL]], [[LIMIT:%.*]]
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[GUARDED:%.*]], label [[COMMON_RET:%.*]], !prof [[PROF1]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF3:![0-9]+]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp ult i32 %el, %limit
				br i1 %bound_check, label %guarded, label %bound_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 2, i32 3}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				define i32 @test_02(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_02(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp sge i32 [[EL]], 0
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[GUARDED:%.*]], label [[COMMON_RET:%.*]], !prof [[PROF1]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF1]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp sge i32 %el, 0
				br i1 %bound_check, label %guarded, label %bound_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				define i32 @test_03(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_03(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i32, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i32, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp slt i32 [[EL]], 0
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[COMMON_RET:%.*]], label [[GUARDED:%.*]], !prof [[PROF4:![0-9]+]]
				; CHECK: guarded:
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF1]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i32, ptr %p, i32 %iv
				%el = load i32, ptr %el.ptr, align 4
				%bound_check = icmp slt i32 %el, 0
				br i1 %bound_check, label %bound_check_failed, label %guarded, !prof !{!"branch_weights", i32 1, i32 100}

				guarded: ; preds = %loop
				%range_check = icmp ult i32 %el, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}

				define i32 @test_04(ptr noundef %p, i32 noundef %n, i32 noundef %limit, ptr noundef %arr, ptr noundef %len_p) {
				; CHECK-LABEL: @test_04(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN:%.*]] = load i32, ptr [[LEN_P:%.*]], align 4, !noundef !0
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
				; CHECK-NEXT: [[EL_PTR:%.*]] = getelementptr i8, ptr [[P:%.*]], i32 [[IV]]
				; CHECK-NEXT: [[EL:%.*]] = load i8, ptr [[EL_PTR]], align 4
				; CHECK-NEXT: [[BOUND_CHECK:%.*]] = icmp slt i8 [[EL]], 0
				; CHECK-NEXT: br i1 [[BOUND_CHECK]], label [[COMMON_RET:%.*]], label [[GUARDED:%.*]], !prof [[PROF4]]
				; CHECK: guarded:
				; CHECK-NEXT: [[EL_WIDE:%.*]] = zext i8 [[EL]] to i32
				; CHECK-NEXT: [[RANGE_CHECK:%.*]] = icmp ult i32 [[EL_WIDE]], [[LEN]]
				; CHECK-NEXT: br i1 [[RANGE_CHECK]], label [[BACKEDGE]], label [[COMMON_RET]], !prof [[PROF1]]
				; CHECK: backedge:
				; CHECK-NEXT: [[ARR_PTR:%.*]] = getelementptr i32, ptr [[ARR:%.*]], i32 [[EL_WIDE]]
				; CHECK-NEXT: store i32 [[IV]], ptr [[ARR_PTR]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], 1
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i32 [[IV_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[COMMON_RET]]
				; CHECK: common.ret:
				; CHECK-NEXT: [[COMMON_RET_OP:%.*]] = phi i32 [ 0, [[BACKEDGE]] ], [ -1, [[LOOP]] ], [ -2, [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[COMMON_RET_OP]]
				;
				entry:
				%len = load i32, ptr %len_p, align 4, !noundef !{}
				br label %loop

				loop: ; preds = %backedge, %entry
				%iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
				%el.ptr = getelementptr i8, ptr %p, i32 %iv
				%el = load i8, ptr %el.ptr, align 4
				%bound_check = icmp slt i8 %el, 0
				br i1 %bound_check, label %bound_check_failed, label %guarded, !prof !{!"branch_weights", i32 1, i32 100}

				guarded: ; preds = %loop
				%el.wide = zext i8 %el to i32
				%range_check = icmp ult i32 %el.wide, %len
				br i1 %range_check, label %backedge, label %range_check_failed, !prof !{!"branch_weights", i32 100, i32 1}

				backedge: ; preds = %guarded
				%arr.ptr = getelementptr i32, ptr %arr, i32 %el.wide
				store i32 %iv, ptr %arr.ptr, align 4
				%iv.next = add i32 %iv, 1
				%loop_cond = icmp slt i32 %iv.next, %n
				br i1 %loop_cond, label %loop, label %exit

				exit: ; preds = %backedge
				ret i32 0

				bound_check_failed: ; preds = %loop
				ret i32 -1

				range_check_failed: ; preds = %guarded
				ret i32 -2
				}
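
The loop shape these tests exercise (a bound check against `%limit` followed by a range check against `%len`) can be modelled in plain C++ to see why the injected invariant condition is safe. This is only an illustrative sketch based on the patch summary, not code from the patch: the function names `originalLoop`, `injectedLoop` and `sameBehavior` are hypothetical, and the model ignores the profile-based profitability checks that the negative tests cover.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models the input IR: both variant checks run on every iteration.
// Returns 0 on normal exit, -1 if the bound check fails, -2 if the range check fails.
int32_t originalLoop(const std::vector<uint32_t> &p, int32_t n, uint32_t limit,
                     uint32_t len, std::vector<int32_t> &arr) {
  int32_t iv = 0;
  do {
    uint32_t el = p[iv];
    if (!(el < limit)) return -1; // bound_check (icmp ult)
    if (!(el < len)) return -2;   // range_check (icmp ult)
    arr[el] = iv;
    ++iv;
  } while (iv < n);
  return 0;
}

// Models the loop after injecting the invariant condition `limit <=u len`
// and unswitching on it (assumed shape, following the patch summary).
int32_t injectedLoop(const std::vector<uint32_t> &p, int32_t n, uint32_t limit,
                     uint32_t len, std::vector<int32_t> &arr) {
  if (limit <= len) {
    // Unswitched copy: el <u limit and limit <=u len imply el <u len,
    // so the range check is dropped.
    int32_t iv = 0;
    do {
      uint32_t el = p[iv];
      if (!(el < limit)) return -1;
      arr[el] = iv;
      ++iv;
    } while (iv < n);
    return 0;
  }
  // Otherwise fall back to the loop with both checks.
  return originalLoop(p, n, limit, len, arr);
}

// Hypothetical helper: runs both versions and compares result and stores.
bool sameBehavior(const std::vector<uint32_t> &p, int32_t n, uint32_t limit,
                  uint32_t len) {
  std::vector<int32_t> a1(len, -1), a2(len, -1);
  int32_t r1 = originalLoop(p, n, limit, len, a1);
  int32_t r2 = injectedLoop(p, n, limit, len, a2);
  return r1 == r2 && a1 == a2;
}
```

Calling `sameBehavior` with `limit <= len` takes the copy without the range check, while `limit > len` takes the fallback; in both cases the return value and the stores into `arr` should match the original loop.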

Revision Contents

Diff 496329

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll