This is an archive of the discontinued LLVM Phabricator instance.


Differential D109762

[NewPM][SimpleLoopUnswitch] Add DivergenceInfo
Accepted · Public

Authored by bcahoon on Sep 14 2021, 7:44 AM.


Details

Reviewers

sameerds
asbirlea
aeubanks
arsenm

Summary

Enable DivergenceAnalysis with non-trivial Simple Loop Unswitch.
Because DivergenceAnalysis is a function pass, it's not possible
to add the analysis to the pass manager pipeline when it's used
by a loop pass. This patch creates a new DivergenceInfo instance
only when needed by Simple Loop Unswitch. The PostDominatorTree
is needed as well. This adds DivergenceAnalysis when using the new
pass manager only and does not change the legacy pass manager.
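
The filtering step described in the summary can be pictured with a toy model. The sketch below is not LLVM code: `Candidate`, `ToyDivergenceInfo`, `dropDivergentCandidates`, and `exampleFilter` are hypothetical stand-ins, and divergence is reduced to a lookup in a set supplied by the caller. It only illustrates the shape of the change, assuming the divergence information is already available: erase every unswitch candidate whose condition is divergent, and give up if none remain.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an unswitch candidate: a terminator plus the
// invariant condition it branches on (names only, no real IR).
struct Candidate {
  std::string Terminator;
  std::string Condition;
};

// Hypothetical stand-in for DivergenceInfo: the divergent values are given
// up front instead of being computed from the function.
struct ToyDivergenceInfo {
  std::set<std::string> DivergentValues;
  bool isDivergent(const std::string &V) const {
    return DivergentValues.count(V) != 0;
  }
};

// Remove candidates whose condition is divergent. Returns false when nothing
// is left to unswitch, similar to the early return in the patch.
bool dropDivergentCandidates(std::vector<Candidate> &Candidates,
                             const ToyDivergenceInfo &DI) {
  Candidates.erase(
      std::remove_if(Candidates.begin(), Candidates.end(),
                     [&](const Candidate &C) {
                       return DI.isDivergent(C.Condition);
                     }),
      Candidates.end());
  return !Candidates.empty();
}

// Usage: one uniform and one divergent condition; only the uniform one stays.
inline void exampleFilter() {
  std::vector<Candidate> Cands = {{"br.a", "%cond"}, {"br.b", "%cmp2"}};
  ToyDivergenceInfo DI;
  DI.DivergentValues.insert("%cmp2");
  bool AnyLeft = dropDivergentCandidates(Cands, DI);
  assert(AnyLeft && Cands.size() == 1 && Cands[0].Condition == "%cond");
  (void)AnyLeft;
}
```

In the toy model the cost of building the divergence information is hidden; in the patch that cost is the concern raised in review, since the information is rebuilt for each loop.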

Diff Detail

Unit Tests: Failed

	Time	Test
	300 ms	x64 windows > Clang Tools.clang-tidy/checkers::readability-container-data-pointer.cpp

Event Timeline

bcahoon created this revision. Sep 14 2021, 7:44 AM

Herald added subscribers: kerbowa, hiraditya, nhaehnle, jvesely. Sep 14 2021, 7:44 AM

bcahoon requested review of this revision. Sep 14 2021, 7:44 AM

Herald added a project: Restricted Project. Sep 14 2021, 7:44 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B123843: Diff 372481. Sep 14 2021, 8:49 AM

lgtm but wait for other lgtms

llvm/test/Transforms/SimpleLoopUnswitch/AMDGPU/nontrivial-unswitch.ll
1 ↗	(On Diff #372481)	not a big deal, but we could just do `REQUIRES: amdgpu-registered-target` right? unless you're planning on adding more of these sorts of tests

This revision is now accepted and ready to land. Sep 14 2021, 9:49 AM

lgtm.

you should probably check that this doesn't affect compile times too much since we're creating some new (potentially expensive) analyses without any sort of caching

Added amdgpu-registered-target to test case.

arsenm added inline comments. Sep 14 2021, 6:41 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	Constructing your own PostDominatorTree seems not good. Should this be a pass dependency for divergent targets?

In D109762#2999963, @aeubanks wrote:

you should probably check that this doesn't affect compile times too much since we're creating some new (potentially expensive) analyses without any sort of caching

I've run a few test cases, but I don't know of a straightforward way to evaluate a large set of tests for compile time. It would be nice to be able to cache the analysis.

bcahoon added inline comments. Sep 14 2021, 6:47 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	I agree that it's not good, but I believe the only way to do this as a dependency/required is to have all the loop passes update the post-dominator tree info when needed (which is what's done for dominators and several other loop analyses).

Harbormaster completed remote builds in B123943: Diff 372610. Sep 14 2021, 7:28 PM

aeubanks added inline comments. Sep 15 2021, 4:34 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	See the loop analyses part of https://llvm.org/docs/NewPassManager.html#using-analyses for limitations on accessing function analyses in a loop pass.

bcahoon added inline comments. Sep 16 2021, 9:59 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	Thanks for that reference. If I understand correctly, another approach to adding divergence analysis would be to add an entry to LoopStandardAnalysisResults for DivergenceAnalysisInfo (but, at least initially, without the guarantee that loop passes update it). That way it can be accessed by SimpleLoopUnswitch. Then, in PassBuilder.cpp, add an instance of DivergenceAnalysis prior to SimpleLoopUnswitch. This also requires a call to createFunctionToLoopPassAdaptor between DivergenceAnalysis and SimpleLoopUnswitch. We could do this only for targets that have branch divergence so that the pass pipeline remains the same for other targets. This approach seems most similar to how the legacy pass manager works. My main hesitation is adding DivergenceAnalysis to LoopStandardAnalysisResults.

aeubanks added inline comments. Sep 16 2021, 10:49 AM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	I don't think splitting up the loop pipeline to add in a function pass (to request a function analysis) is ok. Also, adding DivergenceAnalysis to LoopStandardAnalysisResults doesn't seem right, since that would mean expecting all loop passes to update DivergenceAnalysis. I think the proper way is to relax the constraint that a loop pass can't request a function analysis on demand. Part of the reason we don't allow this for other IR units is potential future concurrency and determinism, but I don't think we will ever do concurrency for loop passes (as in run loop passes on multiple loops in a function concurrently). I believe @asbirlea has mentioned something like this in the past, any thoughts? Just curious, are you actually seeing benchmarks where this nontrivial loop unswitching matters?

bcahoon added inline comments. Sep 16 2021, 12:31 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	> Just curious, are you actually seeing benchmarks where this nontrivial loop unswitching matters?
	Yes, I've been tracking down the source of some performance regressions that we're seeing after the switch from the legacy pass manager to the new pass manager. I've seen a couple of instances where non-trivial loop unswitching does improve performance (though, in one instance, I've noticed that the threshold in simple loop unswitch needs to increase to recover performance).

sameerds added inline comments. Sep 16 2021, 7:17 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	> I don't think splitting up the loop pipeline to add in a function pass (to request a function analysis) is ok also, adding DivergenceAnalysis to LoopStandardAnalysisResults doesn't seem right since expecting all loop passes to update DivergenceAnalysis doesn't seem right I think the proper way is to relax the constraint that a loop pass can't request a function analysis on demand.
	Just FYI, here's one idea of what "splitting the loop pipeline" might look like if we attempted that: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148619.html

aeubanks added inline comments. Sep 17 2021, 4:25 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	I'm currently talking to some people to see if we can do some redesigning and allow loop passes to request function analyses on demand.

bcahoon added inline comments. Sep 19 2021, 6:31 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	Great to hear. Let me know if there is anything I can do to help. Do you think it's worthwhile to merge this patch now or wait until the outcome of the redesign?

aeubanks added inline comments. Sep 19 2021, 9:49 PM

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2803	Merging this now is fine. (and still LGTM)

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

In D109762#3059988, @aeubanks wrote:

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Thanks for the update. I've waited to commit this patch because the motivating code that is improved by non-trivial unswitching requires additional changes (that are in the regular LoopUnswitch pass but not the SimpleLoopUnswitch pass).

In D109762#3059988, @aeubanks wrote:

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

In D109762#3069261, @sameerds wrote:

In D109762#3059988, @aeubanks wrote:

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

With this patch, we have to rerun DivergenceAnalysis every time we run the pass on every loop. If we change nontrivial unswitching into a function pass, we can upgrade that to only having to be rerun every time we actually unswitch something.
To be more specific, with the function pass, we'd have to invalidate most analyses anyway after every successful unswitch.

for every loop
  DA = FAM.getAnalysis<DivergenceAnalysis>(); // this will not rerun the analysis if we haven't invalidated below because we didn't successfully unswitch
  if successful unswitch
    auto PA = PreservedAnalyses::none();
    PA.preserve<SCEV>();
    // and all the other existing analyses we preserved
    FAM.invalidate(PA);
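
The difference between "recompute for every loop" and "recompute only after a successful unswitch" can be made concrete with a small model. The sketch below does not use the LLVM pass manager API: `ToyAnalysisManager`, `ToyDivergenceResult`, and `runOverLoops` are hypothetical names, and the analysis is reduced to a counter. Under those assumptions it shows the caching behaviour the pseudocode above relies on: a cached result is reused until something invalidates it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an expensive function-level analysis result.
struct ToyDivergenceResult {
  int Version; // which recomputation produced this result
};

// Hypothetical, heavily simplified analysis manager: it caches one result
// and recomputes it only after an explicit invalidation.
class ToyAnalysisManager {
  ToyDivergenceResult Cached{0};
  bool Valid = false;
  int NumComputations = 0;

public:
  const ToyDivergenceResult &getResult() {
    if (!Valid) {
      ++NumComputations; // the expensive step happens only here
      Cached = ToyDivergenceResult{NumComputations};
      Valid = true;
    }
    assert(Valid && "result must be cached before it is returned");
    return Cached;
  }
  void invalidate() { Valid = false; }
  int numComputations() const { return NumComputations; }
};

// Visit each loop; only a successful unswitch invalidates the cached result.
// Unswitched[I] says whether loop I was (hypothetically) unswitched.
int runOverLoops(ToyAnalysisManager &AM, const std::vector<bool> &Unswitched) {
  for (bool DidUnswitch : Unswitched) {
    (void)AM.getResult(); // cache hit unless invalidated below
    if (DidUnswitch)
      AM.invalidate();
  }
  return AM.numComputations();
}
```

With five loops of which only the third is unswitched, this model computes the analysis twice; rebuilding it for every loop would compute it five times.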

In D109762#3070676, @aeubanks wrote:
In D109762#3069261, @sameerds wrote:

In D109762#3059988, @aeubanks wrote:

I talked to some people and we've decided that the best thing to do would be to refactor out the nontrivial unswitching part into a function pass. Nontrivial unswitching is fairly special in the kinds of transforms it does.

Will that always work as expected? The real dependency is that the divergence analysis is not incrementally updated. So even if this is a function pass, we may want to rerun it on the whole function every time it manages to unswitch a loop.

With this patch, we have to rerun DivergenceAnalysis every time we run the pass on every loop. If we change nontrivial unswitching into a function pass, we can upgrade that to only having to be rerun every time we actually unswitch something.
To be more specific, with the function pass, we'd have to invalidate most analyses anyway after every successful unswitch.
for every loop
  DA = FAM.getAnalysis<DivergenceAnalysis>(); // this will not rerun the analysis if we haven't invalidated below because we didn't successfully unswitch
  if successful unswitch
    auto PA = PreservedAnalyses::none();
    PA.preserve<SCEV>();
    // and all the other existing analyses we preserved
    FAM.invalidate(PA);

There's also the option to update all analyses inside the new function pass that's doing non-trivial unswitching. This is currently the case with all other analyses (they are updated), so the only one that needs to be added is DA.
This limits the need to update DA to just the new Function pass, vs all Loop passes that are part of the same LPM as SimpleLoopUnswitch.
The alternative is, as you both already said, for DA to be recomputed when it is invalidated.

asbirlea mentioned this in D124251: [SimpleLoopUnswitch] Run LICM for nested unswitching tests. Apr 22 2022, 1:01 PM

drcut added a child revision: D128001: apply DivergenceAnalysis for SLU. Jun 21 2022, 10:37 AM

drcut removed a child revision: D128001: apply DivergenceAnalysis for SLU.

drcut mentioned this in D128001: apply DivergenceAnalysis for SLU. Jun 21 2022, 10:42 AM

Has this patch been merged into the main branch yet?

Herald added a project: Restricted Project. Nov 6 2022, 7:10 PM

Revision Contents

	Path	Size
	llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp	36 lines
	llvm/test/Transforms/SimpleLoopUnswitch/divergent-nontrivial-unswitch.ll	73 lines

Diff 372610

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

[12 lines not shown]
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/CodeMetrics.h"		#include "llvm/Analysis/CodeMetrics.h"
		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/Analysis/GuardUtils.h"		#include "llvm/Analysis/GuardUtils.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LoopAnalysisManager.h"		#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"		#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemorySSA.h"		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/MustExecute.h"		#include "llvm/Analysis/MustExecute.h"
		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
[2,634 lines not shown]	static int CalculateUnswitchCostMultiplier(
return CostMultiplier;		return CostMultiplier;
}		}

static bool unswitchBestCondition(		static bool unswitchBestCondition(
Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,		Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI,		AAResults &AA, TargetTransformInfo &TTI,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,		function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
ScalarEvolution *SE, MemorySSAUpdater *MSSAU,		ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
function_ref<void(Loop &, StringRef)> DestroyLoopCB) {		function_ref<void(Loop &, StringRef)> DestroyLoopCB,
		bool UseDivergenceInfo) {
// Collect all invariant conditions within this loop (as opposed to an inner		// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).		// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>		SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
UnswitchCandidates;		UnswitchCandidates;

// Whether or not we should also collect guards in the loop.		// Whether or not we should also collect guards in the loop.
bool CollectGuards = false;		bool CollectGuards = false;
if (UnswitchGuards) {		if (UnswitchGuards) {
[101 lines not shown]	for (auto *ExitBB : ExitBlocks) {
auto *I = ExitBB->getFirstNonPHI();		auto *I = ExitBB->getFirstNonPHI();
if (isa<CleanupPadInst>(I) || isa<CatchSwitchInst>(I)) {		if (isa<CleanupPadInst>(I) || isa<CatchSwitchInst>(I)) {
LLVM_DEBUG(dbgs() << "Cannot unswitch because of cleanuppad/catchswitch "		LLVM_DEBUG(dbgs() << "Cannot unswitch because of cleanuppad/catchswitch "
"in exit block\n");		"in exit block\n");
return false;		return false;
}		}
}		}

		if (UseDivergenceInfo) {
		Function *F = L.getHeader()->getParent();
		PostDominatorTree PDT(*F);
		DivergenceInfo DI(*F, DT, PDT, LI, TTI, /*KnownReducible*/ true);
		llvm::erase_if(UnswitchCandidates,
		[&](std::pair<Instruction *, TinyPtrVector<Value *>> Cand) {
		return DI.isDivergent(*Cand.second[0]);
		});
		if (UnswitchCandidates.empty())
		return false;
		}

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "Considering " << UnswitchCandidates.size()		dbgs() << "Considering " << UnswitchCandidates.size()
<< " non-trivial loop invariant conditions for unswitching.\n");		<< " non-trivial loop invariant conditions for unswitching.\n");

// Given that unswitching these terminators will require duplicating parts of		// Given that unswitching these terminators will require duplicating parts of
// the loop, so we need to be able to model that cost. Compute the ephemeral		// the loop, so we need to be able to model that cost. Compute the ephemeral
// values and set up a data structure to hold per-BB costs. We cache each		// values and set up a data structure to hold per-BB costs. We cache each
// block's cost so that we don't recompute this when considering different		// block's cost so that we don't recompute this when considering different
[185 lines not shown]
/// If `SE` is non-null, we will update that analysis based on the unswitching		/// If `SE` is non-null, we will update that analysis based on the unswitching
/// done.		/// done.
static bool		static bool
unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,		unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI, bool Trivial,		AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
bool NonTrivial,		bool NonTrivial,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,		function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
ScalarEvolution *SE, MemorySSAUpdater *MSSAU,		ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
function_ref<void(Loop &, StringRef)> DestroyLoopCB) {		function_ref<void(Loop &, StringRef)> DestroyLoopCB,
		bool UseDivergenceInfo) {
assert(L.isRecursivelyLCSSAForm(DT, LI) &&		assert(L.isRecursivelyLCSSAForm(DT, LI) &&
"Loops must be in LCSSA form before unswitching.");		"Loops must be in LCSSA form before unswitching.");

// Must be in loop simplified form: we need a preheader and dedicated exits.		// Must be in loop simplified form: we need a preheader and dedicated exits.
if (!L.isLoopSimplifyForm())		if (!L.isLoopSimplifyForm())
return false;		return false;

// Try trivial unswitch first before loop over other basic blocks in the loop.		// Try trivial unswitch first before loop over other basic blocks in the loop.
if (Trivial && unswitchAllTrivialConditions(L, DT, LI, SE, MSSAU)) {		if (Trivial && unswitchAllTrivialConditions(L, DT, LI, SE, MSSAU)) {
// If we unswitched successfully we will want to clean up the loop before		// If we unswitched successfully we will want to clean up the loop before
// processing it further so just mark it as unswitched and return.		// processing it further so just mark it as unswitched and return.
UnswitchCB(/*CurrentLoopValid*/ true, false, {});		UnswitchCB(/*CurrentLoopValid*/ true, false, {});
return true;		return true;
}		}

// Check whether we should continue with non-trivial conditions.		// Check whether we should continue with non-trivial conditions.
// EnableNonTrivialUnswitch: Global variable that forces non-trivial		// EnableNonTrivialUnswitch: Global variable that forces non-trivial
// unswitching for testing and debugging.		// unswitching for testing and debugging.
// NonTrivial: Parameter that enables non-trivial unswitching for this		// NonTrivial: Parameter that enables non-trivial unswitching for this
// invocation of the transform. But this should be allowed only		// invocation of the transform. But this should be allowed only
// for targets without branch divergence.		// for targets without branch divergence.
//		if (!EnableNonTrivialUnswitch && !NonTrivial)
// FIXME: If divergence analysis becomes available to a loop
// transform, we should allow unswitching for non-trivial uniform
// branches even on targets that have divergence.
// https://bugs.llvm.org/show_bug.cgi?id=48819
bool ContinueWithNonTrivial =
EnableNonTrivialUnswitch || (NonTrivial && !TTI.hasBranchDivergence());
if (!ContinueWithNonTrivial)
return false;		return false;

// Skip non-trivial unswitching for optsize functions.		// Skip non-trivial unswitching for optsize functions.
if (L.getHeader()->getParent()->hasOptSize())		if (L.getHeader()->getParent()->hasOptSize())
return false;		return false;

// Skip non-trivial unswitching for loops that cannot be cloned.		// Skip non-trivial unswitching for loops that cannot be cloned.
if (!L.isSafeToClone())		if (!L.isSafeToClone())
return false;		return false;

// For non-trivial unswitching, because it often creates new loops, we rely on		// For non-trivial unswitching, because it often creates new loops, we rely on
// the pass manager to iterate on the loops rather than trying to immediately		// the pass manager to iterate on the loops rather than trying to immediately
// reach a fixed point. There is no substantial advantage to iterating		// reach a fixed point. There is no substantial advantage to iterating
// internally, and if any of the new loops are simplified enough to contain		// internally, and if any of the new loops are simplified enough to contain
// trivial unswitching we want to prefer those.		// trivial unswitching we want to prefer those.

// Try to unswitch the best invariant condition. We prefer this full unswitch to		// Try to unswitch the best invariant condition. We prefer this full unswitch to
// a partial unswitch when possible below the threshold.		// a partial unswitch when possible below the threshold.
if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,		if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
DestroyLoopCB))		DestroyLoopCB, UseDivergenceInfo))
return true;		return true;

// No other opportunities to unswitch.		// No other opportunities to unswitch.
return false;		return false;
}		}

PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
[43 lines not shown]	PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
if (AR.MSSA) {		if (AR.MSSA) {
MSSAU = MemorySSAUpdater(AR.MSSA);		MSSAU = MemorySSAUpdater(AR.MSSA);
if (VerifyMemorySSA)		if (VerifyMemorySSA)
AR.MSSA->verifyMemorySSA();		AR.MSSA->verifyMemorySSA();
}		}
if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,		if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
UnswitchCB, &AR.SE,		UnswitchCB, &AR.SE,
MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,		MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
DestroyLoopCB))		DestroyLoopCB, AR.TTI.useGPUDivergenceAnalysis()))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

if (AR.MSSA && VerifyMemorySSA)		if (AR.MSSA && VerifyMemorySSA)
AR.MSSA->verifyMemorySSA();		AR.MSSA->verifyMemorySSA();

// Historically this pass has had issues with the dominator tree so verify it		// Historically this pass has had issues with the dominator tree so verify it
// in asserts builds.		// in asserts builds.
assert(AR.DT.verify(DominatorTree::VerificationLevel::Fast));		assert(AR.DT.verify(DominatorTree::VerificationLevel::Fast));
[73 lines not shown]	bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {		auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
LPM.markLoopAsDeleted(L);		LPM.markLoopAsDeleted(L);
};		};

if (VerifyMemorySSA)		if (VerifyMemorySSA)
MSSA->verifyMemorySSA();		MSSA->verifyMemorySSA();

bool Changed = unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial,		bool Changed = unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial,
UnswitchCB, SE, &MSSAU, DestroyLoopCB);		UnswitchCB, SE, &MSSAU, DestroyLoopCB,
		/*UseDivergenceInfo*/ false);

if (VerifyMemorySSA)		if (VerifyMemorySSA)
MSSA->verifyMemorySSA();		MSSA->verifyMemorySSA();

// Historically this pass has had issues with the dominator tree so verify it		// Historically this pass has had issues with the dominator tree so verify it
// in asserts builds.		// in asserts builds.
assert(DT.verify(DominatorTree::VerificationLevel::Fast));		assert(DT.verify(DominatorTree::VerificationLevel::Fast));

[18 lines not shown]

llvm/test/Transforms/SimpleLoopUnswitch/divergent-nontrivial-unswitch.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-unknown-amdhsa -passes='loop(simple-loop-unswitch<nontrivial>),verify<loops>' -S < %s | FileCheck %s
				; REQUIRES: amdgpu-registered-target

				; Check that non-trivial loop unswitch occurs on a target with divergence.

				; CHECK-LABEL: @nontrivial_unswitch(
				; CHECK: for.body.us:
				; CHECK: if.then.us:
				; CHECK: for.inc.us:
				; CHECK: for.body:
				; CHECK: for.inc:
				define amdgpu_kernel void @nontrivial_unswitch(i32 * nocapture %out, i32 %n, i1 %cond) {
				entry:
				br label %for.body

				for.body:
				%i = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
				br i1 %cond, label %if.then, label %for.inc

				if.then:
				%arrayidx = getelementptr inbounds i32, i32 * %out, i32 %i
				store i32 %i, i32 * %arrayidx, align 4
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i32 %i, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body

				for.cond.cleanup.loopexit:
				ret void
				}


				; Check that loop unswitch does not happen if the condition is divergent.

				; CHECK-LABEL: @divergent_unswitch(
				; CHECK: entry:
				; CHECK: [[IF_COND:%[a-z0-9]+]] = icmp {{.*}}, 1
				; CHECK: br label
				; CHECK: for.body:
				; CHECK: br i1 [[IF_COND]]

				define amdgpu_kernel void @divergent_unswitch(i32 * nocapture %out, i32 %n) {
				entry:
				br label %for.body.lr.ph

				for.body.lr.ph:
				%call = tail call i32 @llvm.amdgcn.workitem.id.x() #0
				%cmp2 = icmp eq i32 %call, 1
				br label %for.body

				for.body:
				%i = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.inc ]
				br i1 %cmp2, label %if.then, label %for.inc

				if.then:
				%arrayidx = getelementptr inbounds i32, i32 * %out, i32 %i
				store i32 %i, i32 * %arrayidx, align 4
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i32 %i, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body

				for.cond.cleanup.loopexit:
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0

				attributes #0 = { nounwind readnone }
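
The two test functions differ only in where the branch condition comes from: a kernel argument in the first, a comparison against the work-item id in the second. A rough way to see why only the second condition is treated as divergent is a data-dependence model in which a value is divergent if any of its operands is. The sketch below is a toy under that assumption, not the LLVM analysis: `ToyInst`, `propagateDivergence`, and `exampleDivergence` are hypothetical names, values are plain strings, and control-flow (sync) dependence is ignored entirely.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical "instruction": a result name and its operand names.
struct ToyInst {
  std::string Result;
  std::vector<std::string> Operands;
};

// Mark a result divergent when any operand is divergent, iterating to a
// fixed point so loop-carried phis are handled. Seeds are the values assumed
// divergent up front (for example, a work-item id).
std::set<std::string> propagateDivergence(const std::vector<ToyInst> &Insts,
                                          std::set<std::string> Divergent) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const ToyInst &I : Insts) {
      if (Divergent.count(I.Result))
        continue; // already known to be divergent
      for (const std::string &Op : I.Operands) {
        if (Divergent.count(Op)) {
          Divergent.insert(I.Result);
          Changed = true;
          break;
        }
      }
    }
  }
  return Divergent;
}

// Usage: value names mirror @divergent_unswitch; %call is the seed.
inline void exampleDivergence() {
  std::vector<ToyInst> Insts = {{"%cmp2", {"%call", "1"}},
                                {"%i", {"0", "%inc"}},
                                {"%inc", {"%i", "1"}},
                                {"%exitcond", {"%inc", "%n"}}};
  std::set<std::string> D = propagateDivergence(Insts, {"%call"});
  assert(D.count("%cmp2") == 1);     // derived from the work-item id
  assert(D.count("%exitcond") == 0); // depends only on uniform values here
}
```

In this model `%cond` from the first test never enters the divergent set, because it is a kernel argument and is assumed uniform, which matches what the first test expects.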