
[NewPM] Use stale divergence analysis with SimpleLoopUnswitch
Abandoned · Public

Authored by sameerds on Feb 15 2021, 11:57 PM.

Details

Summary

This fixes bug 48819.

Loop unswitching on divergent conditions is harmful for
performance. The LoopUnswitch pass depends on LegacyDivergenceAnalysis
to avoid this, but the state of divergence analysis may be
stale (neither preserved nor invalidated) due to previous loop passes.

The new pass manager provides SimpleLoopUnswitch which currently does
not skip divergent branches. Loop passes can request function analysis
results from an "outer proxy" analysis manager, but only if such
results are never invalidated. This change introduces another method
to request an analysis from the outer proxy even if it is stale. This
is sufficient for the current use-case, where it is not necessary to
update the divergence analysis after every loop pass, and the existing
stale result is still safely usable. The effect is equivalent to the
use of divergence analysis by LoopUnswitch in the legacy pass manager.

Diff Detail

Event Timeline

sameerds created this revision.Feb 15 2021, 11:57 PM
sameerds requested review of this revision.Feb 15 2021, 11:57 PM
Herald added a project: Restricted Project.Feb 15 2021, 11:57 PM

Not to say that I am thrilled, but it should do the job... likely.

tra added inline comments.Feb 16 2021, 10:51 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2886–2891

Could you elaborate on why it is always safe?

IIUIC, the idea is that by default we'll treat all branches as divergent. The DA will allow treating some branches as non-divergent. Any new branches created by transforms will not be included in the stale DA and will therefore be treated as divergent.

Perhaps BaselineDA would be a better name? Yes, it is potentially stale and that needs to be prominently displayed, but being stale is not particularly descriptive of the parameter.

asbirlea requested changes to this revision.Feb 16 2021, 12:13 PM

This is absolutely not the right way to resolve this.
First, the restriction to not allow getting a stale analysis is very much intentional and part of the design of the new pass manager. The API being added here must not exist.
Second, it is not safe to use the stale DA. LoopUnswitch gets the LegacyDivergenceAnalysis only when making the final unswitching decision, and it does not reuse a stale instance. The same needs to happen in SimpleLoopUnswitch.
A proper solution is to change the divergence analysis pass so it can be created as an object. An example of a pass used as an object is the OptimizationRemarkEmitter.

This revision now requires changes to proceed.Feb 16 2021, 12:13 PM

This is absolutely not the right way to resolve this.
First, the restriction to not allow getting a stale analysis is very much intentional and part of the design of the new pass manager. The API being added here must not exist.
Second, it is not safe to use the stale DA.

Please see other comments for the specific use-case. It is safe for what loop unswitching does. Maybe the use of the word "stale" provokes a conservative reaction. It might be more useful to think in terms of an analysis being used as a hint rather than a reliable truth.

LoopUnswitch gets the LegacyDivergenceAnalysis only when making the final unswitching decision, and it does not reuse a stale instance. The same needs to happen in SimpleLoopUnswitch.

That is not true. The flow in the legacy pass manager is more involved. LoopPass::preparePassManager() actually ensures that if a loop transform T running inside a loop pass manager M invalidates analyses used by other passes in M, then T is split out into a separate loop pass manager M'. Thus every instance of a loop pass manager is responsible for making sure that the analyses it uses are recomputed before any of its passes start. The divergence analysis is not computed in the middle of loop unswitching, as you seem to be suggesting here.

But now that I looked at it more closely, this does invalidate my original assumption that the legacy PM is providing a stale analysis. In that sense, my patch should not be seen as reproducing existing behaviour ... it's turning out to be more of a hack because the new PM lacks some functionality. There doesn't seem to be a way for a transform in the new PM to isolate itself from the effects of other transforms that invalidate analyses in the outer analysis manager.

A proper solution is to change the divergence analysis pass so it can be created as an object. An example of a pass used as an object is the OptimizationRemarkEmitter.

The DA is already available as a standalone object. But if that were the proper solution, it would undermine the utility of the whole analysis manager framework. Is it really a good idea to build a framework that is not flexible enough, and then advise people to stay out of it when their use-case is not covered? Recomputing a function analysis on every loop seems unnecessarily costly when a stale analysis would have been good enough.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2886–2891

When you unswitch a divergent branch, you're simply splitting the warp into two active masks, which will separately follow the corresponding side of the branch. This does not affect the execution of the threads themselves. But it does force a typical GPU to serialize execution ... while threads in one active mask are executing their side, the others must remain inactive and vice versa. This usually has lower performance than having all the threads go through the single original loop including a divergent branch.

I can imagine StaleDA is a bit negative, but BaselineDA also does not reflect the fact that it is unreliable. How about DivergenceHint? That brings out the utility of the result while indicating that the results are not entirely reliable.

That is not true. The flow in the legacy pass manager is more involved. LoopPass::preparePassManager() actually ensures that if a loop transform T running inside a loop pass manager M invalidates analyses used by other passes in M, then T is split out into a separate loop pass manager M'. Thus every instance of a loop pass manager is responsible for making sure that the analyses it uses are recomputed before any of its passes start. The divergence analysis is not computed in the middle of loop unswitching, as you seem to be suggesting here.

But now that I looked at it more closely, this does invalidate my original assumption that the legacy PM is providing a stale analysis. In that sense, my patch should not be seen as reproducing existing behaviour ... it's turning out to be more of a hack because the new PM lacks some functionality. There doesn't seem to be a way for a transform in the new PM to isolate itself from the effects of other transforms that invalidate analyses in the outer analysis manager.

I can think of a static solution for this, which won't be as flexible as the old PM, but it kinda fits in the general scheme of things for the new PM.

  1. Any loop pass that depends on analyses other than the standard loop analyses should expose that list via a static method.
  2. Rather than adding passes to an LPM directly, an outer manager should add them through a proxy. This proxy can own more than one LPM and has the ability to enqueue function analyses between these LPMs.
  3. Whenever addPass() is called on this proxy, it should check the loop pass to see if it requires any non-standard analyses. If yes, it should start a new LPM and enqueue the required analyses before the new LPM.

I am not sure if any existing class can serve as this proxy, or we need to add something new.

sameerds abandoned this revision.Mar 19 2021, 8:56 AM

This needs a deeper design. For now, the quick way forward is to disable loop unswitching on divergent non-trivial branches:

https://reviews.llvm.org/D98958