This is an archive of the discontinued LLVM Phabricator instance.

[Sink] allow sinking convergent operations across uniform branches
AbandonedPublic

Authored by sameerds on Jul 27 2021, 3:32 AM.

Details

Summary

This is a preparatory patch for fixing the semantics of attribute
"convergent" in terms of sets of threads determined by divergent
control flow.

See https://reviews.llvm.org/D104504 for the latest discussion.

Diff Detail

Event Timeline

sameerds created this revision.Jul 27 2021, 3:32 AM
sameerds requested review of this revision.Jul 27 2021, 3:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2021, 3:32 AM
sameerds updated this revision to Diff 362011.Jul 27 2021, 6:44 AM

Fixed a gross misunderstanding about the term "successor"

foad added inline comments.Jul 28 2021, 3:51 AM
llvm/lib/Analysis/DivergenceAnalysis.cpp
357–360

I think it would be less confusing to put the !DA check before the ContainsIrreducible check. I realise you have initialised ContainsIrreducible to false so that this still works, even for an "empty" analysis with no DA or LegacyDA, but that seems like a lie since we don't actually know whether the function contains irreducible regions or not.

I would hope for a structure like:

if (LegacyDA)
  return (something based on LegacyDA);
if (DA)
  return (something based on DA);
return false;

Same for the other functions below.
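The suggested structure can be sketched as follows, with hypothetical stand-in types (the real code would consult LegacyDivergenceAnalysis and DivergenceAnalysis objects, not these structs):

```cpp
// Hypothetical stand-ins for the two analyses; names and fields are
// illustrative only.
struct LegacyDA { bool Divergent; };
struct NewDA { bool Divergent; };

// Early-return shape suggested above: consult whichever analysis is
// present, and fall back to "not divergent" only when neither exists,
// instead of relying on a default-initialised flag.
bool isDivergent(const LegacyDA *Legacy, const NewDA *DA) {
  if (Legacy)
    return Legacy->Divergent;
  if (DA)
    return DA->Divergent;
  return false;
}
```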

llvm/lib/Transforms/Scalar/Sink.cpp
247

It seems odd that you explicitly choose the new DA here but the legacy DA in SinkingLegacyPass. I know they both have "legacy" in their name but that doesn't seem like a good reason. SinkingLegacyPass is just a wrapper for use with the legacy pass manager, but I don't see why it should necessarily use the legacy DA.

Or is there some problem like the legacy DA doesn't work with the new PM, which has forced you to do this?

sameerds updated this revision to Diff 362995.Jul 30 2021, 1:45 AM

The change was rebased, which may produce a very noisy diff relative to
the previous version of the change.

Replaced the bool ContainsIrreducible with a new enum that is always
consistent, irrespective of whether we are using legacy DA or new DA.
This also allows a command-line option to force "divergent" control
flow, so that we can test changes on the default target for better
coverage.
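A minimal sketch of what such a tri-state enum might look like; all names here are hypothetical, and the actual patch may carve up the states differently:

```cpp
// Hypothetical tri-state replacing the bool ContainsIrreducible: it is
// filled in consistently whether the legacy DA or the new DA ran, and
// one state models the command-line override that forces "divergent"
// control flow for test coverage.
enum class ControlFlowKind {
  Uniform,        // no divergent branches found by the analysis
  Divergent,      // divergent branches (or irreducible regions) present
  ForcedDivergent // forced via a command-line option for testing
};

// Queries then branch on the enum rather than on a possibly-stale bool.
bool treatAsDivergent(ControlFlowKind K) {
  return K != ControlFlowKind::Uniform;
}
```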

sameerds added inline comments.Jul 30 2021, 1:51 AM
llvm/lib/Analysis/DivergenceAnalysis.cpp
357–360

Addressed with a new enum that always has a consistent value. It has other uses too, so it is checked first.

llvm/lib/Transforms/Scalar/Sink.cpp
247

The legacy DA was never ported to the new pass manager. It can only be invoked from the old pass manager, where it is the default DA. See the RUN lines in the updated lit test to see all the valid combinations.

llvm/test/Transforms/Sink/convergent.ll
6

The earlier attempt had copied the test into two versions, one with the default target and one with AMDGPU. But this does not scale, since more changes in the pipeline will touch many other tests. Instead, the new option forces the new DA to report "divergent" for everything, which allows us to test convergent operations even on the default target. The tests can then run in every build, ensuring good coverage.
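The override described above might take roughly this shape; this is a sketch with invented names, not the patch's actual flag or query:

```cpp
// Stands in for a cl::opt<bool> that forces the divergence analysis to
// report "divergent" for every value, so convergent-operation tests can
// run on the default target instead of requiring an AMDGPU build.
static bool ForceDivergence = false;

// Sketch of a divergence query that short-circuits when the flag is set.
bool isDivergentUse(bool AnalysisSaysDivergent) {
  if (ForceDivergence)
    return true; // everything is treated as divergent under the option
  return AnalysisSaysDivergent;
}
```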

arsenm added inline comments.Jul 30 2021, 5:23 PM
llvm/lib/Transforms/Scalar/Sink.cpp
58–64

This is reinterpreting the IR semantics based on target information. There is no defined link between divergence and convergent, and I'm not sure there should even be one. There are multiple target-defined concepts of uniformity, and not all of them matter for convergent on AMDGPU. I think any kind of transform that wants to make this kind of assumption needs to be a specific divergence-aware control flow optimizer.

sameerds added inline comments.Aug 1 2021, 4:14 AM
llvm/lib/Transforms/Scalar/Sink.cpp
58–64

There is nothing target-specific about this change. The optimization is treating the convergent operation exactly the way it should be. And yes, there is a link between divergence and convergent operations. The current "definition" of convergent operations in the LangRef does not amount to much, and the actual implementation of it clearly reflects the link with divergence.

The link between convergent operations and divergence is being formalized,
here: https://reviews.llvm.org/D85603
and further made explicit here: https://reviews.llvm.org/D104504

There may be multiple concepts of uniformity, but that's an entirely separate enhancement. Independent of how many ways in which threads can diverge (wave, work-group, grid, whatever), convergent operations are all linked to the fact that threads diverge.

Every control flow transform in the optimizer should be aware of divergence. Divergence and convergent operations are two sides of the same coin, and the above linked reviews expose them everywhere in LLVM. There is no reason to sequester these concepts into a corner called a "divergence aware optimizer". There need not be any such thing.

sameerds updated this revision to Diff 364354.Aug 4 2021, 10:48 PM

Replaced the full-fledged divergence analysis with a trivial
DivergenceInfo object instead. The change also allows short-circuiting
the divergence analysis in lit tests, as demonstrated by changes to
LoopUnswitch.

Bump!

sameerds abandoned this revision.Jul 28 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 11:47 PM
Herald added a subscriber: kosarev. · View Herald Transcript