This is an archive of the discontinued LLVM Phabricator instance.


Differential D137142

[WIP] DivergenceAnalysis: Infer uniformity from assume calls
Needs Review, Public

Authored by arsenm on Oct 31 2022, 6:28 PM.


Details

Reviewers

sameerds
nhaehnle
simoll
jdoerfert

Group Reviewers

Restricted Project

Summary

I believe this patch is OK as is, but is currently useless in practice and
I'm not sure how useful this really will be. Theoretically this should allow
something like:

kernel void foo(global int* global* arg_ptr) {
    global int* ptr = arg_ptr[get_global_id(0)];
    __builtin_assume(sub_group_all(ptr != NULL));
    if (ptr != NULL) {
        *ptr += 1;
    }
}

to use a scalar branch around the pointer dereference. There are a few obstacles
to this working today. First, using sub_group_all generates this warning
for some reason:

warning: the argument to '__builtin_assume' has side effects that will be discarded

Second, the device libraries are still using the legacy llvm.amdgcn.icmp intrinsics
instead of ballot.

Third, the device libraries are still using an inline assembly hack in lieu of
convergence tokens.

Fourth, even if those issues are avoided, the branch is still treated
as divergent when ultimately selected.
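To make the intended reasoning concrete, here is a host-side C++ model (not LLVM or device code) of one subgroup running the kernel above. The lane count and the helper names `subGroupAll`, `isUniform`, and `nonNull` are illustrative assumptions. The model shows that whenever `sub_group_all(ptr != NULL)` holds, every active lane evaluates `ptr != NULL` identically, which is the property a scalar branch needs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical subgroup width used only for this model.
constexpr std::size_t kLanes = 4;

// Model of sub_group_all: true iff the predicate holds in every active lane.
// Inactive lanes do not participate.
bool subGroupAll(const std::array<bool, kLanes> &pred,
                 const std::array<bool, kLanes> &active) {
  for (std::size_t l = 0; l < kLanes; ++l)
    if (active[l] && !pred[l])
      return false;
  return true;
}

// A branch condition is uniform if all active lanes compute the same value.
bool isUniform(const std::array<bool, kLanes> &cond,
               const std::array<bool, kLanes> &active) {
  bool seen = false, first = false;
  for (std::size_t l = 0; l < kLanes; ++l) {
    if (!active[l])
      continue;
    if (!seen) {
      first = cond[l];
      seen = true;
    } else if (cond[l] != first) {
      return false;
    }
  }
  return true;
}

// Per-lane evaluation of `ptr != NULL` for the pointers loaded from arg_ptr.
std::array<bool, kLanes> nonNull(const std::array<int *, kLanes> &ptrs) {
  std::array<bool, kLanes> r{};
  for (std::size_t l = 0; l < kLanes; ++l)
    r[l] = ptrs[l] != nullptr;
  return r;
}
```

This model simply ignores inactive lanes; whether that matches a particular target's vote semantics is an assumption of the sketch.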


Event Timeline

arsenm created this revision. Oct 31 2022, 6:28 PM

Herald added a project: Restricted Project. Oct 31 2022, 6:28 PM

Herald added subscribers: kosarev, foad, kerbowa and 2 others.

arsenm requested review of this revision. Oct 31 2022, 6:28 PM

Herald added a project: Restricted Project. Oct 31 2022, 6:28 PM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B195403: Diff 472197. Oct 31 2022, 6:28 PM

I generally think this is worthwhile. As you noted, there are still problems, but I think we could move this part ahead.

warning: the argument to '__builtin_assume' has side effects that will be discarded

__builtin_assume is dropped if we cannot show that the expression is side-effect free as part of the lowering. The expression you used has arbitrary side effects, I think (there is no godbolt for HIP):
https://github.com/llvm/llvm-project/blob/6c8995649afac04a9eb0e71affd997e493c9b93a/clang/lib/Sema/OpenCLBuiltins.td#L1718

That said, there are multiple ways around this; the easiest for now:

int assumption = sub_group_all(ptr != NULL);
__builtin_assume(assumption);

llvm/lib/Analysis/DivergenceAnalysis.cpp, line 154:
I'm not super sure this must hold. I'd just check it.

I don't quite see the point of this change. For test cases like @assume_ballot_eq_0, what we really should be doing here is optimize the branch away entirely because the llvm.assume implies that %cmp == 0.

It sounds like what we really want here is a sort of llvm.assume.uniform intrinsic. Or maybe an llvm.amdgcn.is.uniform intrinsic and then do llvm.assume(llvm.amdgcn.is.uniform)
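The point about @assume_ballot_eq_0 can be checked with a small bitmask model (plain C++, not LLVM): a ballot sets one bit per active lane whose condition is true, so assuming that the ballot of %cmp equals 0 means %cmp is false in every active lane. The branch is then never taken and could be folded, which is stronger than merely calling it uniform. The `ballot` helper and the 32-lane limit are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of a wave-level ballot: bit l is set iff lane l is active and its
// condition is true. Assumes at most 32 lanes.
uint32_t ballot(const std::vector<bool> &cond, uint32_t activeMask) {
  assert(cond.size() <= 32);
  uint32_t bits = 0;
  for (uint32_t l = 0; l < cond.size(); ++l)
    if (((activeMask >> l) & 1u) && cond[l])
      bits |= 1u << l;
  return bits;
}

// True if some active lane would take the "true" edge of a branch on cond.
// Whenever ballot(cond, activeMask) == 0 this returns false, so the branch
// could be folded to its false successor.
bool anyActiveLaneTakesBranch(const std::vector<bool> &cond,
                              uint32_t activeMask) {
  assert(cond.size() <= 32);
  for (uint32_t l = 0; l < cond.size(); ++l)
    if (((activeMask >> l) & 1u) && cond[l])
      return true;
  return false;
}
```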

In D137142#3902426, @nhaehnle wrote:

I don't quite see the point of this change. For test cases like @assume_ballot_eq_0, what we really should be doing here is optimize the branch away entirely because the llvm.assume implies that %cmp == 0.

It sounds like what we really want here is a sort of llvm.assume.uniform intrinsic. Or maybe an llvm.amdgcn.is.uniform intrinsic and then do llvm.assume(llvm.amdgcn.is.uniform)

Not a lot of time right now to follow up on this, and as much as I dislike passerby comments:
For the DA in isolation, ideally, we'd have something like:

%Y = llvm.assume.uniform(%X)
foo(%Y) ; <- Rewritten to use %Y instead of %X.

The DA would automatically pick up on the uniformity without any changes.
The assume intrinsic is non-speculatable to keep the control dependences around.
Obfuscating the %X -> foo def-use chain may inflict some damage to other analyses, though.
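The suggestion relies only on ordinary def-use propagation: if %Y = llvm.assume.uniform(%X) is modeled as never divergent, users rewritten to consume %Y stay uniform with no special handling. The sketch below is a toy worklist propagation over a hand-built value graph, not LLVM's DivergenceAnalysis; the `Node` struct, the `assumeUniform` flag, and the value names are hypothetical, and llvm.assume.uniform itself is only a proposal in this thread.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical node in a tiny SSA graph.
struct Node {
  std::vector<std::string> users; // values that use this one
  bool assumeUniform = false;     // models %Y = llvm.assume.uniform(%X)
};

// Propagate divergence from the sources to all transitive users. Nodes
// flagged assumeUniform never become divergent, so they also stop
// divergence from reaching their own users.
std::set<std::string> propagate(const std::map<std::string, Node> &graph,
                                const std::vector<std::string> &sources) {
  std::set<std::string> divergent;
  std::vector<std::string> worklist;
  for (const auto &s : sources)
    if (divergent.insert(s).second)
      worklist.push_back(s);
  while (!worklist.empty()) {
    std::string v = worklist.back();
    worklist.pop_back();
    for (const auto &u : graph.at(v).users) {
      if (graph.at(u).assumeUniform)
        continue; // the assume acts as a uniformity barrier
      if (divergent.insert(u).second)
        worklist.push_back(u);
    }
  }
  return divergent;
}
```

In a graph where %X feeds both %Y and an unrewritten user bar, bar still ends up divergent, which mirrors the caveat that only rewritten uses benefit.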

For the DA in isolation, ideally, we'd have something like:
%Y = llvm.assume.uniform(%X)
foo(%Y) ; <- Rewritten to use %Y instead of %X.

This is roughly what the target-specific @llvm.amdgcn.readfirstlane does today, and some frontends do use it to assert and/or enforce uniformity of particular values. There is some subtlety about exactly what it means (or exactly what @llvm.assume.uniform should mean): Read the first active lane? Read an arbitrary active lane? Undefined/poison if active lanes do not all have the same value?
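Two of the candidate meanings can be contrasted with a small lane model. This is plain C++ with hypothetical helper names: `readFirstLane` models the usual description of @llvm.amdgcn.readfirstlane (the value of the lowest-numbered active lane, with no uniformity requirement), while `assumeUniformOrPoison` models one possible semantics for the proposed @llvm.assume.uniform, returning an empty `std::optional` as a stand-in for poison when the active lanes disagree.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Value of the lowest-numbered active lane; no uniformity requirement.
// Returns nullopt when no lane is active (that case is not modeled further).
std::optional<int> readFirstLane(const std::vector<int> &vals,
                                 uint32_t activeMask) {
  for (uint32_t l = 0; l < vals.size() && l < 32; ++l)
    if ((activeMask >> l) & 1u)
      return vals[l];
  return std::nullopt;
}

// Candidate semantics: the common value if all active lanes agree,
// otherwise "poison" (modeled as nullopt).
std::optional<int> assumeUniformOrPoison(const std::vector<int> &vals,
                                         uint32_t activeMask) {
  std::optional<int> common;
  for (uint32_t l = 0; l < vals.size() && l < 32; ++l) {
    if (!((activeMask >> l) & 1u))
      continue;
    if (!common)
      common = vals[l];
    else if (*common != vals[l])
      return std::nullopt;
  }
  return common;
}
```

With per-lane values {7, 7, 9, 7} and all lanes active, the first model returns 7 while the second yields "poison"; they agree only when the active lanes hold the same value.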

In D137142#4046742, @foad wrote:
This is roughly what the target-specific @llvm.amdgcn.readfirstlane does today, and some frontends do use it to assert and/or enforce uniformity of particular values. There is some subtlety about exactly what it means (or exactly what @llvm.assume.uniform should mean): Read the first active lane? Read an arbitrary active lane? Undefined/poison if active lanes do not all have the same value?

Read all active lanes. The intrinsic only tells us that we can assume uniformity among the active lanes in each instance; it could not be used to enforce it. I'm not so sure about the values on inactive lanes. I'd say it simply passes through the incoming values, though you may just want poison here.

In D137142#4051045, @simoll wrote:
Read all active lanes. The intrinsic only tells us that we can assume uniformity among the active lanes in each instance; it could not be used to enforce it. I'm not so sure about the values on inactive lanes. I'd say it simply passes through the incoming values, though you may just want poison here.

We do need to say what happens if the assumption is wrong. I believe at a minimum we need to say that the result is poison, because of what happens when the result feeds into a conditional branch: divergence analysis uses the assumption, which can affect codegen. So UB on that branch if the assumption is wrong seems like the minimum we need.

Though immediate UB is a legitimate alternative, since it would allow us to replace other uses of %X by %Y.

In D137142#4072601, @nhaehnle wrote:
We do need to say what happens if the assumption is wrong. I believe at a minimum we need to say that the result is poison, because of what happens when the result feeds into a conditional branch: divergence analysis uses the assumption, which can affect codegen. So UB on that branch if the assumption is wrong seems like the minimum we need.

Though immediate UB is a legitimate alternative, since it would allow us to replace other uses of %X by %Y.

... and by extension, you can take the control conditions of the call as preconditions, whereas if it's 'just' poison, you can only do that for the instructions that actually trigger UB on poison. You may want to turn the precondition into an explicit parameter, as in:

%Y = llvm.assume.uniform(%X, %mask) <-- triggers immediate UB where %X is not uniform among the threads that actively execute this in lock-step and where %mask is true.

You are then free to rewrite uses of %X into uses of %Y where the use is dominated by the intrinsic to improve DA precision.
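One reading of the two-operand variant is that the uniformity requirement covers only lanes that are both active and have %mask set. The hypothetical checker below (plain C++, with lane sets as 32-bit masks and %mask assumed to be a per-lane boolean) returns false exactly when that reading would make the call immediate UB. It is a sketch of the proposal as described here, not an existing intrinsic's definition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if the precondition of the proposed llvm.assume.uniform(%X, %mask)
// holds under this model: every lane that is active AND has its mask bit set
// sees the same value of %X. False corresponds to immediate UB.
bool assumeUniformPrecondition(const std::vector<int> &x, uint32_t activeMask,
                               uint32_t maskBits) {
  assert(x.size() <= 32);
  const uint32_t relevant = activeMask & maskBits;
  bool seen = false;
  int common = 0;
  for (uint32_t l = 0; l < x.size(); ++l) {
    if (!((relevant >> l) & 1u))
      continue;
    if (!seen) {
      common = x[l];
      seen = true;
    } else if (x[l] != common) {
      return false;
    }
  }
  return true; // vacuously true when no lane is relevant
}
```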

Revision Contents

Path                                                        Size
llvm/include/llvm/Analysis/DivergenceAnalysis.h             10 lines
llvm/include/llvm/Analysis/TargetTransformInfo.h            20 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h        5 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h                    4 lines
llvm/lib/Analysis/AssumptionCache.cpp                       5 lines
llvm/lib/Analysis/DivergenceAnalysis.cpp                    47 lines
llvm/lib/Analysis/LegacyDivergenceAnalysis.cpp              6 lines
llvm/lib/Analysis/TargetTransformInfo.cpp                   5 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h          2 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp        34 lines
llvm/test/Analysis/DivergenceAnalysis/AMDGPU/assume.ll      289 lines
llvm/unittests/Analysis/DivergenceAnalysisTest.cpp          9 lines

Diff 472197

llvm/include/llvm/Analysis/DivergenceAnalysis.h

@@ 15 lines not shown @@
 #define LLVM_ANALYSIS_DIVERGENCEANALYSIS_H

 #include "llvm/ADT/DenseSet.h"
 #include "llvm/Analysis/SyncDependenceAnalysis.h"
 #include "llvm/IR/PassManager.h"
 #include <vector>

 namespace llvm {
+class AssumptionCache;
 class Function;
 class Instruction;
 class Loop;
 class raw_ostream;
 class TargetTransformInfo;
 class Value;

 /// \brief Generic divergence analysis for reducible CFGs.
 ///
 /// This analysis propagates divergence in a data-parallel context from sources
 /// of divergence to all users. It requires reducible CFGs. All assignments
 /// should be in SSA form.
 class DivergenceAnalysisImpl {
 public:
   /// \brief This instance will analyze the whole function \p F or the loop \p
   /// RegionLoop.
   ///
   /// \param RegionLoop if non-null the analysis is restricted to \p RegionLoop.
   /// Otherwise the whole function is analyzed.
   /// \param IsLCSSAForm whether the analysis may assume that the IR in the
   /// region in LCSSA form.
   DivergenceAnalysisImpl(const Function &F, const Loop *RegionLoop,
                          const DominatorTree &DT, const LoopInfo &LI,
+                         const TargetTransformInfo &TTI, AssumptionCache &AC,
                          SyncDependenceAnalysis &SDA, bool IsLCSSAForm);

   /// \brief The loop that defines the analyzed region (if any).
   const Loop *getRegionLoop() const { return RegionLoop; }
   const Function &getFunction() const { return F; }

   /// \brief Whether \p BB is part of the region.
   bool inRegion(const BasicBlock &BB) const;
@@ 44 lines not shown @@ private:
   void analyzeLoopExitDivergence(const BasicBlock &DivExit,
                                  const Loop &OuterDivLoop);

   /// \brief Mark all instruction as divergent that use a value defined in \p
   /// OuterDivLoop. Push their users on the worklist.
   void analyzeTemporalDivergence(const Instruction &I,
                                  const Loop &OuterDivLoop);

+  /// Check if \p V can be assumed uniform at \p User.
+  bool isUseAssumedAllUniform(const Instruction &User, const Value &V) const;
+
   /// \brief Push all users of \p Val (in the region) to the worklist.
   void pushUsers(const Value &I);

   /// \brief Whether \p Val is divergent when read in \p ObservingBlock.
   bool isTemporalDivergent(const BasicBlock &ObservingBlock,
                            const Value &Val) const;

 private:
   const Function &F;
   // If regionLoop != nullptr, analysis is only performed within \p RegionLoop.
   // Otherwise, analyze the whole function
   const Loop *RegionLoop;

   const DominatorTree &DT;
   const LoopInfo &LI;
+  const TargetTransformInfo &TTI;
+  AssumptionCache &AC;

   // Recognized divergent loops
   DenseSet<const Loop *> DivergentLoops;

   // The SDA links divergent branches to divergent control-flow joins.
   SyncDependenceAnalysis &SDA;

   // Use simplified code path for LCSSA form.
@@ 18 lines not shown @@ class DivergenceInfo {
   // this function are conservatively reported as divergent instead.
   bool ContainsIrreducible = false;
   std::unique_ptr<SyncDependenceAnalysis> SDA;
   std::unique_ptr<DivergenceAnalysisImpl> DA;

 public:
   DivergenceInfo(Function &F, const DominatorTree &DT,
                  const PostDominatorTree &PDT, const LoopInfo &LI,
-                 const TargetTransformInfo &TTI, bool KnownReducible);
+                 const TargetTransformInfo &TTI, AssumptionCache &AC,
+                 bool KnownReducible);

   /// Whether any divergence was detected.
   bool hasDivergence() const {
     return ContainsIrreducible || DA->hasDetectedDivergence();
   }

   /// The GPU kernel this analysis result is for
   const Function &getFunction() const { return F; }
@@ 46 lines not shown @@

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ 333 lines not shown @@ public:
   /// algorithm starting with the sources of divergence.
   bool isSourceOfDivergence(const Value *V) const;

   // Returns true for the target specific
   // set of operations which produce uniform result
   // even taking non-uniform arguments
   bool isAlwaysUniform(const Value *V) const;

+  enum class BallotKind {
+    NotBallot,
+    All,
+    Any
+  };
+
+  /// For targets with execution units that progress in lock step.
+  ///
+  /// Check if \p I is a call that performs a ballot or vote / operation
+  /// (e.g. OpenCL's sub_group_all or sub_group_any). Returns the asserted
+  /// value, and the ballot type.
+  std::pair<const Value *, BallotKind> isBallot(const Instruction *I) const;
+
   /// Returns the address space ID for a target's 'flat' address space. Note
   /// this is not necessarily the same as addrspace(0), which LLVM sometimes
   /// refers to as the generic address space. The flat address space is a
   /// generic address space that can be used access multiple segments of memory
   /// with different address spaces. Access of a memory location through a
   /// pointer with this address space is expected to be legal but slower
   /// compared to the same memory location accessed through a pointer with a
   /// different address space.
@@ 1,225 lines not shown @@ public:
   virtual InstructionCost getInstructionCost(const User *U,
                                              ArrayRef<const Value *> Operands,
                                              TargetCostKind CostKind) = 0;
   virtual BranchProbability getPredictableBranchThreshold() = 0;
   virtual bool hasBranchDivergence() = 0;
   virtual bool useGPUDivergenceAnalysis() = 0;
   virtual bool isSourceOfDivergence(const Value *V) = 0;
   virtual bool isAlwaysUniform(const Value *V) = 0;
+  virtual std::pair<const Value *, BallotKind>
+  isBallot(const Instruction *I) = 0;
   virtual unsigned getFlatAddressSpace() = 0;
   virtual bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                           Intrinsic::ID IID) const = 0;
   virtual bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const = 0;
   virtual bool
   canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const = 0;
   virtual unsigned getAssumedAddrSpace(const Value *V) const = 0;
   virtual bool isSingleThreaded() const = 0;
@@ 352 lines not shown @@ public:
   bool isSourceOfDivergence(const Value *V) override {
     return Impl.isSourceOfDivergence(V);
   }

   bool isAlwaysUniform(const Value *V) override {
     return Impl.isAlwaysUniform(V);
   }

+  std::pair<const Value *, TargetTransformInfo::BallotKind>
+  isBallot(const Instruction *I) override {
+    return Impl.isBallot(I);
+  }
+
   unsigned getFlatAddressSpace() override { return Impl.getFlatAddressSpace(); }

   bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                   Intrinsic::ID IID) const override {
     return Impl.collectFlatAddressOperands(OpIndexes, IID);
   }

   bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const override {
@@ 711 lines not shown @@
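The new TTI::isBallot hook returns the asserted value together with a BallotKind. The standalone sketch below mimics that contract on a hypothetical mini-IR (`MiniCall`, with callee names borrowed from the OpenCL builtins mentioned in the doc comment) so the shape of the result is easy to see. The AMDGPU implementation in AMDGPUTargetTransformInfo.cpp is not shown in this excerpt, so the matching rules here are assumptions, not what the patch actually recognizes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Mirrors the enum added to TargetTransformInfo in this patch.
enum class BallotKind { NotBallot, All, Any };

// Hypothetical stand-in for an IR call: callee name plus its single
// predicate operand, identified by name.
struct MiniCall {
  std::string callee;
  std::string operand;
};

// Sketch of the isBallot contract: the asserted value and its kind, or
// {nullptr, NotBallot} when the call is not a recognized vote.
std::pair<const std::string *, BallotKind> isBallot(const MiniCall &c) {
  if (c.callee == "sub_group_all")
    return {&c.operand, BallotKind::All};
  if (c.callee == "sub_group_any")
    return {&c.operand, BallotKind::Any};
  return {nullptr, BallotKind::NotBallot};
}
```

In this diff only BallotKind::All is acted on: a predicate known to be true in every active lane is uniform, whereas Any only says that at least one lane is true, so lanes can still disagree.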

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ 88 lines not shown @@ public:
   bool hasBranchDivergence() const { return false; }

   bool useGPUDivergenceAnalysis() const { return false; }

   bool isSourceOfDivergence(const Value *V) const { return false; }

   bool isAlwaysUniform(const Value *V) const { return false; }

+  std::pair<const Value *, TTI::BallotKind>
+  isBallot(const Instruction *) const {
+    return {nullptr, TTI::BallotKind::NotBallot};
+  }
+
   unsigned getFlatAddressSpace() const { return -1; }

   bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                   Intrinsic::ID IID) const {
     return false;
   }

   bool isNoopAddrSpaceCast(unsigned, unsigned) const { return false; }
@@ 1,188 lines not shown @@

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ 264 lines not shown @@ public:
   bool hasBranchDivergence() { return false; }

   bool useGPUDivergenceAnalysis() { return false; }

   bool isSourceOfDivergence(const Value *V) { return false; }

   bool isAlwaysUniform(const Value *V) { return false; }

+  std::pair<const Value *, TTI::BallotKind> isBallot(const Instruction *I) {
+    return {nullptr, TTI::BallotKind::NotBallot};
+  }
+
   unsigned getFlatAddressSpace() {
     // Return an invalid address space.
     return -1;
   }

   bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                   Intrinsic::ID IID) const {
     return false;
@@ 2,136 lines not shown @@

llvm/lib/Analysis/AssumptionCache.cpp

@@ 124 lines not shown @@ findAffectedValues(CallBase *CI, TargetTransformInfo *TTI,
   }

   if (TTI) {
     const Value *Ptr;
     unsigned AS;
     std::tie(Ptr, AS) = TTI->getPredicatedAddrSpace(Cond);
     if (Ptr)
       AddAffected(const_cast<Value *>(Ptr->stripInBoundsOffsets()));
+
+    if (const Instruction *CondInst = dyn_cast<Instruction>(Cond)) {
+      if (const Value *BallotVal = TTI->isBallot(CondInst).first)
+        AddAffected(const_cast<Value *>(BallotVal));
+    }
   }
 }

 void AssumptionCache::updateAffectedValues(AssumeInst *CI) {
   SmallVector<AssumptionCache::ResultElem, 16> Affected;
   findAffectedValues(CI, TTI, Affected);

   for (auto &AV : Affected) {
@@ 203 lines not shown @@
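This AssumptionCache hunk is what makes the new query reachable: isUseAssumedAllUniform only iterates AC.assumptionsFor(&V), so an assume whose condition is a ballot over V is found for V only if V was recorded as an affected value. The toy index below (plain C++; `MiniAssume` and `AffectedIndex` are hypothetical names, not the real AssumptionCache interface) illustrates that lookup direction.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical assume record: ballotOperand names the value the ballot is
// over, or is empty if the assume's condition is not a ballot.
struct MiniAssume {
  std::string id;
  std::string ballotOperand;
};

// Toy analogue of the affected-values map: value name -> assumes mentioning it.
class AffectedIndex {
public:
  void add(const MiniAssume &a) {
    if (!a.ballotOperand.empty())
      byValue[a.ballotOperand].push_back(a.id);
  }

  std::vector<std::string> assumptionsFor(const std::string &v) const {
    auto it = byValue.find(v);
    return it == byValue.end() ? std::vector<std::string>{} : it->second;
  }

private:
  std::map<std::string, std::vector<std::string>> byValue;
};
```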

llvm/lib/Analysis/DivergenceAnalysis.cpp

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines
// generic or local address as divergent. This can be improved by leveraging		// generic or local address as divergent. This can be improved by leveraging
// pointer analysis and/or by modelling non-escaping memory objects in SSA		// pointer analysis and/or by modelling non-escaping memory objects in SSA
// as done in RV.		// as done in RV.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/DivergenceAnalysis.h"		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
		#include "llvm/IR/IntrinsicInst.h"
		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "divergence"		#define DEBUG_TYPE "divergence"

DivergenceAnalysisImpl::DivergenceAnalysisImpl(		DivergenceAnalysisImpl::DivergenceAnalysisImpl(
const Function &F, const Loop *RegionLoop, const DominatorTree &DT,		const Function &F, const Loop *RegionLoop, const DominatorTree &DT,
const LoopInfo &LI, SyncDependenceAnalysis &SDA, bool IsLCSSAForm)		const LoopInfo &LI, const TargetTransformInfo &TTI, AssumptionCache &AC,
: F(F), RegionLoop(RegionLoop), DT(DT), LI(LI), SDA(SDA),		SyncDependenceAnalysis &SDA, bool IsLCSSAForm)
		: F(F), RegionLoop(RegionLoop), DT(DT), LI(LI), TTI(TTI), AC(AC), SDA(SDA),
IsLCSSAForm(IsLCSSAForm) {}		IsLCSSAForm(IsLCSSAForm) {}

bool DivergenceAnalysisImpl::markDivergent(const Value &DivVal) {		bool DivergenceAnalysisImpl::markDivergent(const Value &DivVal) {
if (isAlwaysUniform(DivVal))		if (isAlwaysUniform(DivVal))
return false;		return false;
assert(isa<Instruction>(DivVal) \|\| isa<Argument>(DivVal));		assert(isa<Instruction>(DivVal) \|\| isa<Argument>(DivVal));
assert(!isAlwaysUniform(DivVal) && "cannot be a divergent");		assert(!isAlwaysUniform(DivVal) && "cannot be a divergent");
return DivergentValues.insert(&DivVal).second;		return DivergentValues.insert(&DivVal).second;
Show All 23 Lines
bool DivergenceAnalysisImpl::inRegion(const Instruction &I) const {		bool DivergenceAnalysisImpl::inRegion(const Instruction &I) const {
return I.getParent() && inRegion(*I.getParent());		return I.getParent() && inRegion(*I.getParent());
}		}

bool DivergenceAnalysisImpl::inRegion(const BasicBlock &BB) const {		bool DivergenceAnalysisImpl::inRegion(const BasicBlock &BB) const {
return RegionLoop ? RegionLoop->contains(&BB) : (BB.getParent() == &F);		return RegionLoop ? RegionLoop->contains(&BB) : (BB.getParent() == &F);
}		}

		bool DivergenceAnalysisImpl::isUseAssumedAllUniform(const Instruction &UserInst,
		const Value &V) const {
		for (auto &AssumeVH : AC.assumptionsFor(&V)) {
		assert(AssumeVH && "IR changed during analysis?");

		CallInst *CI = cast<CallInst>(AssumeVH);
		if (!isValidAssumeForContext(CI, &UserInst, &DT))
		continue;

		const Value *BallotVal;
		TargetTransformInfo::BallotKind BK;
		std::tie(BallotVal, BK) =
		TTI.isBallot(cast<Instruction>(CI->getArgOperand(0)));
		if (BK == TargetTransformInfo::BallotKind::All) {
		assert(BallotVal == &V);
		jdoerfertUnsubmitted Not Done Reply Inline Actions I'm not super sure this must hold. I'd just check it. jdoerfert: I'm not super sure this must hold. I'd just check it.
		return true;
		}
		}

		return false;
		}

void DivergenceAnalysisImpl::pushUsers(const Value &V) {		void DivergenceAnalysisImpl::pushUsers(const Value &V) {
const auto *I = dyn_cast<const Instruction>(&V);		const auto *I = dyn_cast<const Instruction>(&V);

if (I && I->isTerminator()) {		if (I && I->isTerminator()) {
analyzeControlDivergence(*I);		analyzeControlDivergence(*I);
return;		return;
}		}

for (const auto *User : V.users()) {		for (const auto *User : V.users()) {
const auto *UserInst = dyn_cast<const Instruction>(User);		const auto *UserInst = dyn_cast<const Instruction>(User);
if (!UserInst)		if (!UserInst)
continue;		continue;

// only compute divergent inside loop		// only compute divergent inside loop
if (!inRegion(*UserInst))		if (!inRegion(*UserInst))
continue;		continue;

		// Ignore any uses that are assumed uniform at the use point. Currently this
		// is assumed to only apply for boolean ballots. If there are any other
		// divergent operands, the instruction should be found through its other
		// divergent operands.
		if (V.getType()->isIntegerTy(1) && isUseAssumedAllUniform(*UserInst, V))
		continue;

// All users of divergent values are immediate divergent		// All users of divergent values are immediate divergent
if (markDivergent(*UserInst))		if (markDivergent(*UserInst))
Worklist.push_back(UserInst);		Worklist.push_back(UserInst);
}		}
}		}

static const Instruction *getIfCarriedInstruction(const Use &U,		static const Instruction *getIfCarriedInstruction(const Use &U,
const Loop &DivLoop) {		const Loop &DivLoop) {
▲ Show 20 Lines • Show All 181 Lines • ▼ Show 20 Lines	bool DivergenceAnalysisImpl::isDivergentUse(const Use &U) const {
Value &V = *U.get();		Value &V = *U.get();
Instruction &I = *cast<Instruction>(U.getUser());		Instruction &I = *cast<Instruction>(U.getUser());
return isDivergent(V) \|\| isTemporalDivergent(*I.getParent(), V);		return isDivergent(V) \|\| isTemporalDivergent(*I.getParent(), V);
}		}

DivergenceInfo::DivergenceInfo(Function &F, const DominatorTree &DT,		DivergenceInfo::DivergenceInfo(Function &F, const DominatorTree &DT,
const PostDominatorTree &PDT, const LoopInfo &LI,		const PostDominatorTree &PDT, const LoopInfo &LI,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
bool KnownReducible)		AssumptionCache &AC, bool KnownReducible)
: F(F) {		: F(F) {
if (!KnownReducible) {		if (!KnownReducible) {
using RPOTraversal = ReversePostOrderTraversal<const Function *>;		using RPOTraversal = ReversePostOrderTraversal<const Function *>;
RPOTraversal FuncRPOT(&F);		RPOTraversal FuncRPOT(&F);
if (containsIrreducibleCFG<const BasicBlock *, const RPOTraversal,		if (containsIrreducibleCFG<const BasicBlock *, const RPOTraversal,
const LoopInfo>(FuncRPOT, LI)) {		const LoopInfo>(FuncRPOT, LI)) {
ContainsIrreducible = true;		ContainsIrreducible = true;
return;		return;
}		}
}		}
SDA = std::make_unique<SyncDependenceAnalysis>(DT, PDT, LI);		SDA = std::make_unique<SyncDependenceAnalysis>(DT, PDT, LI);
DA = std::make_unique<DivergenceAnalysisImpl>(F, nullptr, DT, LI, *SDA,		DA = std::make_unique<DivergenceAnalysisImpl>(F, nullptr, DT, LI, TTI, AC,
		*SDA,
/* LCSSA */ false);		/* LCSSA */ false);
for (auto &I : instructions(F)) {		for (auto &I : instructions(F)) {
if (TTI.isSourceOfDivergence(&I)) {		if (TTI.isSourceOfDivergence(&I)) {
DA->markDivergent(I);		DA->markDivergent(I);
} else if (TTI.isAlwaysUniform(&I)) {		} else if (TTI.isAlwaysUniform(&I)) {
DA->addUniformOverride(I);		DA->addUniformOverride(I);
}		}
}		}
AnalysisKey DivergenceAnalysis::Key;		AnalysisKey DivergenceAnalysis::Key;

DivergenceAnalysis::Result		DivergenceAnalysis::Result
DivergenceAnalysis::run(Function &F, FunctionAnalysisManager &AM) {		DivergenceAnalysis::run(Function &F, FunctionAnalysisManager &AM) {
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);		auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
auto &LI = AM.getResult<LoopAnalysis>(F);		auto &LI = AM.getResult<LoopAnalysis>(F);
auto &TTI = AM.getResult<TargetIRAnalysis>(F);		auto &TTI = AM.getResult<TargetIRAnalysis>(F);
		auto &AC = AM.getResult<AssumptionAnalysis>(F);

return DivergenceInfo(F, DT, PDT, LI, TTI, /* KnownReducible = */ false);		return DivergenceInfo(F, DT, PDT, LI, TTI, AC,
		/* KnownReducible = */ false);
}		}

PreservedAnalyses		PreservedAnalyses
DivergenceAnalysisPrinterPass::run(Function &F, FunctionAnalysisManager &FAM) {		DivergenceAnalysisPrinterPass::run(Function &F, FunctionAnalysisManager &FAM) {
auto &DI = FAM.getResult<DivergenceAnalysis>(F);		auto &DI = FAM.getResult<DivergenceAnalysis>(F);
OS << "'Divergence Analysis' for function '" << F.getName() << "':\n";		OS << "'Divergence Analysis' for function '" << F.getName() << "':\n";
if (DI.hasDivergence()) {		if (DI.hasDivergence()) {
for (auto &Arg : F.args()) {		for (auto &Arg : F.args()) {

llvm/lib/Analysis/LegacyDivergenceAnalysis.cpp

// 2. memory as black box. It conservatively considers values loaded from		// 2. memory as black box. It conservatively considers values loaded from
// generic or local address as divergent. This can be improved by leveraging		// generic or local address as divergent. This can be improved by leveraging
// pointer analysis.		// pointer analysis.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/DivergenceAnalysis.h"		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/Passes.h"		#include "llvm/Analysis/Passes.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
LegacyDivergenceAnalysis::LegacyDivergenceAnalysis() : FunctionPass(ID) {		LegacyDivergenceAnalysis::LegacyDivergenceAnalysis() : FunctionPass(ID) {
initializeLegacyDivergenceAnalysisPass(*PassRegistry::getPassRegistry());		initializeLegacyDivergenceAnalysisPass(*PassRegistry::getPassRegistry());
}		}
INITIALIZE_PASS_BEGIN(LegacyDivergenceAnalysis, "divergence",		INITIALIZE_PASS_BEGIN(LegacyDivergenceAnalysis, "divergence",
"Legacy Divergence Analysis", false, true)		"Legacy Divergence Analysis", false, true)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_END(LegacyDivergenceAnalysis, "divergence",		INITIALIZE_PASS_END(LegacyDivergenceAnalysis, "divergence",
"Legacy Divergence Analysis", false, true)		"Legacy Divergence Analysis", false, true)

FunctionPass *llvm::createLegacyDivergenceAnalysisPass() {		FunctionPass *llvm::createLegacyDivergenceAnalysisPass() {
return new LegacyDivergenceAnalysis();		return new LegacyDivergenceAnalysis();
}		}

void LegacyDivergenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {		void LegacyDivergenceAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequiredTransitive<DominatorTreeWrapperPass>();		AU.addRequiredTransitive<DominatorTreeWrapperPass>();
AU.addRequiredTransitive<PostDominatorTreeWrapperPass>();		AU.addRequiredTransitive<PostDominatorTreeWrapperPass>();
AU.addRequiredTransitive<LoopInfoWrapperPass>();		AU.addRequiredTransitive<LoopInfoWrapperPass>();
		AU.addRequiredTransitive<AssumptionCacheTracker>();
AU.setPreservesAll();		AU.setPreservesAll();
}		}

bool LegacyDivergenceAnalysis::shouldUseGPUDivergenceAnalysis(		bool LegacyDivergenceAnalysis::shouldUseGPUDivergenceAnalysis(
const Function &F, const TargetTransformInfo &TTI) const {		const Function &F, const TargetTransformInfo &TTI) const {
if (!(UseGPUDA || TTI.useGPUDivergenceAnalysis()))		if (!(UseGPUDA || TTI.useGPUDivergenceAnalysis()))
return false;		return false;

if (!TTI.hasBranchDivergence())
return false;		return false;

DivergentValues.clear();		DivergentValues.clear();
DivergentUses.clear();		DivergentUses.clear();
gpuDA = nullptr;		gpuDA = nullptr;

auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();		auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);

if (shouldUseGPUDivergenceAnalysis(F, TTI)) {		if (shouldUseGPUDivergenceAnalysis(F, TTI)) {
// run the new GPU divergence analysis		// run the new GPU divergence analysis
auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
gpuDA = std::make_unique<DivergenceInfo>(F, DT, PDT, LI, TTI,		gpuDA = std::make_unique<DivergenceInfo>(F, DT, PDT, LI, TTI, AC,
/* KnownReducible = */ true);		/* KnownReducible = */ true);

} else {		} else {
// run LLVM's existing DivergenceAnalysis		// run LLVM's existing DivergenceAnalysis
DivergencePropagator DP(F, TTI, DT, PDT, DivergentValues, DivergentUses);		DivergencePropagator DP(F, TTI, DT, PDT, DivergentValues, DivergentUses);
DP.populateWithSourcesOfDivergence();		DP.populateWithSourcesOfDivergence();
DP.propagate();		DP.propagate();
}		}

llvm/lib/Analysis/TargetTransformInfo.cpp

	bool TargetTransformInfo::isSourceOfDivergence(const Value *V) const {			bool TargetTransformInfo::isSourceOfDivergence(const Value *V) const {
	return TTIImpl->isSourceOfDivergence(V);			return TTIImpl->isSourceOfDivergence(V);
	}			}

	bool llvm::TargetTransformInfo::isAlwaysUniform(const Value *V) const {			bool llvm::TargetTransformInfo::isAlwaysUniform(const Value *V) const {
	return TTIImpl->isAlwaysUniform(V);			return TTIImpl->isAlwaysUniform(V);
	}			}

				std::pair<const Value *, TTI::BallotKind>
				llvm::TargetTransformInfo::isBallot(const Instruction *I) const {
				return TTIImpl->isBallot(I);
				}

	unsigned TargetTransformInfo::getFlatAddressSpace() const {			unsigned TargetTransformInfo::getFlatAddressSpace() const {
	return TTIImpl->getFlatAddressSpace();			return TTIImpl->getFlatAddressSpace();
	}			}

	bool TargetTransformInfo::collectFlatAddressOperands(			bool TargetTransformInfo::collectFlatAddressOperands(
	SmallVectorImpl<int> &OpIndexes, Intrinsic::ID IID) const {			SmallVectorImpl<int> &OpIndexes, Intrinsic::ID IID) const {
	return TTIImpl->collectFlatAddressOperands(OpIndexes, IID);			return TTIImpl->collectFlatAddressOperands(OpIndexes, IID);
	}			}

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

public:

using BaseT::getVectorInstrCost;		using BaseT::getVectorInstrCost;
InstructionCost getVectorInstrCost(unsigned Opcode, Type *ValTy,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *ValTy,
unsigned Index);		unsigned Index);

bool isReadRegisterSourceOfDivergence(const IntrinsicInst *ReadReg) const;		bool isReadRegisterSourceOfDivergence(const IntrinsicInst *ReadReg) const;
bool isSourceOfDivergence(const Value *V) const;		bool isSourceOfDivergence(const Value *V) const;
bool isAlwaysUniform(const Value *V) const;		bool isAlwaysUniform(const Value *V) const;
		std::pair<const Value *, TTI::BallotKind>
		isBallot(const Instruction *I) const;

unsigned getFlatAddressSpace() const {		unsigned getFlatAddressSpace() const {
// Don't bother running InferAddressSpaces pass on graphics shaders which		// Don't bother running InferAddressSpaces pass on graphics shaders which
// don't use flat addressing.		// don't use flat addressing.
if (IsGraphics)		if (IsGraphics)
return -1;		return -1;
return AMDGPUAS::FLAT_ADDRESS;		return AMDGPUAS::FLAT_ADDRESS;
}		}

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

bool GCNTTIImpl::isAlwaysUniform(const Value *V) const {
// divergent for the overall struct return. We need to override it in the		// divergent for the overall struct return. We need to override it in the
// case we're extracting an SGPR component here.		// case we're extracting an SGPR component here.
if (CI->isInlineAsm())		if (CI->isInlineAsm())
return !isInlineAsmSourceOfDivergence(CI, ExtValue->getIndices());		return !isInlineAsmSourceOfDivergence(CI, ExtValue->getIndices());

return false;		return false;
}		}

		std::pair<const Value *, TTI::BallotKind>
		GCNTTIImpl::isBallot(const Instruction *I) const {
		using namespace PatternMatch;

		ICmpInst::Predicate Pred;
		Value *BallotVal = nullptr;
		Value *ReadReg = nullptr;
		if (!match(I, m_c_ICmp(
		Pred,
		m_Intrinsic<Intrinsic::amdgcn_ballot>(m_Value(BallotVal)),
		m_Intrinsic<Intrinsic::read_register>(m_Value(ReadReg)))) ||
		!ICmpInst::isEquality(Pred))
		return {nullptr, TTI::BallotKind::NotBallot};

		// Make sure the read exec is done in the same block as the ballot
		//
		// FIXME: Should really check they have the same convergence
		// token. Alternatively, could have a dedicated vote all intrinsic.
		if (cast<Instruction>(I->getOperand(0))->getParent() !=
		cast<Instruction>(I->getOperand(1))->getParent())
		return {nullptr, TTI::BallotKind::NotBallot};

		auto *Node = cast<MDNode>(cast<MetadataAsValue>(ReadReg)->getMetadata());
		auto *ReadRegName = cast<MDString>(Node->getOperand(0));
		const StringRef ExecName = ST->isWave64() ? "exec" : "exec_lo";
		if (ReadRegName->getString() != ExecName)
		return {nullptr, TTI::BallotKind::NotBallot};

		if (Pred == ICmpInst::ICMP_EQ)
		return {BallotVal, TTI::BallotKind::All};

		return {nullptr, TTI::BallotKind::NotBallot};
		}

bool GCNTTIImpl::collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,		bool GCNTTIImpl::collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
Intrinsic::ID IID) const {		Intrinsic::ID IID) const {
switch (IID) {		switch (IID) {
case Intrinsic::amdgcn_atomic_inc:		case Intrinsic::amdgcn_atomic_inc:
case Intrinsic::amdgcn_atomic_dec:		case Intrinsic::amdgcn_atomic_dec:
case Intrinsic::amdgcn_ds_fadd:		case Intrinsic::amdgcn_ds_fadd:
case Intrinsic::amdgcn_ds_fmin:		case Intrinsic::amdgcn_ds_fmin:
case Intrinsic::amdgcn_ds_fmax:		case Intrinsic::amdgcn_ds_fmax:
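The pattern `GCNTTIImpl::isBallot` accepts rests on a lane-mask argument: `llvm.amdgcn.ballot` sets a bit only for lanes that are both active and see a true condition, so `ballot(%cmp) == exec`, with both sides evaluated under the same exec mask, holds exactly when `%cmp` is true on every active lane, i.e. when it is uniform. The standalone C++ sketch below models one wavefront with plain 64-bit integers; `ballot`, `ballotEqExec`, `isUniformOverActiveLanes` and `implicationHoldsForFourLanes` are hypothetical helpers for illustration, not LLVM or AMDGPU APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of one wave64 wavefront, NOT LLVM code:
// lane i is active iff bit i of Exec is set; Cond[i] is the i1 seen by lane i.

// ballot(Cond): one bit per lane that is active AND has Cond true,
// so the result is always a subset of Exec.
uint64_t ballot(const std::vector<bool> &Cond, uint64_t Exec) {
  uint64_t Mask = 0;
  for (unsigned Lane = 0; Lane < Cond.size() && Lane < 64; ++Lane)
    if (((Exec >> Lane) & 1) && Cond[Lane])
      Mask |= uint64_t(1) << Lane;
  return Mask;
}

// The shape isBallot accepts: ballot(Cond) == exec, both evaluated under
// the same exec mask (i.e. in the same block).
bool ballotEqExec(const std::vector<bool> &Cond, uint64_t Exec) {
  return ballot(Cond, Exec) == Exec;
}

// True if every active lane sees the same value of Cond.
bool isUniformOverActiveLanes(const std::vector<bool> &Cond, uint64_t Exec) {
  int Seen = -1; // -1 until the first active lane is inspected
  for (unsigned Lane = 0; Lane < Cond.size() && Lane < 64; ++Lane) {
    if (!((Exec >> Lane) & 1))
      continue;
    int Value = Cond[Lane] ? 1 : 0;
    if (Seen != -1 && Seen != Value)
      return false;
    Seen = Value;
  }
  return true;
}

// Exhaustively checks "ballot(Cond) == exec implies Cond is uniform"
// for every 4-lane condition and every 4-lane exec mask.
bool implicationHoldsForFourLanes() {
  for (unsigned CondBits = 0; CondBits < 16; ++CondBits) {
    std::vector<bool> Cond;
    for (unsigned Lane = 0; Lane < 4; ++Lane)
      Cond.push_back(((CondBits >> Lane) & 1) != 0);
    for (uint64_t Exec = 0; Exec < 16; ++Exec)
      if (ballotEqExec(Cond, Exec) && !isUniformOverActiveLanes(Cond, Exec))
        return false;
  }
  return true;
}
```

With `Cond = {true, true, false, false}`, the ballot under a full four-lane mask `0xF` is `0x3`. That value equals the exec mask of a block entered only by lanes 0 and 1, even though `Cond` is divergent where the ballot was taken; this is why the patch requires the `read_register` of exec to sit in the same block as the ballot (compare `assume_ballot_read_register_wrong_block`). In wave32 the same reasoning applies to the 32-bit `exec_lo`, which is why the register name is picked from `ST->isWave64()`.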

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/assume.ll

This file was added.

				; RUN: opt -mtriple amdgcn-unknown-amdhsa -passes='print<divergence>' -disable-output %s 2>&1 | FileCheck -strict-whitespace %s

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_neg1'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_eq_neg1(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%all = icmp eq i64 %ballot, -1
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_0'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_eq_0(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%all = icmp eq i64 %ballot, 0
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_popcnt64'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_eq_popcnt64(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%ctpop = call i64 @llvm.ctpop.i64(i64 %ballot)
				%all = icmp eq i64 %ctpop, 64
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_ne_popcnt64'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_ne_popcnt64(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%ctpop = call i64 @llvm.ctpop.i64(i64 %ballot)
				%all = icmp ne i64 %ctpop, 64
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_read_exec'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}} br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_eq_read_exec(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_read_wrong_reg'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_eq_read_wrong_reg(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !1)
				%all = icmp eq i64 %ballot, %exec
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_ne_read_exec'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_ne_read_exec(i32 %x) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp ne i64 %ballot, %exec
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_select_user'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}} %select = select i1 %cmp, i32 123, i32 456
				define void @assume_ballot_select_user(i32 %x, ptr addrspace(1) %ptr) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				call void @llvm.assume(i1 %all)
				%select = select i1 %cmp, i32 123, i32 456
				store i32 %select, ptr addrspace(1) %ptr
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_argument_select_user'
				; CHECK: {{^}} %select = select i1 %arg.bool, i32 123, i32 456
				define void @assume_ballot_argument_select_user(i1 %arg.bool, i32 %x, ptr addrspace(1) %ptr) {
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %arg.bool)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				call void @llvm.assume(i1 %all)
				%select = select i1 %arg.bool, i32 123, i32 456
				store i32 %select, ptr addrspace(1) %ptr
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_wrong_bool_uniform'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: %select = select i1 %cmp, i32 123, i32 456
				define void @assume_wrong_bool_uniform(i1 %arg.bool, i32 %x, ptr addrspace(1) %ptr) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				call void @llvm.assume(i1 %arg.bool)
				%load = load i32, ptr addrspace(1) %ptr
				%select = select i1 %cmp, i32 123, i32 456
				store i32 %select, ptr addrspace(1) %ptr
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_select_user_other_divergent_input'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: %select = select i1 %cmp, i32 123, i32 %load
				define void @assume_ballot_select_user_other_divergent_input(i32 %x, ptr addrspace(1) %ptr) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				call void @llvm.assume(i1 %all)
				%select = select i1 %cmp, i32 123, i32 %load
				store i32 %select, ptr addrspace(1) %ptr
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_eq_read_exec_out_of_block_user'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}} %select = select i1 %cmp, i32 123, i32 456
				define void @assume_ballot_eq_read_exec_out_of_block_user(i32 %x, ptr addrspace(1) %ptr) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				call void @llvm.assume(i1 %all)
				br i1 %cmp, label %foo, label %bar

				foo:
				%select = select i1 %cmp, i32 123, i32 456
				%add = add i32 %load, %load
				store i32 %add, ptr addrspace(1) %ptr
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_assume_wrong_place'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %cmp, label %foo, label %bar
				define void @assume_ballot_assume_wrong_place(i32 %x, ptr addrspace(1) %ptr) {
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				br i1 %cmp, label %foo, label %bar

				foo:
				call void @llvm.assume(i1 %all)
				%select = select i1 %cmp, i32 123, i32 456
				%add = add i32 %load, %load
				store i32 %add, ptr addrspace(1) %ptr
				ret void

				bar:
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_select_i1_i1_i1'
				; CHECK: {{^}}DIVERGENT: %cmp0 = icmp eq i32 %x, 0
				; CHECK: {{^}} %all0 = icmp eq i64 %ballot0, %exec0
				; CHECK: {{^}}DIVERGENT: %cmp1 = icmp eq i32 %y, 0
				; CHECK: {{^}} %all1 = icmp eq i64 %ballot1, %exec1
				; CHECK: {{^}}DIVERGENT: %cmp2 = icmp eq i32 %z, 0
				; CHECK: {{^}} %all2 = icmp eq i64 %ballot2, %exec2
				; CHECK: {{^}} %select = select i1 %cmp0, i1 %cmp1, i1 %cmp2
				define void @assume_ballot_select_i1_i1_i1(i32 %x, i32 %y, i32 %z, ptr addrspace(1) %ptr) {
				%cmp0 = icmp eq i32 %x, 0
				%ballot0 = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp0)
				%exec0 = call i64 @llvm.read_register(metadata !0)
				%all0 = icmp eq i64 %ballot0, %exec0
				call void @llvm.assume(i1 %all0)

				%cmp1 = icmp eq i32 %y, 0
				%ballot1 = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp1)
				%exec1 = call i64 @llvm.read_register(metadata !0)
				%all1 = icmp eq i64 %ballot1, %exec1
				call void @llvm.assume(i1 %all1)

				%cmp2 = icmp eq i32 %z, 0
				%ballot2 = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp2)
				%exec2 = call i64 @llvm.read_register(metadata !0)
				%all2 = icmp eq i64 %ballot2, %exec2
				call void @llvm.assume(i1 %all2)

				%select = select i1 %cmp0, i1 %cmp1, i1 %cmp2
				store i1 %select, ptr addrspace(1) %ptr
				ret void
				}

				; CHECK-LABEL: Divergence Analysis' for function 'assume_ballot_read_register_wrong_block'
				; CHECK: {{^}}DIVERGENT: %cmp = icmp eq i32 %x, 0
				; CHECK: {{^}}DIVERGENT: br i1 %br.cond, label %bb1, label %bb2
				; CHECK: {{^}}DIVERGENT: %select = select i1 %cmp, i32 123, i32 456
				define void @assume_ballot_read_register_wrong_block(i1 %br.cond, i32 %x, ptr addrspace(1) %ptr) {
				bb0:
				%cmp = icmp eq i32 %x, 0
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				br i1 %br.cond, label %bb1, label %bb2

				bb1:
				%exec = call i64 @llvm.read_register(metadata !0)
				%all = icmp eq i64 %ballot, %exec
				%load = load i32, ptr addrspace(1) %ptr
				call void @llvm.assume(i1 %all)
				%select = select i1 %cmp, i32 123, i32 456
				store i32 %select, ptr addrspace(1) %ptr
				br label %bb2

				bb2:
				ret void
				}

				declare i64 @llvm.amdgcn.ballot.i64(i1)
				declare i64 @llvm.ctpop.i64(i64)
				declare void @llvm.assume(i1)
				declare i64 @llvm.read_register(metadata)
				!0 = !{!"exec"}
				!1 = !{!"s[0:3]"}
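The `DivergenceAnalysisImpl` changes that consume the `AssumptionCache` are collapsed out of this view, but the CHECK lines above suggest the intended semantics: `%cmp` itself keeps its DIVERGENT verdict, while users of `%cmp` that the `llvm.assume` dominates (the `br` in `assume_ballot_eq_read_exec`, the `select` in `assume_ballot_eq_read_exec_out_of_block_user`) are reported uniform, unless another operand is divergent (`assume_ballot_select_user_other_divergent_input`) or the assume comes after the user (`assume_ballot_assume_wrong_place`). The plain C++ sketch below models that reading. It is an assumption drawn from the tests, not the patch's implementation, and it approximates dominance by program order within one straight-line region; `AssumeAll`, `Operand`, `isUniformUse` and `isUniformUser` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, NOT the DivergenceAnalysisImpl code from this patch.
// Program points are indices into one straight-line region, so
// "the assume dominates the use" is approximated by "comes earlier".

struct AssumeAll {
  std::string Value; // V in llvm.assume(ballot(V) == exec)
  unsigned Point;    // position of the llvm.assume call
};

struct Operand {
  std::string Value;
  bool Divergent; // verdict of ordinary divergence propagation
};

// A use is uniform if the value already is, or if an accepted
// assume on that value comes before the use.
bool isUniformUse(const Operand &Op, unsigned UsePoint,
                  const std::vector<AssumeAll> &Assumes) {
  if (!Op.Divergent)
    return true;
  for (const AssumeAll &A : Assumes)
    if (A.Value == Op.Value && A.Point < UsePoint)
      return true;
  return false;
}

// A user such as select or br is uniform only if all its operand uses are.
bool isUniformUser(const std::vector<Operand> &Ops, unsigned UsePoint,
                   const std::vector<AssumeAll> &Assumes) {
  for (const Operand &Op : Ops)
    if (!isUniformUse(Op, UsePoint, Assumes))
      return false;
  return true;
}
```

Under this model the assumption refines individual uses rather than the value, which matches `%cmp` staying DIVERGENT in every test while some of its users do not.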

llvm/unittests/Analysis/DivergenceAnalysisTest.cpp

//===- DivergenceAnalysisTest.cpp - DivergenceAnalysis unit tests ---------===//		//===- DivergenceAnalysisTest.cpp - DivergenceAnalysis unit tests ---------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/SyncDependenceAnalysis.h"		#include "llvm/Analysis/SyncDependenceAnalysis.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/AsmParser/Parser.h"		#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
protected:
Module M;		Module M;
TargetLibraryInfoImpl TLII;		TargetLibraryInfoImpl TLII;
TargetLibraryInfo TLI;		TargetLibraryInfo TLI;

std::unique_ptr<DominatorTree> DT;		std::unique_ptr<DominatorTree> DT;
std::unique_ptr<PostDominatorTree> PDT;		std::unique_ptr<PostDominatorTree> PDT;
std::unique_ptr<LoopInfo> LI;		std::unique_ptr<LoopInfo> LI;
std::unique_ptr<SyncDependenceAnalysis> SDA;		std::unique_ptr<SyncDependenceAnalysis> SDA;
		std::unique_ptr<AssumptionCache> AC;
		std::unique_ptr<TargetTransformInfo> TTI;

DivergenceAnalysisTest() : M("", Context), TLII(), TLI(TLII) {}		DivergenceAnalysisTest() : M("", Context), TLII(), TLI(TLII) {}

DivergenceAnalysisImpl buildDA(Function &F, bool IsLCSSA) {		DivergenceAnalysisImpl buildDA(Function &F, bool IsLCSSA) {
DT.reset(new DominatorTree(F));		DT.reset(new DominatorTree(F));
PDT.reset(new PostDominatorTree(F));		PDT.reset(new PostDominatorTree(F));
LI.reset(new LoopInfo(*DT));		LI.reset(new LoopInfo(*DT));
SDA.reset(new SyncDependenceAnalysis(*DT, *PDT, *LI));		SDA.reset(new SyncDependenceAnalysis(*DT, *PDT, *LI));
return DivergenceAnalysisImpl(F, nullptr, *DT, *LI, *SDA, IsLCSSA);		AC.reset(new AssumptionCache(F, &*TTI));
		return DivergenceAnalysisImpl(F, nullptr, *DT, *LI, *TTI, *AC, *SDA,
		IsLCSSA);
}		}

void runWithDA(		void runWithDA(
Module &M, StringRef FuncName, bool IsLCSSA,		Module &M, StringRef FuncName, bool IsLCSSA,
function_ref<void(Function &F, LoopInfo &LI, DivergenceAnalysisImpl &DA)>		function_ref<void(Function &F, LoopInfo &LI, DivergenceAnalysisImpl &DA)>
Test) {		Test) {
auto *F = M.getFunction(FuncName);		auto *F = M.getFunction(FuncName);
ASSERT_NE(F, nullptr) << "Could not find " << FuncName;		ASSERT_NE(F, nullptr) << "Could not find " << FuncName;

Revision Contents

Diff 472197

llvm/include/llvm/Analysis/DivergenceAnalysis.h

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/lib/Analysis/AssumptionCache.cpp

llvm/lib/Analysis/DivergenceAnalysis.cpp

llvm/lib/Analysis/LegacyDivergenceAnalysis.cpp

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/assume.ll

llvm/unittests/Analysis/DivergenceAnalysisTest.cpp
