This is an archive of the discontinued LLVM Phabricator instance.

Differential D151341

AMDGPU: Special case uniformity info for single lane workgroups
Closed · Public

Authored by arsenm on May 24 2023, 8:51 AM.


Details

Reviewers

sameerds
jhuber6
JonChesterfield
jdoerfert
yassingh

Group Reviewers

Restricted Project

Summary

Constructors/destructors and OpenMP make use of single lane groups
in some cases.

Diff Detail

Event Timeline

arsenm created this revision. May 24 2023, 8:51 AM

Herald added a project: Restricted Project. May 24 2023, 8:51 AM

Herald added subscribers: foad, kerbowa, hiraditya and 5 others.

arsenm requested review of this revision. May 24 2023, 8:51 AM

Herald added a project: Restricted Project. May 24 2023, 8:51 AM

Herald added subscribers: jplehr, sstefan1, wdng.

Reorder

Shouldn't this similarly apply to wave syncs and ballots?

In D151341#4368700, @jhuber6 wrote:

Shouldn't this similarly apply to wave syncs and ballots?

Ballots are forcibly uniform all the time. I guess this works for every operation and we don't actually need to look for specific intrinsics.

In D151341#4368704, @arsenm wrote:

Ballots are forcibly uniform all the time. I guess this works for every operation and we don't actually need to look for specific intrinsics.

We should be able to remove every convergent attribute under this special case, right?

Harbormaster completed remote builds in B234220: Diff 525202. May 24 2023, 9:56 AM

yassingh added inline comments. May 24 2023, 11:33 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
Lines 910–914: Will this also work for GMIR? E.g. "%4:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x)"
llvm/test/Analysis/UniformityAnalysis/AMDGPU/workitem-intrinsics.ll
Lines 44–50: Same as above, equivalent GMIR test?

In D151341#4368708, @jhuber6 wrote:

In D151341#4368704, @arsenm wrote:

Ballots are forcibly uniform all the time. I guess this works for every operation and we don't actually need to look for specific intrinsics.

We should be able to remove every convergent attribute under this special case, right?

Maybe, in the future with convergence tokens we should have a convergence reduction optimization.

I'm thinking a cleaner way to handle this would be to just have the divergence analysis early exit for single-lane functions. Divergence cannot exist in the first place.

This could be a top-level check in isSourceOfDivergence: if a source of divergence executes with a single lane, it de facto stops being a source of divergence.

But even beyond that: it is reasonably common to have compute shaders that do something like this:

if (threadid == 0) {
  ...
}

So it would be interesting to be able to handle this more generally in UniformityAnalysis, perhaps by having a callback that analyzes conditional branches and returns whether one side of the branch is entered by at most one lane.

Once that's the long-term goal, it seems plausible to me to make isSingleLaneExecution(Function &) a callback that is accessible to UniformityAnalysis, as opposed to querying it from isSourceOfDivergence.
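A minimal sketch of the "at most one lane enters the branch" idea, written as a toy model rather than LLVM code. The names lanesEntering and enteredByAtMostOneLane are hypothetical, and a real analysis would have to prove this property statically instead of enumerating lanes:

```cpp
#include <cassert>
#include <functional>

// Toy model, not LLVM code: a wave is a set of lanes numbered
// 0..waveSize-1. Returns how many lanes take the branch guarded by `cond`.
unsigned lanesEntering(unsigned waveSize,
                       const std::function<bool(unsigned)> &cond) {
  unsigned count = 0;
  for (unsigned lane = 0; lane < waveSize; ++lane)
    if (cond(lane))
      ++count;
  return count;
}

// Hypothetical query in the spirit of the proposed callback: values computed
// inside the guarded region cannot diverge if at most one lane enters it.
bool enteredByAtMostOneLane(unsigned waveSize,
                            const std::function<bool(unsigned)> &cond) {
  return lanesEntering(waveSize, cond) <= 1;
}
```

With a guard such as threadid == 0 the query holds for any wave size, while a guard such as threadid % 2 == 0 fails it as soon as the wave has more than two lanes.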

In D151341#4380694, @nhaehnle wrote:
But even beyond that: it is reasonably common to have compute shaders that do something like this:
if (threadid == 0) {
  ...
}
So it would be interesting to be able to handle this more generally in UniformityAnalysis, perhaps by having a callback that analyzes conditional branches and returns whether one side of the branch is entered by at most one lane.

Once that's the long-term goal, it seems plausible to me to make isSingleLaneExecution(Function &) a callback that is accessible to UniformityAnalysis, as opposed to querying it from isSourceOfDivergence.

We have a similar check for that explicit case in the OpenMPOpt pass https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/OpenMPOpt.cpp#L2760.

In D151341#4368708, @jhuber6 wrote:

We should be able to remove every convergent attribute under this special case, right?

I'm not convinced that this is true. Consider:

uint64_t helper() {
  return ballot(true);
}

uint64_t tricky() {
  for (int i = 0; i < 64; ++i)
    if (thread == i)
      return helper();
  unreachable();
}

We can miscompile this as follows:

1. Mark helper as being "single lane only"
2. Remove convergent attributes based on that
3. Inline and eliminate the loop, reducing everything down to:

uint64_t tricky() {
  return ballot(true);
}
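To make the difference concrete, here is a toy simulation of the two versions. It is only a sketch under assumptions: the wave size of 64 and the names ballotTrue, trickyOriginal and trickyMiscompiled are illustrative, and ballot(true) is modeled as simply returning the mask of lanes active at the call:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kWaveSize = 64; // assumed wave size for this toy model

// Toy ballot(true): every active lane votes true, so the result is the
// mask of lanes that are active at the call.
uint64_t ballotTrue(uint64_t activeMask) { return activeMask; }

// Result seen by `lane` in the original program: helper() is reached only
// inside `if (thread == i)`, so exactly one lane is active at the ballot.
uint64_t trickyOriginal(unsigned lane) {
  for (unsigned i = 0; i < kWaveSize; ++i)
    if (lane == i)
      return ballotTrue(uint64_t{1} << i);
  return 0; // not reached for lane < kWaveSize
}

// Result after the invalid rewrite: the loop is gone, so all lanes of the
// wave are active at the ballot.
uint64_t trickyMiscompiled(unsigned /*lane*/) {
  return ballotTrue(~uint64_t{0});
}
```

In this model each lane of the original program sees a mask with only its own bit set, whereas after the rewrite every lane sees the full mask.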

In D151341#4380716, @nhaehnle wrote:
I'm not convinced that this is true. Consider:
uint64_t helper() {
  return ballot(true);
}

uint64_t tricky() {
  for (int i = 0; i < 64; ++i)
    if (thread == i)
      return helper();
  unreachable();
}
We can miscompile this as follows:

I think this program is illegal to begin with. The constraint isn't that helper runs with a single lane; it's that it only executes in a workgroup of size 1. If the loop in tricky ever went over 1 iteration, that would imply execution with a larger workgroup.
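The workgroup-size constraint can be sketched as follows. This is a toy stand-in, not the subtarget code: WorkGroupSize is a hypothetical struct holding the maximum size per dimension (each assumed to be at least 1), and the check mirrors the shape of the isSingleLaneExecution helper in this revision:

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a kernel's maximum workgroup size in x, y, z.
// Each dimension is assumed to be at least 1.
struct WorkGroupSize {
  std::array<unsigned, 3> dims;
};

// Largest workitem ID that can occur in the given dimension.
unsigned maxWorkitemID(const WorkGroupSize &wg, unsigned dim) {
  return wg.dims[dim] - 1;
}

// True only if no dimension can have a workitem ID above 0, i.e. the
// workgroup holds exactly one workitem.
bool isSingleLaneExecution(const WorkGroupSize &wg) {
  for (unsigned dim = 0; dim < 3; ++dim)
    if (maxWorkitemID(wg, dim) > 0)
      return false;
  return true;
}
```

A size of {1, 1, 1} passes the check, while {2, 1, 1}, {1, 2, 1} and {1, 1, 2} each fail it, which matches the reqd_work_group_size cases exercised by the tests in this revision.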

In D151341#4369509, @arsenm wrote:

In D151341#4368708, @jhuber6 wrote:

In D151341#4368704, @arsenm wrote:

Ballots are forcibly uniform all the time. I guess this works for every operation and we don't actually need to look for specific intrinsics.

We should be able to remove every convergent attribute under this special case, right?

Maybe, in the future with convergence tokens we should have a convergence reduction optimization.

I'm thinking a cleaner way to handle this would be to just have the divergence analysis early exit for single-lane functions. Divergence cannot exist in the first place.

That sounds right to me. Much simpler and more assertive than checking for just a handful of intrinsics. Eventually, this can be an input to deciding whether to remove the convergent attribute, but that's a different task.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
Lines 910–914: The change is in TTI, which is clearly meant for LLVM IR. But from the rest of the comments, if we were to make this change in the UA instead, then we could make it work for both LLVM IR and MIR.
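The early-exit approach can be illustrated with a small toy model. Everything here is hypothetical (ToyFunction, its fields, and computeDivergentValues are not LLVM APIs); the point is only that the analysis is skipped entirely when the target reports no branch divergence for the function, instead of special-casing individual intrinsics:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function and the facts a target knows about it.
struct ToyFunction {
  std::string name;
  bool singleLane;                            // workgroup of size 1
  std::vector<std::string> divergenceSources; // e.g. workitem ID reads
};

// Per-function query: a single-lane function has no branch divergence.
bool hasBranchDivergence(const ToyFunction &f) { return !f.singleLane; }

// Returns the values reported as divergent. The propagation step of a real
// analysis is reduced here to returning the sources themselves.
std::vector<std::string> computeDivergentValues(const ToyFunction &f) {
  if (!hasBranchDivergence(f))
    return {}; // skip the analysis: everything is uniform
  return f.divergenceSources;
}
```

Because the decision is made once per function, the same gate could in principle serve both the IR and the machine-level analysis, which is the advantage raised in the comment above.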

arsenm added a parent revision: D151987: TTI: Add function to hasBranchDivergence. Jun 2 2023, 4:24 AM

arsenm added a parent revision: D151986: UniformityAnalysis: Skip computation with no branch divergence.

Skip analysis and make everything uniform

Harbormaster completed remote builds in B236150: Diff 527816. Jun 2 2023, 6:13 AM

sameerds accepted this revision. Jun 3 2023, 3:15 AM

This revision is now accepted and ready to land. Jun 3 2023, 3:15 AM

arsenm added inline comments. Jun 16 2023, 3:51 PM

llvm/test/Analysis/UniformityAnalysis/AMDGPU/workitem-intrinsics.ll
Lines 44–50: Can't test MIR as-is; the TTI-based machine uniformity query is broken, as mentioned in the last patch.

53fb907df4723f5267f30fe8da103f91dfb1a175

Revision Contents

Path                                                                   Size
llvm/lib/Analysis/UniformityAnalysis.cpp                               4 lines
llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp                         6 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h                               3 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp                             9 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp                   2 lines
llvm/test/Analysis/UniformityAnalysis/AMDGPU/always_uniform.ll         7 lines
llvm/test/Analysis/UniformityAnalysis/AMDGPU/workitem-intrinsics.ll    78 lines

Diff 527816

llvm/lib/Analysis/UniformityAnalysis.cpp

@@ (114 lines not shown) @@

 llvm::UniformityInfo UniformityInfoAnalysis::run(Function &F,
                                                  FunctionAnalysisManager &FAM) {
   auto &DT = FAM.getResult<DominatorTreeAnalysis>(F);
   auto &TTI = FAM.getResult<TargetIRAnalysis>(F);
   auto &CI = FAM.getResult<CycleAnalysis>(F);
   UniformityInfo UI{F, DT, CI, &TTI};
   // Skip computation if we can assume everything is uniform.
-  if (TTI.hasBranchDivergence())
+  if (TTI.hasBranchDivergence(&F))
     UI.compute();

   return UI;
 }

 AnalysisKey UniformityInfoAnalysis::Key;

 UniformityInfoPrinterPass::UniformityInfoPrinterPass(raw_ostream &OS)
@@ (38 lines not shown) @@ bool UniformityInfoWrapperPass::runOnFunction(Function &F) {
   auto &targetTransformInfo =
       getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

   m_function = &F;
   m_uniformityInfo =
       UniformityInfo{F, domTree, cycleInfo, &targetTransformInfo};

   // Skip computation if we can assume everything is uniform.
-  if (targetTransformInfo.hasBranchDivergence())
+  if (targetTransformInfo.hasBranchDivergence(m_function))
     m_uniformityInfo.compute();

   return false;
 }

 void UniformityInfoWrapperPass::print(raw_ostream &OS, const Module *) const {
   OS << "UniformityInfo for function '" << m_function->getName() << "':\n";
 }

 void UniformityInfoWrapperPass::releaseMemory() {
   m_uniformityInfo = UniformityInfo{};
   m_function = nullptr;
 }

llvm/lib/Target/AMDGPU/AMDGPURegBankSelect.cpp

 //===- AMDGPURegBankSelect.cpp ------------------------------------ C++ --==//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // Use MachineUniformityAnalysis as the primary basis for making SGPR vs. VGPR
 // register bank selection. Use/def analysis as in the default RegBankSelect can
 // be useful in narrower circumstances (e.g. choosing AGPR vs. VGPR for gfx908).
 //
 //===----------------------------------------------------------------------===//

 #include "AMDGPURegBankSelect.h"
 #include "AMDGPU.h"
+#include "GCNSubtarget.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/InitializePasses.h"

 #define DEBUG_TYPE "regbankselect"

 using namespace llvm;

 AMDGPURegBankSelect::AMDGPURegBankSelect(Mode RunningMode)
@@ (29 lines not shown) @@ bool AMDGPURegBankSelect::runOnMachineFunction(MachineFunction &MF) {
   const Function &F = MF.getFunction();
   Mode SaveOptMode = OptMode;
   if (F.hasOptNone())
     OptMode = Mode::Fast;
   init(MF);

   assert(checkFunctionIsLegal(MF));

+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   MachineCycleInfo &CycleInfo =
       getAnalysis<MachineCycleInfoWrapperPass>().getCycleInfo();
   MachineDominatorTree &DomTree = getAnalysis<MachineDominatorTree>();

-  // TODO: Check for single lane execution.
   MachineUniformityInfo Uniformity =
-      computeMachineUniformityInfo(MF, CycleInfo, DomTree.getBase(), true);
+      computeMachineUniformityInfo(MF, CycleInfo, DomTree.getBase(),
+                                   !ST.isSingleLaneExecution(F));
   (void)Uniformity; // TODO: Use this

   assignRegisterBanks(MF);

   OptMode = SaveOptMode;
   return false;
 }

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

@@ (263 lines not shown) @@ public:
   /// \returns Maximum number of waves per execution unit supported by the
   /// subtarget without any kind of limitation.
   unsigned getMaxWavesPerEU() const { return MaxWavesPerEU; }

   /// Return the maximum workitem ID value in the function, for the given (0, 1,
   /// 2) dimension.
   unsigned getMaxWorkitemID(const Function &Kernel, unsigned Dimension) const;

+  /// Return true if only a single workitem can be active in a wave.
+  bool isSingleLaneExecution(const Function &Kernel) const;
+
   /// Creates value range metadata on an workitemid.* intrinsic call or load.
   bool makeLIDRangeMetadata(Instruction *I) const;

   /// \returns Number of bytes of arguments that are passed to a shader or
   /// kernel in addition to the explicit ones declared for the function.
   unsigned getImplicitArgNumBytes(const Function &F) const;
   uint64_t getExplicitKernArgSize(const Function &F, Align &MaxAlign) const;
   unsigned getKernArgSegmentSize(const Function &F, Align &MaxAlign) const;
@@ (11 lines not shown) @@

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ (463 lines not shown) @@
 unsigned AMDGPUSubtarget::getMaxWorkitemID(const Function &Kernel,
                                            unsigned Dimension) const {
   unsigned ReqdSize = getReqdWorkGroupSize(Kernel, Dimension);
   if (ReqdSize != std::numeric_limits<unsigned>::max())
     return ReqdSize - 1;
   return getFlatWorkGroupSizes(Kernel).second - 1;
 }

+bool AMDGPUSubtarget::isSingleLaneExecution(const Function &Func) const {
+  for (int I = 0; I < 3; ++I) {
+    if (getMaxWorkitemID(Func, I) > 0)
+      return false;
+  }
+
+  return true;
+}
+
 bool AMDGPUSubtarget::makeLIDRangeMetadata(Instruction *I) const {
   Function *Kernel = I->getParent()->getParent();
   unsigned MinSize = 0;
   unsigned MaxSize = getFlatWorkGroupSizes(*Kernel).second;
   bool IdQuery = false;

   // If reqd_work_group_size is present it narrows value down.
   if (auto *CI = dyn_cast<CallInst>(I)) {
@@ (520 lines not shown) @@

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

@@ (292 lines not shown) @@ : BaseT(TM, F.getParent()->getDataLayout()),
       TLI(ST->getTargetLowering()), CommonTTI(TM, F),
       IsGraphics(AMDGPU::isGraphics(F.getCallingConv())) {
   SIModeRegisterDefaults Mode(F);
   HasFP32Denormals = Mode.allFP32Denormals();
   HasFP64FP16Denormals = Mode.allFP64FP16Denormals();
 }

 bool GCNTTIImpl::hasBranchDivergence(const Function *F) const {
-  return true;
+  return !F || !ST->isSingleLaneExecution(*F);
 }

 unsigned GCNTTIImpl::getNumberOfRegisters(unsigned RCID) const {
   // NB: RCID is not an RCID. In fact it is 0 or 1 for scalar or vector
   // registers. See getRegisterClassForType for the implementation.
   // In this case vector registers are not vector in terms of
   // VGPRs, but those which can hold multiple values.

@@ (592 lines not shown) @@ bool GCNTTIImpl::isSourceOfDivergence(const Value *V) const {

   // Atomics are divergent because they are executed sequentially: when an
   // atomic operation refers to the same address in each thread, then each
   // thread after the first sees the value written by the previous thread as
   // original value.
   if (isa<AtomicRMWInst>(V) || isa<AtomicCmpXchgInst>(V))
     return true;

   if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(V)) {
     if (Intrinsic->getIntrinsicID() == Intrinsic::read_register)
       return isReadRegisterSourceOfDivergence(Intrinsic);

     return AMDGPU::isIntrinsicSourceOfDivergence(Intrinsic->getIntrinsicID());
   }

   // Assume all function calls are a source of divergence.
   if (const CallInst *CI = dyn_cast<CallInst>(V)) {
     if (CI->isInlineAsm())
       return isInlineAsmSourceOfDivergence(CI);
     return true;
   }
@@ (360 lines not shown) @@

llvm/test/Analysis/UniformityAnalysis/AMDGPU/always_uniform.ll

@@ (46 lines not shown) @@ define void @asm_mixed_sgpr_vgpr(i32 %divergent) {
   %asm = call { i32, i32 } asm "; def $0, $1, $2","=s,=v,v"(i32 %divergent)
   %sgpr = extractvalue { i32, i32 } %asm, 0
   %vgpr = extractvalue { i32, i32 } %asm, 1
   store i32 %sgpr, ptr addrspace(1) undef
   store i32 %vgpr, ptr addrspace(1) undef
   ret void
 }

+; CHECK-LABEL: for function 'single_lane_func_arguments':
+; CHECK-NOT: DIVERGENT
+define void @single_lane_func_arguments(i32 %i32, i1 %i1) #2 {
+  ret void
+}
+
 declare i32 @llvm.amdgcn.workitem.id.x() #0
 declare i32 @llvm.amdgcn.readfirstlane(i32) #0
 declare i64 @llvm.amdgcn.icmp.i32(i32, i32, i32) #1
 declare i64 @llvm.amdgcn.fcmp.i32(float, float, i32) #1
 declare i64 @llvm.amdgcn.ballot.i32(i1) #1

 attributes #0 = { nounwind readnone }
 attributes #1 = { nounwind readnone convergent }
+attributes #2 = { "amdgpu-flat-work-group-size"="1,1" }

llvm/test/Analysis/UniformityAnalysis/AMDGPU/workitem-intrinsics.ll

@@ (35 lines not shown) @@

 ; CHECK: DIVERGENT: %mbcnt.hi = call i32 @llvm.amdgcn.mbcnt.hi(i32 0, i32 0)
 define amdgpu_kernel void @mbcnt_hi() #1 {
   %mbcnt.hi = call i32 @llvm.amdgcn.mbcnt.hi(i32 0, i32 0)
   store volatile i32 %mbcnt.hi, ptr addrspace(1) undef
   ret void
 }

+; CHECK-LABEL: UniformityInfo for function 'workitem_id_x_singlethreaded':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_x_singlethreaded() #2 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_y_singlethreaded':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_y_singlethreaded() #2 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.y()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_z_singlethreaded':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_z_singlethreaded() #2 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.y()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_x_singlethreaded_md':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_x_singlethreaded_md() !reqd_work_group_size !0 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_y_singlethreaded_md':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_y_singlethreaded_md() !reqd_work_group_size !0 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.y()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_z_singlethreaded_md':
+; CHECK-NOT: DIVERGENT
+define amdgpu_kernel void @workitem_id_z_singlethreaded_md() !reqd_work_group_size !0 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.y()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_x_not_singlethreaded_dimx':
+; CHECK: DIVERGENT: %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+define amdgpu_kernel void @workitem_id_x_not_singlethreaded_dimx() !reqd_work_group_size !1 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_x_not_singlethreaded_dimy':
+; CHECK: DIVERGENT: %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+define amdgpu_kernel void @workitem_id_x_not_singlethreaded_dimy() !reqd_work_group_size !2 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
+; CHECK-LABEL: UniformityInfo for function 'workitem_id_x_not_singlethreaded_dimz':
+; CHECK: DIVERGENT: %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+define amdgpu_kernel void @workitem_id_x_not_singlethreaded_dimz() !reqd_work_group_size !3 {
+  %id.x = call i32 @llvm.amdgcn.workitem.id.x()
+  store volatile i32 %id.x, ptr addrspace(1) undef
+  ret void
+}
+
 attributes #0 = { nounwind readnone }
 attributes #1 = { nounwind }
+attributes #2 = { "amdgpu-flat-work-group-size"="1,1" }
+
+!0 = !{i32 1, i32 1, i32 1}
+!1 = !{i32 2, i32 1, i32 1}
+!2 = !{i32 1, i32 2, i32 1}
+!3 = !{i32 1, i32 1, i32 2}