This is an archive of the discontinued LLVM Phabricator instance.


Differential D65088

[AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic
ClosedPublic

Authored by Flakebi on Jul 22 2019, 6:34 AM.

Details

Reviewers

arsenm
nhaehnle
tpr
dstuttard
foad

Commits

rG5d3a69feca12: [AMDGPU] New llvm.amdgcn.ballot intrinsic

Summary

Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
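The semantics described above can be pictured with a small software model. This is only a sketch: `modelBallot`, the `exec` bitmask parameter and the per-lane predicate callback are hypothetical names for illustration, not LLVM or hardware APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical software model of a single wavefront; not an LLVM API.
// Bit i of `exec` is set when lane i is active.
uint64_t modelBallot(uint64_t exec, unsigned waveSize,
                     const std::function<bool(unsigned)> &pred) {
  assert(waveSize == 32 || waveSize == 64);
  uint64_t result = 0;
  for (unsigned lane = 0; lane < waveSize; ++lane) {
    const bool active = ((exec >> lane) & 1) != 0;
    // Active lanes contribute their predicate; inactive lanes contribute 0.
    if (active && pred(lane))
      result |= uint64_t{1} << lane;
  }
  return result;
}
```

With lanes 0, 1 and 3 active and a predicate that is true on even lanes, only bit 0 is set in the result: lane 2 satisfies the predicate but is inactive.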

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.
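As a rough illustration of why the atomic optimizer wants a ballot, the sketch below models the two quantities its comments in the diff mention: the number of active lanes, and the number of active lanes below the current one (which the pass obtains with mbcnt). The function names are hypothetical and `ballot(true)` is modeled as the exec mask itself.

```cpp
#include <cassert>
#include <cstdint>

// Number of set bits; a portable stand-in for a hardware popcount.
unsigned popCount(uint64_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

// ballot(true) is modeled as the exec mask, so the number of active
// lanes is simply its population count.
unsigned activeLaneCount(uint64_t exec) { return popCount(exec); }

// Number of active lanes with a lower index than `lane`.
unsigned activeLanesBelow(uint64_t exec, unsigned lane) {
  assert(lane < 64);
  const uint64_t lowerMask = (uint64_t{1} << lane) - 1;
  return popCount(exec & lowerMask);
}
```

For an exec mask with lanes 0, 2, 3 and 5 active, `activeLaneCount` gives 4 and `activeLanesBelow(exec, 3)` gives 2.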

I'm not going to commit this as-is because tests are failing due to
poor code generation, e.g. test2 in ballot.ll generates:

v_cmp_eq_u32_e32 vcc, v0, v1
v_cndmask_b32_e64 v0, 0, 1, vcc
v_cmp_ne_u32_e64 s[4:5], 0, v0
v_mov_b32_e32 v0, s4
v_mov_b32_e32 v1, s5

instead of:

v_cmp_eq_u32_e32 s[4:5], v0, v1
v_mov_b32_e32 v0, s4
v_mov_b32_e32 v1, s5

I'd appreciate feedback on (a) the idea, (b) the implementation and
(c) how best to improve the code generation.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Jul 22 2019, 6:34 AM

Herald added a project: Restricted Project. Jul 22 2019, 6:34 AM

Herald added subscribers: jfb, hiraditya, t-tye and 4 others.

Harbormaster completed remote builds in B35457: Diff 211078. Jul 22 2019, 6:38 AM

arsenm added inline comments. Jul 22 2019, 6:44 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4235–4236	Should also do this in instcombine, as icmp/fcmp already do
4250	This is already the right type, so this should be unnecessary
llvm/test/CodeGen/AMDGPU/ballot.ll
4 ↗	(On Diff #211078)	This needs the .i64 for the mangling, I'm surprised this works
49 ↗	(On Diff #211078)	Can you add some different icmp and fcmp sources? Also need a separate wave32 test

Address review comments.

Harbormaster completed remote builds in B35462: Diff 211100. Jul 22 2019, 7:45 AM

foad marked 2 inline comments as done. Jul 22 2019, 7:46 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4235–4236	I've had a go, but is it really necessary to implement the same optimization cases in two places (especially the one that just replaces a ballot intrinsic with a read_register intrinsic), and how do I test them independently?
4250	Does anything guarantee that it's the right type? Is it OK for the compiler to crash if someone writes a .ll file that uses it with the wrong type?

Thanks for doing this. For the codegen quality question, I wonder if something like the following could be done:

Remove what's currently in SITargetLowering.
Add a custom DAG combine which combines (ballot (ISD::SETCC ...)) to (AMDGPUISD::SETCC ...)
Do code generation for the remaining ballot cases either in AMDGPUISelDAGToDAG or via a TableGen pattern (the most generic case would have to map to some combination of COPY_TO_REGCLASS and S_AND_B32/B64).

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4235–4236	Independent tests as opt tests should go into test/Transforms/InstCombine. You can find existing tests for the icmp/fcmp intrinsics there. Nitpick: please add braces for the if-body (multiple lines guarded by if).

arsenm added inline comments. Jul 24 2019, 5:54 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4235–4236	Yes. If I were to pick one, it would probably be instcombine. I think it’s unlikely this would come up in a context only visible after lowering. Reducing intrinsics is useful when reducing testcases, and any code in the DAG is just going to need to be redone in globalisel eventually
4250	Ideally we would verify wave32 IR somewhere and error, but I don’t think that exists today

Any update on this?

In D65088#1817656, @arsenm wrote:

Any update on this?

No, I haven't done any more work on it. I guess I could pick it up again some time soon, unless anyone else wants to?

In D65088#1817743, @foad wrote:

In D65088#1817656, @arsenm wrote:

Any update on this?

No, I haven't done any more work on it. I guess I could pick it up again some time soon, unless anyone else wants to?

So far I've put off handling amdgcn.icmp/fcmp in globalisel, hoping this would supersede that so I'd like to see this move along

Herald added a subscriber: kerbowa. Feb 13 2020, 3:26 PM

Agreed, it would be good to see progress on this.

Flakebi mentioned this in D75855: [AMDGPU] Use script to generate atomic optimizations test. Mar 9 2020, 9:31 AM

Flakebi commandeered this revision. Mar 11 2020, 12:55 AM

Flakebi added a reviewer: foad.

Address Nicolai’s comments and implement this as DAG combines and TableGen patterns.

The code generation for test2 is currently not optimal:

%trunc = trunc i32 %x to i1
%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)

generates

v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 vcc, 1, v0
s_and_b64 s[4:5], vcc, exec

where the first compare stems from the truncate.
We could handle the case of (ballot (truncate x)) in the combining. Any opinions on this?

Flakebi added parent revisions: D75855: [AMDGPU] Use script to generate atomic optimizations test, D75857: [AMDGPU] Fix using physical registers in vector instructions. Mar 11 2020, 1:17 AM

Flakebi added a child revision: D75976: [AMDGPU] Optimize AtomicOptimizer. Mar 11 2020, 1:47 AM

In D65088#1916351, @Flakebi wrote:
The code generation for test2 is currently not optimal:
%trunc = trunc i32 %x to i1
%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
generates
v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 vcc, 1, v0
s_and_b64 s[4:5], vcc, exec
where the first compare stems from the truncate.

I'm confused by this. What is the optimal code generation?

Harbormaster failed remote builds in B48786: Diff 249563! Mar 11 2020, 2:19 AM

sebastian-ne mentioned this in rG2f857eadf5d4: [AMDGPU] Use script to generate atomic optimizations test. Mar 11 2020, 2:19 AM

foad added inline comments. Mar 11 2020, 2:30 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
428 ↗	(On Diff #249563)	If $src is the result of a v_cmp instruction then this s_and will be redundant, won't it? Because v_cmp is defined to return a zero in vcc for all inactive lanes.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3968	Are you sure it's safe to use read_register(exec) for this? I'm not an expert but I have been told it's not safe, because the compiler doesn't understand that it can't reorder the read_register call past other instructions that modify exec.

Flakebi marked an inline comment as done. Mar 11 2020, 3:09 AM

Flakebi added inline comments.

llvm/lib/Target/AMDGPU/SIInstructions.td
428 ↗	(On Diff #249563)	Yes, I agree that this is redundant. As this happens in the lowering phase, we can only match for `(i64 (int_amdgcn_ballot (i1 (trunc $src))))` in this case. I think we would need to combine machine instructions to get the `v_cmp`.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3968	Is there a safe way to get the exec register? I’m fine with removing this optimization here and doing it only in the SDag.

In D65088#1916416, @foad wrote:
In D65088#1916351, @Flakebi wrote:
The code generation for test2 is currently not optimal:
%trunc = trunc i32 %x to i1
%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
generates
v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 vcc, 1, v0
s_and_b64 s[4:5], vcc, exec
where the first compare stems from the truncate.
I'm confused by this. What is the optimal code generation?

The more optimal version would be to merge the compare and s_and exec like you said in your comment?

v_and_b32_e32 v0, 1, v0
v_cmp_eq_u32_e32 s[4:5], 1, v0
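To make the redundancy argument concrete, here is a small model of the `trunc` plus wave-wide compare sequence. It is a sketch under the assumption stated in the review that a vector compare writes zero for inactive lanes; `modelTruncBallot` and `andWithExecIsRedundant` are hypothetical helpers, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models `trunc i32 %x to i1` per lane followed by a wave-wide compare.
// x[lane] is the per-lane input; inactive lanes produce a zero bit.
uint64_t modelTruncBallot(uint64_t exec, const std::vector<uint32_t> &x) {
  assert(x.size() <= 64);
  uint64_t mask = 0;
  for (std::size_t lane = 0; lane < x.size(); ++lane) {
    const bool active = ((exec >> lane) & 1) != 0;
    const bool lowBit = (x[lane] & 1u) == 1u; // v_and + v_cmp_eq against 1
    if (active && lowBit)
      mask |= uint64_t{1} << lane;
  }
  return mask;
}

// True when AND-ing the compare result with exec cannot change it.
bool andWithExecIsRedundant(uint64_t exec, uint64_t cmpMask) {
  return (cmpMask & exec) == cmpMask;
}
```

Because the modeled compare already clears inactive lanes, the extra AND with the same exec mask leaves the result unchanged, which is the point foad raises about the s_and.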

Flakebi marked an inline comment as done. Mar 12 2020, 2:11 AM

Flakebi added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3968	Actually, icmp and fcmp already create `read_register(exec)` in InstCombine. So, having this combination and replacing icmp with ballot won’t change behavior.

Use getCopyFromReg(exec) and rebase on fixed whole-wave-mode.

The remaining exec copy can get fixed in multiple ways:

Prevent sinking the ctpop into the next basic block. Unfortunately marking the instruction as convergent does not work because the Machine Sinking pass moves it instead.
Reuse the copy of exec that gets created by s_and_saveexec. This might work with GlobalISel, so it’s probably worth waiting until this gets used.

Flakebi mentioned this in D75976: [AMDGPU] Optimize AtomicOptimizer. Mar 18 2020, 3:06 AM

Harbormaster completed remote builds in B49570: Diff 251027. Mar 18 2020, 3:46 AM

In D65088#1928542, @Flakebi wrote:

Use getCopyFromReg(exec) and rebase on fixed whole-wave-mode.

The remaining exec copy can get fixed in multiple ways:

Prevent sinking the ctpop into the next basic block. Unfortunately marking the instruction as convergent does not work because the Machine Sinking pass moves it instead.

Reuse the copy of exec that gets created by s_and_saveexec. This might work with GlobalISel, so it’s probably worth waiting until this gets used.

I was afraid this would be a problem. We probably need a version of COPY with convergent set. If MachineSinking is moving a convergent instruction, that's also a bug

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8705	I didn't think of it before, but reading exec this way could potentially be dangerous due to the fact that we have the terrible operations that modify exec in the middle of IR blocks, and we split them later. We might have to do this fold later
llvm/lib/Target/AMDGPU/SIInstructions.td
428 ↗	(On Diff #251027)	Is the COPY_TO_REGCLASS just a tablegen workaround?
433 ↗	(On Diff #251027)	Probably need an explicit Src_b32:$src to make this work with GlobalISel
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3969	What happens for wave32 here?
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
2412	Wave32 tests also

Flakebi marked an inline comment as done. Mar 18 2020, 9:41 AM

Flakebi added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8705	What are these operations and how can I fix them? And shouldn’t this work as the instruction reads exec and thus should not be touched?
llvm/lib/Target/AMDGPU/SIInstructions.td
428 ↗	(On Diff #251027)	Yes, we get an i1 as input, though we know that it is stored in an SGPR (pair), so we "cast" it into one. The same happens in line 803 to optimize icmp:
	def : Pat <
	  (i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
	  (COPY $src) // Return the SGPRs representing i1 src
	>;

Add missing wave32 instcombining

Also should add GlobalISel tests

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8705	You can't really fix them in SelectionDAG. Things like the wwm/wqm intrinsics, and kills can exist in the middle of blocks (plus inline asm)
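The hazard being discussed can be sketched with a toy model in which a kill in the middle of a block shrinks the exec mask. This is only an illustration: `WaveState`, `modelKill` and `ballotTrueHere` are hypothetical names, and the real concern is about how SelectionDAG orders a read of exec, which this model does not capture.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the wave's exec mask; not an LLVM data structure.
struct WaveState {
  uint64_t exec;
};

// Model of a kill in the middle of a block: the given lanes become inactive.
void modelKill(WaveState &wave, uint64_t lanesToKill) {
  wave.exec &= ~lanesToKill;
}

// ballot(true) evaluated at the current program point.
uint64_t ballotTrueHere(const WaveState &wave) { return wave.exec; }

// Usage example: a snapshot taken at block entry goes stale after the kill.
void checkStaleSnapshot() {
  WaveState wave{0xF};
  const uint64_t entrySnapshot = wave.exec;
  modelKill(wave, 0x3);
  assert(ballotTrueHere(wave) == 0xC);
  assert(entrySnapshot != ballotTrueHere(wave));
}
```

A value of exec captured before the kill no longer matches what a ballot at the later point should return, which is why folding `ballot(true)` to a read of exec has to respect the position of such operations.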

Harbormaster completed remote builds in B49616: Diff 251121. Mar 18 2020, 10:21 AM

Flakebi updated this revision to Diff 251586. Mar 20 2020, 3:03 AM

This comment was removed by Flakebi.

I removed the COPY_TO_REGCLASS, it looked flaky and does not work with GlobalISel.
Instead, the ballot intrinsic is morphed into an AMDGPUISD::SETCC. Compares are the only reasonable way to get a boolean value into the wavefront form as an i32/i64 and use it in LLVM.

Matt: How do I add these combines for GlobalISel?
Should they go into AMDGPUCombine.td? And do I have to write functions for this or does it work purely with pattern matching?

Harbormaster completed remote builds in B49857: Diff 251587. Mar 20 2020, 3:46 AM

Harbormaster completed remote builds in B49856: Diff 251586.

In D65088#1933161, @Flakebi wrote:

I removed the COPY_TO_REGCLASS, it looked flaky and does not work with GlobalISel.
Instead, the ballot intrinsic is morphed into an AMDGPUISD::SETCC. Compares are the only reasonable way to get a boolean value into the wavefront form as an i32/i64 and use it in LLVM.

Matt: How do I add these combines for GlobalISel?
Should they go into AMDGPUCombine.td? And do I have to write functions for this or does it work purely with pattern matching?

If you're not going to have a fallback pattern in tablegen, then the GlobalISel support is probably a separate patch. This would not go in AMDGPUCombines.td as this is not an optimization combine. By having everything go through AMDGPUISD::SETCC, this develops the same problem the existing icmp/fcmp intrinsics have for GlobalISel support. As a separate patch, we need to replace how AMDGPUISD::SETCC works to avoid relying on the CondCode field

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8714–8716	Since you're now treating this as the full lowering path, and not an optimization for the special case with a compare, this should go in LowerINTRINSIC_WO_CHAIN
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3967–3968	Should move the getDeclaration below to where you actually use it. If you happened to hit the other size break case, you would have introduced a dead declaration

Move the code to lowering again; I’m back where Jay started.
Report a fatal error if the size is neither i32 nor i64.

Harbormaster completed remote builds in B50128: Diff 252064. Mar 23 2020, 9:16 AM

return instead of report_fatal_error
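The size check mentioned in the two updates above chooses between the 32-bit and 64-bit exec registers and gives up for any other width. A minimal standalone sketch of that decision, assuming a hypothetical helper `execRegisterNameFor` (the patch itself performs the check inline on LLVM types):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: which register a constant-true ballot of the given
// result width would read. Returns nullptr when no fold applies, mirroring
// the "return instead of report_fatal_error" choice.
const char *execRegisterNameFor(unsigned resultBits) {
  switch (resultBits) {
  case 32:
    return "exec_lo";
  case 64:
    return "exec";
  default:
    return nullptr;
  }
}

// Usage example.
void checkExecRegisterNames() {
  assert(std::strcmp(execRegisterNameFor(32), "exec_lo") == 0);
  assert(std::strcmp(execRegisterNameFor(64), "exec") == 0);
  assert(execRegisterNameFor(16) == nullptr);
}
```

Returning a "no fold" result for unexpected widths keeps the compiler from crashing on hand-written IR with a mismatched type, which is the concern foad raised earlier in the review.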

Harbormaster failed remote builds in B50948: Diff 253569! Mar 30 2020, 6:27 AM

arsenm added inline comments. Mar 30 2020, 7:54 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
737	Can you add a test for this in test/Analysis/DivergenceAnalysis/AMDGPU
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4258–4260	DAG.getSetCC?

Add uniformity test

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4258–4260	This creates `AMDGPUISD::SETCC` not setcc :)

Harbormaster failed remote builds in B50980: Diff 253629! Mar 30 2020, 10:50 AM

arsenm accepted this revision. Mar 30 2020, 11:40 AM

This revision is now accepted and ready to land. Mar 30 2020, 11:40 AM

Closed by commit rG5d3a69feca12: [AMDGPU] New llvm.amdgcn.ballot intrinsic (authored by sebastian-ne). Mar 31 2020, 1:37 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAMDGPU.td	3 lines
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp	5 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	1 line
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	39 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	31 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll	22 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll	50 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll	627 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll	34 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll	22 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll	22 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll	93 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll	88 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	59 lines

Diff 253569

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

@@ ... @@
 def int_amdgcn_icmp :
   Intrinsic<[llvm_anyint_ty], [llvm_anyint_ty, LLVMMatchType<1>, llvm_i32_ty],
             [IntrNoMem, IntrConvergent, ImmArg<2>]>;

 def int_amdgcn_fcmp :
   Intrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty, LLVMMatchType<1>, llvm_i32_ty],
             [IntrNoMem, IntrConvergent, ImmArg<2>]>;

+def int_amdgcn_ballot :
+  Intrinsic<[llvm_anyint_ty], [llvm_i1_ty], [IntrNoMem, IntrConvergent]>;
+
 def int_amdgcn_readfirstlane :
   GCCBuiltin<"__builtin_amdgcn_readfirstlane">,
   Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

 // The lane argument must be uniform across the currently active threads of the
 // current wave. Otherwise, the result is undefined.
 def int_amdgcn_readlane :
   GCCBuiltin<"__builtin_amdgcn_readlane">,
@@ ... @@

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

@@ ... @@ void AMDGPUAtomicOptimizer::optimizeAtomic(Instruction &I,

   // This is the value in the atomic operation we need to combine in order to
   // reduce the number of atomic operations.
   Value *const V = I.getOperand(ValIdx);

   // We need to know how many lanes are active within the wavefront, and we do
   // this by doing a ballot of active lanes.
   Type *const WaveTy = B.getIntNTy(ST->getWavefrontSize());
-  CallInst *const Ballot = B.CreateIntrinsic(
-      Intrinsic::amdgcn_icmp, {WaveTy, B.getInt32Ty()},
-      {B.getInt32(1), B.getInt32(0), B.getInt32(CmpInst::ICMP_NE)});
+  CallInst *const Ballot =
+      B.CreateIntrinsic(Intrinsic::amdgcn_ballot, WaveTy, B.getTrue());

   // We need to know how many lanes are active within the wavefront that are
   // below us. If we counted each lane linearly starting from 0, a lane is
   // below us only if its associated index was less than ours. We do this by
   // using the mbcnt intrinsic.
   Value *Mbcnt;
   if (ST->isWave32()) {
     Mbcnt = B.CreateIntrinsic(Intrinsic::amdgcn_mbcnt_lo, {},
@@ ... @@

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

@@ ... @@ bool GCNTTIImpl::isAlwaysUniform(const Value *V) const {
   if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(V)) {
     switch (Intrinsic->getIntrinsicID()) {
     default:
       return false;
     case Intrinsic::amdgcn_readfirstlane:
     case Intrinsic::amdgcn_readlane:
     case Intrinsic::amdgcn_icmp:
     case Intrinsic::amdgcn_fcmp:
+    case Intrinsic::amdgcn_ballot:
   >> arsenm (Done): Can you add a test for this in test/Analysis/DivergenceAnalysis/AMDGPU
     case Intrinsic::amdgcn_if_break:
       return true;
     }
   }

   if (const CallInst *CI = dyn_cast<CallInst>(V)) {
     if (isa<InlineAsm>(CI->getCalledValue()))
       return !isInlineAsmSourceOfDivergence(CI);
@@ ... @@

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

@@ ... @@ static SDValue lowerFCMPIntrinsic(const SITargetLowering &TLI,
   EVT CCVT = EVT::getIntegerVT(*DAG.getContext(), WavefrontSize);
   SDValue SetCC = DAG.getNode(AMDGPUISD::SETCC, SL, CCVT, Src0,
                               Src1, DAG.getCondCode(CCOpcode));
   if (VT.bitsEq(CCVT))
     return SetCC;
   return DAG.getZExtOrTrunc(SetCC, SL, VT);
 }

+static SDValue lowerBALLOTIntrinsic(const SITargetLowering &TLI, SDNode *N,
+                                    SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+  SDValue Src = N->getOperand(1);
+  SDLoc SL(N);
+
+  if (Src.getOpcode() == ISD::SETCC) {
+    // (ballot (ISD::SETCC ...)) -> (AMDGPUISD::SETCC ...)
+    return DAG.getNode(AMDGPUISD::SETCC, SL, VT, Src.getOperand(0),
+                       Src.getOperand(1), Src.getOperand(2));
+  }
   >> arsenm (Not Done): Should also do this in instcombine, as icmp/fcmp already do
   >> foad (Done): I've had a go, but is it really necessary to implement the same optimization cases in two places (especially the one that just replaces a ballot intrinsic with a read_register intrinsic), and how do I test them independently?
   >> nhaehnle (Not Done): Independent tests as opt tests should go into test/Transforms/InstCombine. You can find existing tests for the icmp/fcmp intrinsics there. Nitpick: please add braces for the if-body (multiple lines guarded by if).
   >> arsenm (Not Done): Yes. If I were to pick one, it would probably be instcombine. I think it’s unlikely this would come up in a context only visible after lowering. Reducing intrinsics is useful when reducing testcases, and any code in the DAG is just going to need to be redone in globalisel eventually
+  if (const ConstantSDNode *Arg = dyn_cast<ConstantSDNode>(Src)) {
+    // (ballot 0) -> 0
+    if (Arg->isNullValue())
+      return DAG.getConstant(0, SL, VT);
+
+    // (ballot 1) -> EXEC/EXEC_LO
+    if (Arg->isOne()) {
+      Register Exec;
+      if (VT.getScalarSizeInBits() == 32)
+        Exec = AMDGPU::EXEC_LO;
+      else if (VT.getScalarSizeInBits() == 64)
+        Exec = AMDGPU::EXEC;
+      else
+        return SDValue();
   >> arsenm (Not Done): This is already the right type, so this should be unnecessary
   >> foad (Done): Does anything guarantee that it's the right type? Is it OK for the compiler to crash if someone writes a .ll file that uses it with the wrong type?
   >> arsenm (Not Done): Ideally we would verify wave32 IR somewhere and error, but I don’t think that exists today
+
+      return DAG.getCopyFromReg(DAG.getEntryNode(), SL, Exec, VT);
+    }
+  }
+
+  // (ballot (i1 $src)) -> (AMDGPUISD::SETCC (i32 (zext $src)) (i32 0)
+  // ISD::SETNE)
+  return DAG.getNode(
+      AMDGPUISD::SETCC, SL, VT, DAG.getZExtOrTrunc(Src, SL, MVT::i32),
+      DAG.getConstant(0, SL, MVT::i32), DAG.getCondCode(ISD::SETNE));
   >> arsenm (Not Done): DAG.getSetCC?
   >> Flakebi (Author, Done): This creates `AMDGPUISD::SETCC` not setcc :)
+}
+
 void SITargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
   switch (N->getOpcode()) {
   case ISD::INSERT_VECTOR_ELT: {
     if (SDValue Res = lowerINSERT_VECTOR_ELT(SDValue(N, 0), DAG))
       Results.push_back(Res);
     return;
@@ ... @@ if (Op.getOperand(1).getValueType() == MVT::i1 &&
         Op.getConstantOperandVal(2) == 0 &&
         Op.getConstantOperandVal(3) == ICmpInst::Predicate::ICMP_NE)
       return Op;
     return lowerICMPIntrinsic(*this, Op.getNode(), DAG);
   }
   case Intrinsic::amdgcn_fcmp: {
     return lowerFCMPIntrinsic(*this, Op.getNode(), DAG);
   }
+  case Intrinsic::amdgcn_ballot:
+    return lowerBALLOTIntrinsic(*this, Op.getNode(), DAG);
   case Intrinsic::amdgcn_fmed3:
     return DAG.getNode(AMDGPUISD::FMED3, DL, VT,
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
   case Intrinsic::amdgcn_fdot2:
     return DAG.getNode(AMDGPUISD::FDOT2, DL, VT,
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3),
                        Op.getOperand(4));
   case Intrinsic::amdgcn_fmul_legacy:
@@ ... @@ bool SITargetLowering::isCanonicalized(SelectionDAG &DAG, SDValue Op,

   switch (Opcode) {
   // These will flush denorms if required.
   case ISD::FADD:
   case ISD::FSUB:
   case ISD::FMUL:
   case ISD::FCEIL:
   case ISD::FFLOOR:
   case ISD::FMA:
   >> arsenm (Not Done): I didn't think of it before, but reading exec this way could potentially be dangerous due to the fact that we have the terrible operations that modify exec in the middle of IR blocks, and we split them later. We might have to do this fold later
   >> Flakebi (Author, Not Done): What are these operations and how can I fix them? And shouldn’t this work as the instruction reads exec and thus should not be touched?
   >> arsenm (Not Done): You can't really fix them in SelectionDAG. Things like the wwm/wqm intrinsics, and kills can exist in the middle of blocks (plus inline asm)
   case ISD::FMAD:
   case ISD::FSQRT:
   case ISD::FDIV:
   case ISD::FREM:
   case ISD::FP_ROUND:
   case ISD::FP_EXTEND:
   case AMDGPUISD::FMUL_LEGACY:
   case AMDGPUISD::FMAD_FTZ:
   case AMDGPUISD::RCP:
   case AMDGPUISD::RSQ:
   case AMDGPUISD::RSQ_CLAMP:
   >> arsenm (Not Done): Since you're now treating this as the full lowering path, and not an optimization for the special case with a compare, this should go in LowerINTRINSIC_WO_CHAIN
   case AMDGPUISD::RCP_LEGACY:
   case AMDGPUISD::RSQ_LEGACY:
   case AMDGPUISD::RCP_IFLAG:
   case AMDGPUISD::TRIG_PREOP:
   case AMDGPUISD::DIV_SCALE:
   case AMDGPUISD::DIV_FMAS:
   case AMDGPUISD::DIV_FIXUP:
   case AMDGPUISD::FRACT:
@@ ... @@

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Show First 20 Lines • Show All 3,949 Lines • ▼ Show 20 Lines	if (match(Src1, m_Zero()) &&
ConstantInt::get(CC->getType(), SrcPred) };		ConstantInt::get(CC->getType(), SrcPred) };
CallInst *NewCall = Builder.CreateCall(NewF, Args);		CallInst *NewCall = Builder.CreateCall(NewF, Args);
NewCall->takeName(II);		NewCall->takeName(II);
return replaceInstUsesWith(*II, NewCall);		return replaceInstUsesWith(*II, NewCall);
}		}

break;		break;
}		}
		case Intrinsic::amdgcn_ballot: {
		if (auto *Src = dyn_cast<ConstantInt>(II->getArgOperand(0))) {
		if (Src->isZero()) {
		// amdgcn.ballot(i1 0) is zero.
		return replaceInstUsesWith(*II, Constant::getNullValue(II->getType()));
		}

		if (Src->isOne()) {
		// amdgcn.ballot(i1 1) is exec.
		const char *RegName = "exec";
		if (II->getType()->isIntegerTy(32))
		foadUnsubmitted Not Done Reply Inline Actions Are you sure it's safe to use read_register(exec) for this? I'm not an expert but I have been told it's not safe, because the compiler doesn't understand that it can't reorder the read_register call past other instructions that modify exec. foad: Are you sure it's safe to use read_register(exec) for this? I'm not an expert but I have been…
		FlakebiAuthorUnsubmitted Not Done Reply Inline Actions Is there a safe way to get the exec register? I’m fine with removing this optimization here and doing it only in the SDag. Flakebi: Is there a safe way to get the exec register? I’m fine with removing this optimization here and…
		FlakebiAuthorUnsubmitted Done Reply Inline Actions Actually, icmp and fcmp already create `read_register(exec)` in InstCombine. So, having this combination and replacing icmp with ballot won’t change behavior. Flakebi: Actually, icmp and fcmp already create `read_register(exec)` in InstCombine. So, having this…
		arsenmUnsubmitted Not Done Reply Inline Actions Should move the getDeclaration below to where you actually use it. If you happened to hit the other size break case, you would have introduced a dead declaration arsenm: Should move the getDeclaration below to where you actually use it. If you happened to hit the…
		RegName = "exec_lo";
		arsenmUnsubmitted Not Done Reply Inline Actions What happens for wave32 here? arsenm: What happens for wave32 here?
		else if (!II->getType()->isIntegerTy(64))
		break;

		Function *NewF = Intrinsic::getDeclaration(
		II->getModule(), Intrinsic::read_register, II->getType());
		Metadata *MDArgs[] = {MDString::get(II->getContext(), RegName)};
		MDNode *MD = MDNode::get(II->getContext(), MDArgs);
		Value *Args[] = {MetadataAsValue::get(II->getContext(), MD)};
		CallInst *NewCall = Builder.CreateCall(NewF, Args);
		NewCall->addAttribute(AttributeList::FunctionIndex,
		Attribute::Convergent);
		NewCall->takeName(II);
		return replaceInstUsesWith(*II, NewCall);
		}
		}
		break;
		}
case Intrinsic::amdgcn_wqm_vote: {		case Intrinsic::amdgcn_wqm_vote: {
// wqm_vote is identity when the argument is constant.		// wqm_vote is identity when the argument is constant.
if (!isa<Constant>(II->getArgOperand(0)))		if (!isa<Constant>(II->getArgOperand(0)))
break;		break;

return replaceInstUsesWith(*II, II->getArgOperand(0));		return replaceInstUsesWith(*II, II->getArgOperand(0));
}		}
case Intrinsic::amdgcn_kill: {		case Intrinsic::amdgcn_kill: {
[208 lines not shown]	case Intrinsic::experimental_gc_relocate: {
// If we have two copies of the same pointer in the statepoint argument		// If we have two copies of the same pointer in the statepoint argument
// list, canonicalize to one. This may let us common gc.relocates.		// list, canonicalize to one. This may let us common gc.relocates.
if (GCR.getBasePtr() == GCR.getDerivedPtr() &&		if (GCR.getBasePtr() == GCR.getDerivedPtr() &&
GCR.getBasePtrIndex() != GCR.getDerivedPtrIndex()) {		GCR.getBasePtrIndex() != GCR.getDerivedPtrIndex()) {
auto *OpIntTy = GCR.getOperand(2)->getType();		auto *OpIntTy = GCR.getOperand(2)->getType();
return replaceOperand(*II, 2,		return replaceOperand(*II, 2,
ConstantInt::get(OpIntTy, GCR.getBasePtrIndex()));		ConstantInt::get(OpIntTy, GCR.getBasePtrIndex()));
}		}

// Translate facts known about a pointer before relocating into		// Translate facts known about a pointer before relocating into
// facts about the relocate value, while being careful to		// facts about the relocate value, while being careful to
// preserve relocation semantics.		// preserve relocation semantics.
Value *DerivedPtr = GCR.getDerivedPtr();		Value *DerivedPtr = GCR.getDerivedPtr();

// Remove the relocation if unused, note that this check is required		// Remove the relocation if unused, note that this check is required
// to prevent the cases below from looping forever.		// to prevent the cases below from looping forever.
if (II->use_empty())		if (II->use_empty())
[846 lines not shown]
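
Note on the inline discussion in InstCombineCalls.cpp above: the fold replaces a call whose result is known to be the set of active lanes with a read of `exec` (64-bit result) or `exec_lo` (32-bit result). The sketch below is a host-side C++ model of the ballot semantics described in the Summary, not code from the patch. The name `ballot_model`, its parameters, and the one-bit-per-lane encoding are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one wave: bit i of `exec` is 1 when lane i is active,
// and bit i of `pred` holds lane i's boolean argument.
static uint64_t ballot_model(uint64_t exec, uint64_t pred, unsigned wave_size) {
    assert(wave_size == 32 || wave_size == 64);
    // A wave32 result only covers the low 32 lanes (the exec_lo half).
    uint64_t lanes = (wave_size == 32) ? 0xFFFFFFFFull : ~0ull;
    // Active lanes contribute their predicate bit; inactive lanes contribute 0.
    return exec & pred & lanes;
}
```

With an all-true predicate the model returns the active-lane mask itself, which is why a constant-true argument can be folded to a register read. Whether that read is safe against reordering is the open question foad raises above.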

llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32 immarg)

	; Show that what the atomic optimization pass will do for raw buffers.			; Show what the atomic optimization pass will do for raw buffers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i32_uniform:			; GCN-LABEL: add_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {			define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0)
	[58 lines not shown]
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 %lane, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 %lane, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_constant:			; GCN-LABEL: sub_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_uniform:			; GCN-LABEL: sub_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {			define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0)
	[40 lines not shown]
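
The check lines in this file follow one pattern: count the active lanes below the current lane (`v_mbcnt_lo`/`v_mbcnt_hi`), let only the lane whose count is 0 continue (`v_cmp_eq_u32` plus `s_and_saveexec`), and have that lane add the value multiplied by the number of active lanes (`s_bcnt1`, then a multiply). Below is a minimal host-side C++ sketch of that idea for a uniform value. The names `popcount64`, `mbcnt`, and `wave_atomic_add` are hypothetical, and the per-lane reconstruction of the old value is an assumption about how results are handed back; it is not shown in the checks above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable population count (number of set bits).
static unsigned popcount64(uint64_t x) {
    unsigned n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
}

// Number of active lanes strictly below `lane`, i.e. what the v_mbcnt pair
// yields when it is fed the active-lane mask.
static unsigned mbcnt(uint64_t exec, unsigned lane) {
    assert(lane < 64);
    uint64_t below = (lane == 0) ? 0 : (~0ull >> (64 - lane));
    return popcount64(exec & below);
}

// Hypothetical model: every active lane wants `memory += value` and its own
// "old" result, but only one combined add is issued for the whole wave.
static std::vector<uint32_t> wave_atomic_add(uint32_t &memory, uint64_t exec,
                                             uint32_t value) {
    std::vector<uint32_t> old(64, 0);
    if (exec == 0)
        return old;                      // no active lanes, nothing to do
    uint32_t before = memory;
    memory += popcount64(exec) * value;  // the single combined atomic add
    for (unsigned lane = 0; lane < 64; ++lane)
        if ((exec >> lane) & 1)
            old[lane] = before + mbcnt(exec, lane) * value;
    return old;
}
```

For the `add_i32_constant` test the value is the constant 5, which matches the `v_mul_u32_u24 ..., s[[popcount]], 5` check.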

llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()

	; Show that what the atomic optimization pass will do for global pointers.			; Show what the atomic optimization pass will do for global pointers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: {{flat|buffer|global}}_atomic_add v[[value]]			; GCN: {{flat|buffer|global}}_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {			define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {
	entry:			entry:
	%old = atomicrmw add i32 addrspace(1)* %inout, i32 5 acq_rel			%old = atomicrmw add i32 addrspace(1)* %inout, i32 5 acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i32_uniform:			; GCN-LABEL: add_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: {{flat|buffer|global}}_atomic_add v[[value]]			; GCN: {{flat|buffer|global}}_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, i32 addrspace(1)* %inout, i32 %additive) {			define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, i32 addrspace(1)* %inout, i32 %additive) {
	entry:			entry:
	%old = atomicrmw add i32 addrspace(1)* %inout, i32 %additive acq_rel			%old = atomicrmw add i32 addrspace(1)* %inout, i32 %additive acq_rel
	[16 lines not shown]
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw add i32 addrspace(1)* %inout, i32 %lane acq_rel			%old = atomicrmw add i32 addrspace(1)* %inout, i32 %lane acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i64_constant:			; GCN-LABEL: add_i64_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_hi_u32_u24{{(_e[0-9]+)?}} v[[value_hi:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_hi_u32_u24{{(_e[0-9]+)?}} v[[value_hi:[0-9]+]], s[[popcount]], 5
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value_lo:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value_lo:[0-9]+]], s[[popcount]], 5
	; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}[[value_lo]]:[[value_hi]]{{\]}}			; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}[[value_lo]]:[[value_hi]]{{\]}}
	define amdgpu_kernel void @add_i64_constant(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {			define amdgpu_kernel void @add_i64_constant(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {
	entry:			entry:
	%old = atomicrmw add i64 addrspace(1)* %inout, i64 5 acq_rel			%old = atomicrmw add i64 addrspace(1)* %inout, i64 5 acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i64_uniform:			; GCN-LABEL: add_i64_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
	; GCN32: s_bcnt1_i32_b32 s{{[0-9]+}}, s[[exec_lo]]			; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN64: s_bcnt1_i32_b64 s{{[0-9]+}}, s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
				; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}			; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}
	define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace(1)* %inout, i64 %additive) {			define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 addrspace(1)* %inout, i64 %additive) {
	entry:			entry:
	%old = atomicrmw add i64 addrspace(1)* %inout, i64 %additive acq_rel			%old = atomicrmw add i64 addrspace(1)* %inout, i64 %additive acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i64_varying:			; GCN-LABEL: add_i64_varying:
	; GCN-NOT: v_mbcnt_lo_u32_b32			; GCN-NOT: v_mbcnt_lo_u32_b32
	; GCN-NOT: v_mbcnt_hi_u32_b32			; GCN-NOT: v_mbcnt_hi_u32_b32
	; GCN-NOT: s_bcnt1_i32_b64			; GCN-NOT: s_bcnt1_i32_b64
	; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}			; GCN: {{flat|buffer|global}}_atomic_add_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}
	define amdgpu_kernel void @add_i64_varying(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {			define amdgpu_kernel void @add_i64_varying(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%zext = zext i32 %lane to i64			%zext = zext i32 %lane to i64
	%old = atomicrmw add i64 addrspace(1)* %inout, i64 %zext acq_rel			%old = atomicrmw add i64 addrspace(1)* %inout, i64 %zext acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_constant:			; GCN-LABEL: sub_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: {{flat|buffer|global}}_atomic_sub v[[value]]			; GCN: {{flat|buffer|global}}_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {			define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {
	entry:			entry:
	%old = atomicrmw sub i32 addrspace(1)* %inout, i32 5 acq_rel			%old = atomicrmw sub i32 addrspace(1)* %inout, i32 5 acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_uniform:			; GCN-LABEL: sub_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: {{flat|buffer|global}}_atomic_sub v[[value]]			; GCN: {{flat|buffer|global}}_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, i32 addrspace(1)* %inout, i32 %subitive) {			define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, i32 addrspace(1)* %inout, i32 %subitive) {
	entry:			entry:
	%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %subitive acq_rel			%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %subitive acq_rel
	[16 lines not shown]
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %lane acq_rel			%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %lane acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i64_constant:			; GCN-LABEL: sub_i64_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_hi_u32_u24{{(_e[0-9]+)?}} v[[value_hi:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_hi_u32_u24{{(_e[0-9]+)?}} v[[value_hi:[0-9]+]], s[[popcount]], 5
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value_lo:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value_lo:[0-9]+]], s[[popcount]], 5
	; GCN: {{flat|buffer|global}}_atomic_sub_x2 v{{\[}}[[value_lo]]:[[value_hi]]{{\]}}			; GCN: {{flat|buffer|global}}_atomic_sub_x2 v{{\[}}[[value_lo]]:[[value_hi]]{{\]}}
	define amdgpu_kernel void @sub_i64_constant(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {			define amdgpu_kernel void @sub_i64_constant(i64 addrspace(1)* %out, i64 addrspace(1)* %inout) {
	entry:			entry:
	%old = atomicrmw sub i64 addrspace(1)* %inout, i64 5 acq_rel			%old = atomicrmw sub i64 addrspace(1)* %inout, i64 5 acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i64_uniform:			; GCN-LABEL: sub_i64_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
	; GCN32: s_bcnt1_i32_b32 s{{[0-9]+}}, s[[exec_lo]]			; GCN: s_and_saveexec_b{{32\|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN64: s_bcnt1_i32_b64 s{{[0-9]+}}, s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
				; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: {{flat\|buffer\|global}}_atomic_sub_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}			; GCN: {{flat\|buffer\|global}}_atomic_sub_x2 v{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}
	define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace(1)* %inout, i64 %subitive) {			define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 addrspace(1)* %inout, i64 %subitive) {
	entry:			entry:
	%old = atomicrmw sub i64 addrspace(1)* %inout, i64 %subitive acq_rel			%old = atomicrmw sub i64 addrspace(1)* %inout, i64 %subitive acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	Show All 13 Lines

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX7LESS %s		; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX7LESS %s
; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX8 %s		; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX8 %s
; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX9 %s		; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX9 %s
; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX1064 %s		; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX1064 %s
; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX1032 %s		; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GFX1032 %s

declare i32 @llvm.amdgcn.workitem.id.x()		declare i32 @llvm.amdgcn.workitem.id.x()

@local_var32 = addrspace(3) global i32 undef, align 4		@local_var32 = addrspace(3) global i32 undef, align 4
@local_var64 = addrspace(3) global i64 undef, align 8		@local_var64 = addrspace(3) global i64 undef, align 8

; Show that what the atomic optimization pass will do for local pointers.		; Show what the atomic optimization pass will do for local pointers.

define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out) {		define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: add_i32_constant:		; GFX7LESS-LABEL: add_i32_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[2:3], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB0_2		; GFX7LESS-NEXT: s_cbranch_execz BB0_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: v_mul_u32_u24_e64 v2, s4, 5		; GFX7LESS-NEXT: v_mul_u32_u24_e64 v2, s2, 5
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_add_rtn_u32 v1, v1, v2		; GFX7LESS-NEXT: ds_add_rtn_u32 v1, v1, v2
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: BB0_2:		; GFX7LESS-NEXT: BB0_2:
; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v1		; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, s2		; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, s2
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i32_constant:		; GFX8-LABEL: add_i32_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz BB0_2		; GFX8-NEXT: s_cbranch_execz BB0_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX8-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX8-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX8-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX8-NEXT: s_mov_b32 m0, -1		; GFX8-NEXT: s_mov_b32 m0, -1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: ds_add_rtn_u32 v1, v2, v1		; GFX8-NEXT: ds_add_rtn_u32 v1, v2, v1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: buffer_wbinvl1_vol		; GFX8-NEXT: buffer_wbinvl1_vol
; GFX8-NEXT: BB0_2:		; GFX8-NEXT: BB0_2:
; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_readfirstlane_b32 s2, v1		; GFX8-NEXT: v_readfirstlane_b32 s2, v1
; GFX8-NEXT: v_mad_u32_u24 v0, v0, 5, s2		; GFX8-NEXT: v_mad_u32_u24 v0, v0, 5, s2
; GFX8-NEXT: s_mov_b32 s3, 0xf000		; GFX8-NEXT: s_mov_b32 s3, 0xf000
; GFX8-NEXT: s_mov_b32 s2, -1		; GFX8-NEXT: s_mov_b32 s2, -1
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i32_constant:		; GFX9-LABEL: add_i32_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz BB0_2		; GFX9-NEXT: s_cbranch_execz BB0_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX9-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX9-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX9-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: ds_add_rtn_u32 v1, v2, v1		; GFX9-NEXT: ds_add_rtn_u32 v1, v2, v1
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: buffer_wbinvl1_vol		; GFX9-NEXT: buffer_wbinvl1_vol
; GFX9-NEXT: BB0_2:		; GFX9-NEXT: BB0_2:
; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_readfirstlane_b32 s2, v1		; GFX9-NEXT: v_readfirstlane_b32 s2, v1
; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s2		; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s2
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i32_constant:		; GFX1064-LABEL: add_i32_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr1
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX1064-NEXT: s_cbranch_execz BB0_2		; GFX1064-NEXT: s_cbranch_execz BB0_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX1064-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX1064-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX1064-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1064-NEXT: ds_add_rtn_u32 v1, v2, v1		; GFX1064-NEXT: ds_add_rtn_u32 v1, v2, v1
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: buffer_gl0_inv		; GFX1064-NEXT: buffer_gl0_inv
; GFX1064-NEXT: buffer_gl1_inv		; GFX1064-NEXT: buffer_gl1_inv
; GFX1064-NEXT: BB0_2:		; GFX1064-NEXT: BB0_2:
; GFX1064-NEXT: v_nop		; GFX1064-NEXT: v_nop
; GFX1064-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX1064-NEXT: v_readfirstlane_b32 s2, v1		; GFX1064-NEXT: v_readfirstlane_b32 s2, v1
; GFX1064-NEXT: s_mov_b32 s3, 0x31016000		; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
; GFX1064-NEXT: v_mad_u32_u24 v0, v0, 5, s2		; GFX1064-NEXT: v_mad_u32_u24 v0, v0, 5, s2
; GFX1064-NEXT: s_mov_b32 s2, -1		; GFX1064-NEXT: s_mov_b32 s2, -1
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i32_constant:		; GFX1032-LABEL: add_i32_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s3, 1, 0		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr1
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB0_2		; GFX1032-NEXT: s_cbranch_execz BB0_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3		; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s3, 5		; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1032-NEXT: ds_add_rtn_u32 v1, v2, v1		; GFX1032-NEXT: ds_add_rtn_u32 v1, v2, v1
; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1032-NEXT: buffer_gl0_inv		; GFX1032-NEXT: buffer_gl0_inv
; GFX1032-NEXT: buffer_gl1_inv		; GFX1032-NEXT: buffer_gl1_inv
; GFX1032-NEXT: BB0_2:		; GFX1032-NEXT: BB0_2:
; GFX1032-NEXT: v_nop		; GFX1032-NEXT: v_nop
; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s2		; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
; GFX1032-NEXT: v_readfirstlane_b32 s2, v1		; GFX1032-NEXT: v_readfirstlane_b32 s2, v1
; GFX1032-NEXT: s_mov_b32 s3, 0x31016000		; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
; GFX1032-NEXT: v_mad_u32_u24 v0, v0, 5, s2		; GFX1032-NEXT: v_mad_u32_u24 v0, v0, 5, s2
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: s_nop 1		; GFX1032-NEXT: s_nop 1
; GFX1032-NEXT: s_waitcnt lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 5 acq_rel		%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 5 acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}
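
The checks for add_i32_constant above follow one pattern: capture the active-lane mask (s_mov_b64 from exec on the new side), count the active lanes below each lane (v_mbcnt_lo/v_mbcnt_hi), let only the lane with a count of zero issue the atomic (v_cmp_eq_u32 plus s_and_saveexec), scale the addend by the number of active lanes (s_bcnt1 then v_mul_u32_u24), and finally rebuild each lane's own result from the broadcast old value (v_readfirstlane then v_mad_u32_u24). The following is a minimal host-side sketch of that arithmetic, not the pass's implementation. The names WaveResult, popcount64 and simulateUniformAdd are hypothetical, and a 64-lane wavefront is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model; none of these names come from the patch.
struct WaveResult {
  std::uint32_t memoryAfter;              // memory contents after the single atomic
  std::vector<std::uint32_t> oldPerLane;  // value each active lane sees as %old
};

// Count set bits. Models s_bcnt1_i32_b64 on the whole mask, and the
// v_mbcnt_lo/v_mbcnt_hi pair when given only the bits below a lane.
static unsigned popcount64(std::uint64_t bits) {
  unsigned count = 0;
  for (; bits != 0; bits &= bits - 1)  // clear the lowest set bit each step
    ++count;
  return count;
}

// exec:   mask of active lanes (assumed 64-lane wavefront)
// memory: value stored at the atomic's address before the operation
// addend: the uniform value every active lane adds
WaveResult simulateUniformAdd(std::uint64_t exec, std::uint32_t memory,
                              std::uint32_t addend) {
  assert(exec != 0 && "at least one lane must be active");

  // One atomic for the whole wavefront: add popcount(exec) * addend.
  const std::uint32_t old = memory;
  memory += popcount64(exec) * addend;

  // Each active lane reconstructs the value it would have observed had the
  // lanes performed their atomics one by one in lane order.
  std::vector<std::uint32_t> oldPerLane(64, 0);
  for (unsigned lane = 0; lane < 64; ++lane) {
    if (((exec >> lane) & 1) == 0)
      continue;  // inactive lanes produce no result
    const std::uint64_t lowerLanes = exec & ((std::uint64_t{1} << lane) - 1);
    oldPerLane[lane] = old + popcount64(lowerLanes) * addend;
  }
  return {memory, oldPerLane};
}
```

For example, with exec = 0xB (lanes 0, 1 and 3 active), memory = 100 and addend = 5, the model leaves 115 in memory and gives lanes 0, 1 and 3 the values 100, 105 and 110. The same model covers add_i32_uniform, where the addend is a kernel argument rather than the constant 5.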

define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, i32 %additive) {		define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, i32 %additive) {
;		;
;		;
; GFX7LESS-LABEL: add_i32_uniform:		; GFX7LESS-LABEL: add_i32_uniform:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[6:7], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
; GFX7LESS-NEXT: s_load_dword s2, s[0:1], 0xb		; GFX7LESS-NEXT: s_load_dword s2, s[0:1], 0xb
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[0:1], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[0:1], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB1_2		; GFX7LESS-NEXT: s_cbranch_execz BB1_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s3, s[6:7]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s3, s[6:7]
Show All 16 Lines
; GFX7LESS-NEXT: s_mov_b32 s6, -1		; GFX7LESS-NEXT: s_mov_b32 s6, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i32_uniform:		; GFX8-LABEL: add_i32_uniform:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX8-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX8-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX8-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX8-NEXT: s_cbranch_execz BB1_2		; GFX8-NEXT: s_cbranch_execz BB1_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX8-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_mul_i32 s1, s0, s1		; GFX8-NEXT: s_mul_i32 s1, s0, s1
; GFX8-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v2, s1		; GFX8-NEXT: v_mov_b32_e32 v2, s1
; GFX8-NEXT: s_mov_b32 m0, -1		; GFX8-NEXT: s_mov_b32 m0, -1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: ds_add_rtn_u32 v1, v1, v2		; GFX8-NEXT: ds_add_rtn_u32 v1, v1, v2
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: buffer_wbinvl1_vol		; GFX8-NEXT: buffer_wbinvl1_vol
; GFX8-NEXT: BB1_2:		; GFX8-NEXT: BB1_2:
; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX8-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX8-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX8-NEXT: v_readfirstlane_b32 s0, v1		; GFX8-NEXT: v_readfirstlane_b32 s0, v1
; GFX8-NEXT: s_mov_b32 s7, 0xf000		; GFX8-NEXT: s_mov_b32 s7, 0xf000
; GFX8-NEXT: s_mov_b32 s6, -1		; GFX8-NEXT: s_mov_b32 s6, -1
; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v0		; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v0
; GFX8-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i32_uniform:		; GFX9-LABEL: add_i32_uniform:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX9-NEXT: s_cbranch_execz BB1_2		; GFX9-NEXT: s_cbranch_execz BB1_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX9-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_mul_i32 s1, s0, s1		; GFX9-NEXT: s_mul_i32 s1, s0, s1
; GFX9-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: ds_add_rtn_u32 v1, v1, v2		; GFX9-NEXT: ds_add_rtn_u32 v1, v1, v2
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: buffer_wbinvl1_vol		; GFX9-NEXT: buffer_wbinvl1_vol
; GFX9-NEXT: BB1_2:		; GFX9-NEXT: BB1_2:
; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX9-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX9-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX9-NEXT: v_readfirstlane_b32 s0, v1		; GFX9-NEXT: v_readfirstlane_b32 s0, v1
; GFX9-NEXT: s_mov_b32 s7, 0xf000		; GFX9-NEXT: s_mov_b32 s7, 0xf000
; GFX9-NEXT: s_mov_b32 s6, -1		; GFX9-NEXT: s_mov_b32 s6, -1
; GFX9-NEXT: v_add_u32_e32 v0, s0, v0		; GFX9-NEXT: v_add_u32_e32 v0, s0, v0
; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i32_uniform:		; GFX1064-LABEL: add_i32_uniform:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX1064-NEXT: ; implicit-def: $vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr1
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX1064-NEXT: s_cbranch_execz BB1_2		; GFX1064-NEXT: s_cbranch_execz BB1_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX1064-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: s_mul_i32 s1, s0, s1		; GFX1064-NEXT: s_mul_i32 s1, s0, s1
; GFX1064-NEXT: v_mov_b32_e32 v2, s1		; GFX1064-NEXT: v_mov_b32_e32 v2, s1
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1064-NEXT: ds_add_rtn_u32 v1, v1, v2		; GFX1064-NEXT: ds_add_rtn_u32 v1, v1, v2
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: buffer_gl0_inv		; GFX1064-NEXT: buffer_gl0_inv
; GFX1064-NEXT: buffer_gl1_inv		; GFX1064-NEXT: buffer_gl1_inv
; GFX1064-NEXT: BB1_2:		; GFX1064-NEXT: BB1_2:
; GFX1064-NEXT: v_nop		; GFX1064-NEXT: v_nop
; GFX1064-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX1064-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX1064-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX1064-NEXT: v_readfirstlane_b32 s0, v1		; GFX1064-NEXT: v_readfirstlane_b32 s0, v1
; GFX1064-NEXT: s_mov_b32 s7, 0x31016000		; GFX1064-NEXT: s_mov_b32 s7, 0x31016000
; GFX1064-NEXT: s_mov_b32 s6, -1		; GFX1064-NEXT: s_mov_b32 s6, -1
; GFX1064-NEXT: v_add_nc_u32_e32 v0, s0, v0		; GFX1064-NEXT: v_add_nc_u32_e32 v0, s0, v0
; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i32_uniform:		; GFX1032-LABEL: add_i32_uniform:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr1
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB1_2		; GFX1032-NEXT: s_cbranch_execz BB1_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2		; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2
Show All 19 Lines
; GFX1032-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX1032-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %additive acq_rel		%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %additive acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
; GFX7LESS-NOT: s_bcnt1_i32_b64
; DPPCOMB: v_add_u32_dpp
; DPPCOMB: v_add_u32_dpp
; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_add_rtn_u32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @add_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @add_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: add_i32_varying:		; GFX7LESS-LABEL: add_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_add_rtn_u32 v0, v1, v0		; GFX7LESS-NEXT: ds_add_rtn_u32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i32_varying:		; GFX8-LABEL: add_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
Show All 32 Lines
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i32_varying:		; GFX9-LABEL: add_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
Show All 31 Lines
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i32_varying:		; GFX1064-LABEL: add_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i32_varying:		; GFX1032-LABEL: add_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[53 lines not shown]
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i32_varying_gfx1032:		; GFX8-LABEL: add_i32_varying_gfx1032:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
[32 lines not shown]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i32_varying_gfx1032:		; GFX9-LABEL: add_i32_varying_gfx1032:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
[31 lines not shown]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i32_varying_gfx1032:		; GFX1064-LABEL: add_i32_varying_gfx1032:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i32_varying_gfx1032:		; GFX1032-LABEL: add_i32_varying_gfx1032:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[53 lines not shown]
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i32_varying_gfx1064:		; GFX8-LABEL: add_i32_varying_gfx1064:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
[32 lines not shown]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i32_varying_gfx1064:		; GFX9-LABEL: add_i32_varying_gfx1064:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
[31 lines not shown]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i32_varying_gfx1064:		; GFX1064-LABEL: add_i32_varying_gfx1064:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i32_varying_gfx1064:		; GFX1032-LABEL: add_i32_varying_gfx1064:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 lines not shown]	entry:
ret void		ret void
}		}

define amdgpu_kernel void @add_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @add_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: add_i64_constant:		; GFX7LESS-LABEL: add_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[4:5], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB5_2		; GFX7LESS-NEXT: s_cbranch_execz BB5_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
[18 lines not shown]
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i64_constant:		; GFX8-LABEL: add_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX8-NEXT: s_mov_b64 s[4:5], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB5_2		; GFX8-NEXT: s_cbranch_execz BB5_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
[17 lines not shown]
; GFX8-NEXT: s_nop 2		; GFX8-NEXT: s_nop 2
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i64_constant:		; GFX9-LABEL: add_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX9-NEXT: s_mov_b64 s[4:5], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB5_2		; GFX9-NEXT: s_cbranch_execz BB5_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
[15 lines not shown]
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_nop 2		; GFX9-NEXT: s_nop 2
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i64_constant:		; GFX1064-LABEL: add_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX1064-NEXT: s_mov_b64 s[4:5], exec
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB5_2		; GFX1064-NEXT: s_cbranch_execz BB5_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
[18 lines not shown]
; GFX1064-NEXT: s_nop 2		; GFX1064-NEXT: s_nop 2
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i64_constant:		; GFX1032-LABEL: add_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s3, 1, 0		; GFX1032-NEXT: s_mov_b32 s3, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB5_2		; GFX1032-NEXT: s_cbranch_execz BB5_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3		; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3
[24 lines not shown]	entry:
ret void		ret void
}		}

define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive) {		define amdgpu_kernel void @add_i64_uniform(i64 addrspace(1)* %out, i64 %additive) {
;		;
;		;
; GFX7LESS-LABEL: add_i64_uniform:		; GFX7LESS-LABEL: add_i64_uniform:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[6:7], exec
; GFX7LESS-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB6_2		; GFX7LESS-NEXT: s_cbranch_execz BB6_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[27 lines not shown]
; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s0, v0		; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s0, v0
; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc		; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: add_i64_uniform:		; GFX8-LABEL: add_i64_uniform:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX8-NEXT: s_mov_b64 s[6:7], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz BB6_2		; GFX8-NEXT: s_cbranch_execz BB6_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[27 lines not shown]
; GFX8-NEXT: s_mov_b32 s6, -1		; GFX8-NEXT: s_mov_b32 s6, -1
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc		; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: add_i64_uniform:		; GFX9-LABEL: add_i64_uniform:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX9-NEXT: s_mov_b64 s[6:7], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz BB6_2		; GFX9-NEXT: s_cbranch_execz BB6_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX9-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[25 lines not shown]
; GFX9-NEXT: s_mov_b32 s7, 0xf000		; GFX9-NEXT: s_mov_b32 s7, 0xf000
; GFX9-NEXT: s_mov_b32 s6, -1		; GFX9-NEXT: s_mov_b32 s6, -1
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: add_i64_uniform:		; GFX1064-LABEL: add_i64_uniform:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX1064-NEXT: s_mov_b64 s[6:7], exec
; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX1064-NEXT: s_cbranch_execz BB6_2		; GFX1064-NEXT: s_cbranch_execz BB6_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
[27 lines not shown]
; GFX1064-NEXT: v_add_co_u32_e64 v0, vcc, s4, v0		; GFX1064-NEXT: v_add_co_u32_e64 v0, vcc, s4, v0
; GFX1064-NEXT: v_add_co_ci_u32_e32 v1, vcc, s5, v1, vcc		; GFX1064-NEXT: v_add_co_ci_u32_e32 v1, vcc, s5, v1, vcc
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: add_i64_uniform:		; GFX1032-LABEL: add_i64_uniform:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s5, 1, 0		; GFX1032-NEXT: s_mov_b32 s5, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s5, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s5, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB6_2		; GFX1032-NEXT: s_cbranch_execz BB6_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s5, s5		; GFX1032-NEXT: s_bcnt1_i32_b32 s5, s5
[123 lines not shown]	entry:
ret void		ret void
}		}

define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out) {		define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: sub_i32_constant:		; GFX7LESS-LABEL: sub_i32_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[2:3], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB8_2		; GFX7LESS-NEXT: s_cbranch_execz BB8_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: v_mul_u32_u24_e64 v2, s4, 5		; GFX7LESS-NEXT: v_mul_u32_u24_e64 v2, s2, 5
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_sub_rtn_u32 v1, v1, v2		; GFX7LESS-NEXT: ds_sub_rtn_u32 v1, v1, v2
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: BB8_2:		; GFX7LESS-NEXT: BB8_2:
; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v1		; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v1
; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v0		; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v0
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v0		; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v0
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: sub_i32_constant:		; GFX8-LABEL: sub_i32_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz BB8_2		; GFX8-NEXT: s_cbranch_execz BB8_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX8-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX8-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX8-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX8-NEXT: s_mov_b32 m0, -1		; GFX8-NEXT: s_mov_b32 m0, -1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: ds_sub_rtn_u32 v1, v2, v1		; GFX8-NEXT: ds_sub_rtn_u32 v1, v2, v1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: buffer_wbinvl1_vol		; GFX8-NEXT: buffer_wbinvl1_vol
; GFX8-NEXT: BB8_2:		; GFX8-NEXT: BB8_2:
; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_readfirstlane_b32 s2, v1		; GFX8-NEXT: v_readfirstlane_b32 s2, v1
; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v0		; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v0
; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0		; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0
; GFX8-NEXT: s_mov_b32 s3, 0xf000		; GFX8-NEXT: s_mov_b32 s3, 0xf000
; GFX8-NEXT: s_mov_b32 s2, -1		; GFX8-NEXT: s_mov_b32 s2, -1
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: sub_i32_constant:		; GFX9-LABEL: sub_i32_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz BB8_2		; GFX9-NEXT: s_cbranch_execz BB8_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX9-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX9-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX9-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: ds_sub_rtn_u32 v1, v2, v1		; GFX9-NEXT: ds_sub_rtn_u32 v1, v2, v1
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: buffer_wbinvl1_vol		; GFX9-NEXT: buffer_wbinvl1_vol
; GFX9-NEXT: BB8_2:		; GFX9-NEXT: BB8_2:
; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_readfirstlane_b32 s2, v1		; GFX9-NEXT: v_readfirstlane_b32 s2, v1
; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v0		; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v0
; GFX9-NEXT: v_sub_u32_e32 v0, s2, v0		; GFX9-NEXT: v_sub_u32_e32 v0, s2, v0
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: sub_i32_constant:		; GFX1064-LABEL: sub_i32_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr1
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX1064-NEXT: s_cbranch_execz BB8_2		; GFX1064-NEXT: s_cbranch_execz BB8_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX1064-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX1064-NEXT: v_mul_u32_u24_e64 v1, s4, 5		; GFX1064-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1064-NEXT: ds_sub_rtn_u32 v1, v2, v1		; GFX1064-NEXT: ds_sub_rtn_u32 v1, v2, v1
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: buffer_gl0_inv		; GFX1064-NEXT: buffer_gl0_inv
; GFX1064-NEXT: buffer_gl1_inv		; GFX1064-NEXT: buffer_gl1_inv
; GFX1064-NEXT: BB8_2:		; GFX1064-NEXT: BB8_2:
; GFX1064-NEXT: v_nop		; GFX1064-NEXT: v_nop
; GFX1064-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX1064-NEXT: v_readfirstlane_b32 s2, v1		; GFX1064-NEXT: v_readfirstlane_b32 s2, v1
; GFX1064-NEXT: v_mul_u32_u24_e32 v0, 5, v0		; GFX1064-NEXT: v_mul_u32_u24_e32 v0, 5, v0
; GFX1064-NEXT: s_mov_b32 s3, 0x31016000		; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s2, v0		; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s2, v0
; GFX1064-NEXT: s_mov_b32 s2, -1		; GFX1064-NEXT: s_mov_b32 s2, -1
; GFX1064-NEXT: s_nop 0		; GFX1064-NEXT: s_nop 0
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: sub_i32_constant:		; GFX1032-LABEL: sub_i32_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s3, 1, 0		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr1
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB8_2		; GFX1032-NEXT: s_cbranch_execz BB8_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3		; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var32@abs32@lo
; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s3, 5		; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s2, 5
; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1032-NEXT: ds_sub_rtn_u32 v1, v2, v1		; GFX1032-NEXT: ds_sub_rtn_u32 v1, v2, v1
; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1032-NEXT: buffer_gl0_inv		; GFX1032-NEXT: buffer_gl0_inv
; GFX1032-NEXT: buffer_gl1_inv		; GFX1032-NEXT: buffer_gl1_inv
; GFX1032-NEXT: BB8_2:		; GFX1032-NEXT: BB8_2:
; GFX1032-NEXT: v_nop		; GFX1032-NEXT: v_nop
; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s2		; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
; GFX1032-NEXT: v_readfirstlane_b32 s2, v1		; GFX1032-NEXT: v_readfirstlane_b32 s2, v1
; GFX1032-NEXT: v_mul_u32_u24_e32 v0, 5, v0		; GFX1032-NEXT: v_mul_u32_u24_e32 v0, 5, v0
; GFX1032-NEXT: s_mov_b32 s3, 0x31016000		; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
; GFX1032-NEXT: v_sub_nc_u32_e32 v0, s2, v0		; GFX1032-NEXT: v_sub_nc_u32_e32 v0, s2, v0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: s_nop 0		; GFX1032-NEXT: s_nop 0
; GFX1032-NEXT: s_waitcnt lgkmcnt(0)		; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 5 acq_rel		%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 5 acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, i32 %subitive) {		define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, i32 %subitive) {
;		;
;		;
; GFX7LESS-LABEL: sub_i32_uniform:		; GFX7LESS-LABEL: sub_i32_uniform:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[6:7], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
; GFX7LESS-NEXT: s_load_dword s2, s[0:1], 0xb		; GFX7LESS-NEXT: s_load_dword s2, s[0:1], 0xb
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[0:1], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[0:1], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB9_2		; GFX7LESS-NEXT: s_cbranch_execz BB9_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s3, s[6:7]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s3, s[6:7]
; ... (16 unchanged lines elided)
; GFX7LESS-NEXT: s_mov_b32 s6, -1		; GFX7LESS-NEXT: s_mov_b32 s6, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: sub_i32_uniform:		; GFX8-LABEL: sub_i32_uniform:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX8-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX8-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX8-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX8-NEXT: s_cbranch_execz BB9_2		; GFX8-NEXT: s_cbranch_execz BB9_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX8-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_mul_i32 s1, s0, s1		; GFX8-NEXT: s_mul_i32 s1, s0, s1
; GFX8-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v2, s1		; GFX8-NEXT: v_mov_b32_e32 v2, s1
; GFX8-NEXT: s_mov_b32 m0, -1		; GFX8-NEXT: s_mov_b32 m0, -1
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: ds_sub_rtn_u32 v1, v1, v2		; GFX8-NEXT: ds_sub_rtn_u32 v1, v1, v2
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: buffer_wbinvl1_vol		; GFX8-NEXT: buffer_wbinvl1_vol
; GFX8-NEXT: BB9_2:		; GFX8-NEXT: BB9_2:
; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX8-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX8-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX8-NEXT: v_readfirstlane_b32 s0, v1		; GFX8-NEXT: v_readfirstlane_b32 s0, v1
; GFX8-NEXT: s_mov_b32 s7, 0xf000		; GFX8-NEXT: s_mov_b32 s7, 0xf000
; GFX8-NEXT: s_mov_b32 s6, -1		; GFX8-NEXT: s_mov_b32 s6, -1
; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v0		; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v0
; GFX8-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: sub_i32_uniform:		; GFX9-LABEL: sub_i32_uniform:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX9-NEXT: s_cbranch_execz BB9_2		; GFX9-NEXT: s_cbranch_execz BB9_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX9-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_mul_i32 s1, s0, s1		; GFX9-NEXT: s_mul_i32 s1, s0, s1
; GFX9-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v2, s1		; GFX9-NEXT: v_mov_b32_e32 v2, s1
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: ds_sub_rtn_u32 v1, v1, v2		; GFX9-NEXT: ds_sub_rtn_u32 v1, v1, v2
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: buffer_wbinvl1_vol		; GFX9-NEXT: buffer_wbinvl1_vol
; GFX9-NEXT: BB9_2:		; GFX9-NEXT: BB9_2:
; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX9-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX9-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX9-NEXT: v_readfirstlane_b32 s0, v1		; GFX9-NEXT: v_readfirstlane_b32 s0, v1
; GFX9-NEXT: s_mov_b32 s7, 0xf000		; GFX9-NEXT: s_mov_b32 s7, 0xf000
; GFX9-NEXT: s_mov_b32 s6, -1		; GFX9-NEXT: s_mov_b32 s6, -1
; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0		; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0
; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: sub_i32_uniform:		; GFX1064-LABEL: sub_i32_uniform:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX1064-NEXT: ; implicit-def: $vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr1
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
; GFX1064-NEXT: s_cbranch_execz BB9_2		; GFX1064-NEXT: s_cbranch_execz BB9_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[6:7]		; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
; GFX1064-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: s_mul_i32 s1, s0, s1		; GFX1064-NEXT: s_mul_i32 s1, s0, s1
; GFX1064-NEXT: v_mov_b32_e32 v2, s1		; GFX1064-NEXT: v_mov_b32_e32 v2, s1
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0		; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
; GFX1064-NEXT: ds_sub_rtn_u32 v1, v1, v2		; GFX1064-NEXT: ds_sub_rtn_u32 v1, v1, v2
; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX1064-NEXT: buffer_gl0_inv		; GFX1064-NEXT: buffer_gl0_inv
; GFX1064-NEXT: buffer_gl1_inv		; GFX1064-NEXT: buffer_gl1_inv
; GFX1064-NEXT: BB9_2:		; GFX1064-NEXT: BB9_2:
; GFX1064-NEXT: v_nop		; GFX1064-NEXT: v_nop
; GFX1064-NEXT: s_or_b64 exec, exec, s[2:3]		; GFX1064-NEXT: s_or_b64 exec, exec, s[6:7]
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: v_mul_lo_u32 v0, s0, v0		; GFX1064-NEXT: v_mul_lo_u32 v0, s0, v0
; GFX1064-NEXT: v_readfirstlane_b32 s0, v1		; GFX1064-NEXT: v_readfirstlane_b32 s0, v1
; GFX1064-NEXT: s_mov_b32 s7, 0x31016000		; GFX1064-NEXT: s_mov_b32 s7, 0x31016000
; GFX1064-NEXT: s_mov_b32 s6, -1		; GFX1064-NEXT: s_mov_b32 s6, -1
; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s0, v0		; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s0, v0
; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: sub_i32_uniform:		; GFX1032-LABEL: sub_i32_uniform:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c		; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr1
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB9_2		; GFX1032-NEXT: s_cbranch_execz BB9_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2		; GFX1032-NEXT: s_bcnt1_i32_b32 s2, s2
; ... (19 unchanged lines elided)
; GFX1032-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX1032-NEXT: buffer_store_dword v0, off, s[4:7], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %subitive acq_rel		%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %subitive acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
; GFX7LESS-NOT: s_bcnt1_i32_b64
; DPPCOMB: v_add_u32_dpp
; DPPCOMB: v_add_u32_dpp
; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_sub_rtn_u32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @sub_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @sub_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: sub_i32_varying:		; GFX7LESS-LABEL: sub_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_sub_rtn_u32 v0, v1, v0		; GFX7LESS-NEXT: ds_sub_rtn_u32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: sub_i32_varying:		; GFX8-LABEL: sub_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
; ... (32 unchanged lines elided)
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: sub_i32_varying:		; GFX9-LABEL: sub_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
; ... (31 unchanged lines elided)
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: sub_i32_varying:		; GFX1064-LABEL: sub_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
; ... (38 unchanged lines elided)
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: sub_i32_varying:		; GFX1032-LABEL: sub_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
; ... (38 unchanged lines elided)
ret void		ret void
}		}

define amdgpu_kernel void @sub_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @sub_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: sub_i64_constant:		; GFX7LESS-LABEL: sub_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[4:5], exec
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s5, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB11_2		; GFX7LESS-NEXT: s_cbranch_execz BB11_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
; ... (18 unchanged lines elided)
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: sub_i64_constant:		; GFX8-LABEL: sub_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX8-NEXT: s_mov_b64 s[4:5], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB11_2		; GFX8-NEXT: s_cbranch_execz BB11_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX8-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
; ... (18 unchanged lines elided)
; GFX8-NEXT: s_mov_b32 s2, -1		; GFX8-NEXT: s_mov_b32 s2, -1
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: sub_i64_constant:		; GFX9-LABEL: sub_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX9-NEXT: s_mov_b64 s[4:5], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s5, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB11_2		; GFX9-NEXT: s_cbranch_execz BB11_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]		; GFX9-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
[16 unchanged lines not shown]
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: sub_i64_constant:		; GFX1064-LABEL: sub_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[4:5], 1, 0		; GFX1064-NEXT: s_mov_b64 s[4:5], exec
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s5, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB11_2		; GFX1064-NEXT: s_cbranch_execz BB11_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
[20 unchanged lines not shown]
; GFX1064-NEXT: s_mov_b32 s3, 0x31016000		; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: sub_i64_constant:		; GFX1032-LABEL: sub_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s3, 1, 0		; GFX1032-NEXT: s_mov_b32 s3, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s3, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB11_2		; GFX1032-NEXT: s_cbranch_execz BB11_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3		; GFX1032-NEXT: s_bcnt1_i32_b32 s3, s3
[26 unchanged lines not shown]
ret void		ret void
}		}

define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive) {		define amdgpu_kernel void @sub_i64_uniform(i64 addrspace(1)* %out, i64 %subitive) {
;		;
;		;
; GFX7LESS-LABEL: sub_i64_uniform:		; GFX7LESS-LABEL: sub_i64_uniform:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
		; GFX7LESS-NEXT: s_mov_b64 s[6:7], exec
; GFX7LESS-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s7, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX7LESS-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB12_2		; GFX7LESS-NEXT: s_cbranch_execz BB12_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX7LESS-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[27 unchanged lines not shown]
; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s0, v0		; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s0, v0
; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc		; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: sub_i64_uniform:		; GFX8-LABEL: sub_i64_uniform:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX8-NEXT: s_mov_b64 s[6:7], exec
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX8-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz BB12_2		; GFX8-NEXT: s_cbranch_execz BB12_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX8-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[27 unchanged lines not shown]
; GFX8-NEXT: s_mov_b32 s6, -1		; GFX8-NEXT: s_mov_b32 s6, -1
; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc		; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: sub_i64_uniform:		; GFX9-LABEL: sub_i64_uniform:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX9-NEXT: s_mov_b64 s[6:7], exec
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s7, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX9-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz BB12_2		; GFX9-NEXT: s_cbranch_execz BB12_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: s_bcnt1_i32_b64 s6, s[6:7]		; GFX9-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
[25 unchanged lines not shown]
; GFX9-NEXT: s_mov_b32 s7, 0xf000		; GFX9-NEXT: s_mov_b32 s7, 0xf000
; GFX9-NEXT: s_mov_b32 s6, -1		; GFX9-NEXT: s_mov_b32 s6, -1
; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc		; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: sub_i64_uniform:		; GFX1064-LABEL: sub_i64_uniform:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[6:7], 1, 0		; GFX1064-NEXT: s_mov_b64 s[6:7], exec
; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1064-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s7, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX1064-NEXT: s_cbranch_execz BB12_2		; GFX1064-NEXT: s_cbranch_execz BB12_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
[27 unchanged lines not shown]
; GFX1064-NEXT: v_sub_co_u32_e64 v0, vcc, s4, v0		; GFX1064-NEXT: v_sub_co_u32_e64 v0, vcc, s4, v0
; GFX1064-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s5, v1, vcc		; GFX1064-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s5, v1, vcc
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: sub_i64_uniform:		; GFX1032-LABEL: sub_i64_uniform:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s5, 1, 0		; GFX1032-NEXT: s_mov_b32 s5, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2		; GFX1032-NEXT: ; implicit-def: $vgpr1_vgpr2
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s5, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s5, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB12_2		; GFX1032-NEXT: s_cbranch_execz BB12_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: s_bcnt1_i32_b32 s5, s5		; GFX1032-NEXT: s_bcnt1_i32_b32 s5, s5
[118 unchanged lines not shown]
entry:		entry:
%lane = call i32 @llvm.amdgcn.workitem.id.x()		%lane = call i32 @llvm.amdgcn.workitem.id.x()
%zext = zext i32 %lane to i64		%zext = zext i32 %lane to i64
%old = atomicrmw sub i64 addrspace(3)* @local_var64, i64 %zext acq_rel		%old = atomicrmw sub i64 addrspace(3)* @local_var64, i64 %zext acq_rel
store i64 %old, i64 addrspace(1)* %out		store i64 %old, i64 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_and_rtn_b32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @and_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @and_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: and_i32_varying:		; GFX7LESS-LABEL: and_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_and_rtn_b32 v0, v1, v0		; GFX7LESS-NEXT: ds_and_rtn_b32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: and_i32_varying:		; GFX8-LABEL: and_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX8-NEXT: v_mov_b32_e32 v1, -1		; GFX8-NEXT: v_mov_b32_e32 v1, -1
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[2:3]
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, -1		; GFX8-NEXT: v_mov_b32_e32 v2, -1
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
[34 unchanged lines not shown]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: and_i32_varying:		; GFX9-LABEL: and_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX9-NEXT: v_mov_b32_e32 v1, -1		; GFX9-NEXT: v_mov_b32_e32 v1, -1
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[2:3]
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, -1		; GFX9-NEXT: v_mov_b32_e32 v2, -1
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
[33 unchanged lines not shown]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: and_i32_varying:		; GFX1064-LABEL: and_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, s3, v4
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, -1		; GFX1064-NEXT: v_mov_b32_e32 v1, -1
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, -1		; GFX1064-NEXT: v_mov_b32_e32 v2, -1
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[39 unchanged lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: and_i32_varying:		; GFX1032-LABEL: and_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, -1		; GFX1032-NEXT: v_mov_b32_e32 v1, -1
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s2
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, -1		; GFX1032-NEXT: v_mov_b32_e32 v2, -1
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[35 unchanged lines not shown]
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%lane = call i32 @llvm.amdgcn.workitem.id.x()		%lane = call i32 @llvm.amdgcn.workitem.id.x()
%old = atomicrmw and i32 addrspace(3)* @local_var32, i32 %lane acq_rel		%old = atomicrmw and i32 addrspace(3)* @local_var32, i32 %lane acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_or_rtn_b32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @or_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @or_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: or_i32_varying:		; GFX7LESS-LABEL: or_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_or_rtn_b32 v0, v1, v0		; GFX7LESS-NEXT: ds_or_rtn_b32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: or_i32_varying:		; GFX8-LABEL: or_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
[32 unchanged lines not shown]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: or_i32_varying:		; GFX9-LABEL: or_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
[31 unchanged lines not shown]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: or_i32_varying:		; GFX1064-LABEL: or_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 unchanged lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: or_i32_varying:		; GFX1032-LABEL: or_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[33 unchanged lines not shown]
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%lane = call i32 @llvm.amdgcn.workitem.id.x()		%lane = call i32 @llvm.amdgcn.workitem.id.x()
%old = atomicrmw or i32 addrspace(3)* @local_var32, i32 %lane acq_rel		%old = atomicrmw or i32 addrspace(3)* @local_var32, i32 %lane acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_xor_rtn_b32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @xor_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @xor_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: xor_i32_varying:		; GFX7LESS-LABEL: xor_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_xor_rtn_b32 v0, v1, v0		; GFX7LESS-NEXT: ds_xor_rtn_b32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: xor_i32_varying:		; GFX8-LABEL: xor_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
[32 unchanged lines not shown]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: xor_i32_varying:		; GFX9-LABEL: xor_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
[31 unchanged lines not shown]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: xor_i32_varying:		; GFX1064-LABEL: xor_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[38 unchanged lines not shown]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: xor_i32_varying:		; GFX1032-LABEL: xor_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
[... 33 lines not shown ...]
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%lane = call i32 @llvm.amdgcn.workitem.id.x()		%lane = call i32 @llvm.amdgcn.workitem.id.x()
%old = atomicrmw xor i32 addrspace(3)* @local_var32, i32 %lane acq_rel		%old = atomicrmw xor i32 addrspace(3)* @local_var32, i32 %lane acq_rel
store i32 %old, i32 addrspace(1)* %out		store i32 %old, i32 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_max_rtn_i32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @max_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @max_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: max_i32_varying:		; GFX7LESS-LABEL: max_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_max_rtn_i32 v0, v1, v0		; GFX7LESS-NEXT: ds_max_rtn_i32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: max_i32_varying:		; GFX8-LABEL: max_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX8-NEXT: v_bfrev_b32_e32 v1, 1		; GFX8-NEXT: v_bfrev_b32_e32 v1, 1
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[2:3]
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, v1		; GFX8-NEXT: v_mov_b32_e32 v2, v1
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
[... 34 lines not shown ...]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: max_i32_varying:		; GFX9-LABEL: max_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX9-NEXT: v_bfrev_b32_e32 v1, 1		; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[2:3]
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, v1		; GFX9-NEXT: v_mov_b32_e32 v2, v1
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
[... 33 lines not shown ...]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: max_i32_varying:		; GFX1064-LABEL: max_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, s3, v4
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX1064-NEXT: v_bfrev_b32_e32 v1, 1		; GFX1064-NEXT: v_bfrev_b32_e32 v1, 1
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v1		; GFX1064-NEXT: v_mov_b32_e32 v2, v1
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[... 39 lines not shown ...]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: max_i32_varying:		; GFX1032-LABEL: max_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
; GFX1032-NEXT: v_bfrev_b32_e32 v1, 1		; GFX1032-NEXT: v_bfrev_b32_e32 v1, 1
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s2
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, v1		; GFX1032-NEXT: v_mov_b32_e32 v2, v1
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[... 41 lines not shown ...]
}		}

define amdgpu_kernel void @max_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @max_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: max_i64_constant:		; GFX7LESS-LABEL: max_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, exec_hi, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB18_2		; GFX7LESS-NEXT: s_cbranch_execz BB18_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5		; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5
; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0		; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0
[... 18 lines not shown ...]
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: max_i64_constant:		; GFX8-LABEL: max_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB18_2		; GFX8-NEXT: s_cbranch_execz BB18_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: v_mov_b32_e32 v0, 5		; GFX8-NEXT: v_mov_b32_e32 v0, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
[... 18 lines not shown ...]
; GFX8-NEXT: s_mov_b32 s2, -1		; GFX8-NEXT: s_mov_b32 s2, -1
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: max_i64_constant:		; GFX9-LABEL: max_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB18_2		; GFX9-NEXT: s_cbranch_execz BB18_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: v_mov_b32_e32 v0, 5		; GFX9-NEXT: v_mov_b32_e32 v0, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
[... 16 lines not shown ...]
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: max_i64_constant:		; GFX1064-LABEL: max_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB18_2		; GFX1064-NEXT: s_cbranch_execz BB18_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: v_mov_b32_e32 v0, 5		; GFX1064-NEXT: v_mov_b32_e32 v0, 5
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
[... 17 lines not shown ...]
; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc		; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: max_i64_constant:		; GFX1032-LABEL: max_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB18_2		; GFX1032-NEXT: s_cbranch_execz BB18_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: v_mov_b32_e32 v0, 5		; GFX1032-NEXT: v_mov_b32_e32 v0, 5
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
[... 19 lines not shown ...]
; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw max i64 addrspace(3)* @local_var64, i64 5 acq_rel		%old = atomicrmw max i64 addrspace(3)* @local_var64, i64 5 acq_rel
store i64 %old, i64 addrspace(1)* %out		store i64 %old, i64 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_min_rtn_i32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @min_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @min_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: min_i32_varying:		; GFX7LESS-LABEL: min_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_min_rtn_i32 v0, v1, v0		; GFX7LESS-NEXT: ds_min_rtn_i32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: min_i32_varying:		; GFX8-LABEL: min_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX8-NEXT: v_bfrev_b32_e32 v1, -2		; GFX8-NEXT: v_bfrev_b32_e32 v1, -2
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[2:3]
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, v1		; GFX8-NEXT: v_mov_b32_e32 v2, v1
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
[... 34 lines not shown ...]
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: min_i32_varying:		; GFX9-LABEL: min_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX9-NEXT: v_bfrev_b32_e32 v1, -2		; GFX9-NEXT: v_bfrev_b32_e32 v1, -2
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[2:3]
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, v1		; GFX9-NEXT: v_mov_b32_e32 v2, v1
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
[... 33 lines not shown ...]
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: min_i32_varying:		; GFX1064-LABEL: min_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, s3, v4
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX1064-NEXT: v_bfrev_b32_e32 v1, -2		; GFX1064-NEXT: v_bfrev_b32_e32 v1, -2
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v1		; GFX1064-NEXT: v_mov_b32_e32 v2, v1
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[... 39 lines not shown ...]
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: min_i32_varying:		; GFX1032-LABEL: min_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
; GFX1032-NEXT: v_bfrev_b32_e32 v1, -2		; GFX1032-NEXT: v_bfrev_b32_e32 v1, -2
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s2
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, v1		; GFX1032-NEXT: v_mov_b32_e32 v2, v1
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
[... 41 lines not shown ...]
}		}

define amdgpu_kernel void @min_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @min_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: min_i64_constant:		; GFX7LESS-LABEL: min_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, exec_hi, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB20_2		; GFX7LESS-NEXT: s_cbranch_execz BB20_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5		; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5
; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0		; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0
[... 18 lines not shown ...]
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: min_i64_constant:		; GFX8-LABEL: min_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB20_2		; GFX8-NEXT: s_cbranch_execz BB20_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: v_mov_b32_e32 v0, 5		; GFX8-NEXT: v_mov_b32_e32 v0, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
[... 18 lines not shown ...]
; GFX8-NEXT: s_mov_b32 s3, 0xf000		; GFX8-NEXT: s_mov_b32 s3, 0xf000
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: min_i64_constant:		; GFX9-LABEL: min_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB20_2		; GFX9-NEXT: s_cbranch_execz BB20_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: v_mov_b32_e32 v0, 5		; GFX9-NEXT: v_mov_b32_e32 v0, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
[... 16 lines not shown ...]
; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc		; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: min_i64_constant:		; GFX1064-LABEL: min_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB20_2		; GFX1064-NEXT: s_cbranch_execz BB20_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: v_mov_b32_e32 v0, 5		; GFX1064-NEXT: v_mov_b32_e32 v0, 5
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
[... 17 lines not shown ...]
; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc		; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: min_i64_constant:		; GFX1032-LABEL: min_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB20_2		; GFX1032-NEXT: s_cbranch_execz BB20_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: v_mov_b32_e32 v0, 5		; GFX1032-NEXT: v_mov_b32_e32 v0, 5
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
[... 19 lines not shown ...]
; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw min i64 addrspace(3)* @local_var64, i64 5 acq_rel		%old = atomicrmw min i64 addrspace(3)* @local_var64, i64 5 acq_rel
store i64 %old, i64 addrspace(1)* %out		store i64 %old, i64 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_max_rtn_u32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @umax_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @umax_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: umax_i32_varying:		; GFX7LESS-LABEL: umax_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_max_rtn_u32 v0, v1, v0		; GFX7LESS-NEXT: ds_max_rtn_u32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: umax_i32_varying:		; GFX8-LABEL: umax_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX8-NEXT: s_mov_b64 s[2:3], exec
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[4:5]
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, 0		; GFX8-NEXT: v_mov_b32_e32 v2, 0
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX8-NEXT: s_nop 1		; GFX8-NEXT: s_nop 1
Show All 32 Lines
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: umax_i32_varying:		; GFX9-LABEL: umax_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX9-NEXT: s_mov_b64 s[2:3], exec
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[4:5]
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, 0		; GFX9-NEXT: v_mov_b32_e32 v2, 0
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX9-NEXT: s_nop 1		; GFX9-NEXT: s_nop 1
Show All 31 Lines
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: umax_i32_varying:		; GFX1064-LABEL: umax_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1064-NEXT: s_mov_b64 s[2:3], exec
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, 0		; GFX1064-NEXT: v_mov_b32_e32 v2, 0
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
Show All 38 Lines
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: umax_i32_varying:		; GFX1032-LABEL: umax_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
		; GFX1032-NEXT: s_mov_b32 s2, exec_lo
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s3
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, 0		; GFX1032-NEXT: v_mov_b32_e32 v2, 0
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
; GFX1032-NEXT: s_mov_b32 s2, -1		; GFX1032-NEXT: s_mov_b32 s2, -1
; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0		; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
Show All 39 Lines
}		}

define amdgpu_kernel void @umax_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @umax_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: umax_i64_constant:		; GFX7LESS-LABEL: umax_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, exec_hi, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB22_2		; GFX7LESS-NEXT: s_cbranch_execz BB22_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5		; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5
; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0		; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: umax_i64_constant:		; GFX8-LABEL: umax_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB22_2		; GFX8-NEXT: s_cbranch_execz BB22_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: v_mov_b32_e32 v0, 5		; GFX8-NEXT: v_mov_b32_e32 v0, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX8-NEXT: s_mov_b32 s2, -1		; GFX8-NEXT: s_mov_b32 s2, -1
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: umax_i64_constant:		; GFX9-LABEL: umax_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB22_2		; GFX9-NEXT: s_cbranch_execz BB22_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: v_mov_b32_e32 v0, 5		; GFX9-NEXT: v_mov_b32_e32 v0, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
Show All 15 Lines
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_mov_b32 s2, -1		; GFX9-NEXT: s_mov_b32 s2, -1
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: umax_i64_constant:		; GFX1064-LABEL: umax_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB22_2		; GFX1064-NEXT: s_cbranch_execz BB22_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: v_mov_b32_e32 v0, 5		; GFX1064-NEXT: v_mov_b32_e32 v0, 5
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX1064-NEXT: v_cndmask_b32_e64 v1, 0, s5, vcc		; GFX1064-NEXT: v_cndmask_b32_e64 v1, 0, s5, vcc
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: umax_i64_constant:		; GFX1032-LABEL: umax_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB22_2		; GFX1032-NEXT: s_cbranch_execz BB22_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: v_mov_b32_e32 v0, 5		; GFX1032-NEXT: v_mov_b32_e32 v0, 5
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
Show All 19 Lines
; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1032-NEXT: s_endpgm		; GFX1032-NEXT: s_endpgm
entry:		entry:
%old = atomicrmw umax i64 addrspace(3)* @local_var64, i64 5 acq_rel		%old = atomicrmw umax i64 addrspace(3)* @local_var64, i64 5 acq_rel
store i64 %old, i64 addrspace(1)* %out		store i64 %old, i64 addrspace(1)* %out
ret void		ret void
}		}

; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
; GFX8MORE: ds_min_rtn_u32 v{{[0-9]+}}, v{{[0-9]+}}, v[[value]]
define amdgpu_kernel void @umin_i32_varying(i32 addrspace(1)* %out) {		define amdgpu_kernel void @umin_i32_varying(i32 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: umin_i32_varying:		; GFX7LESS-LABEL: umin_i32_varying:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v1, local_var32@abs32@lo
; GFX7LESS-NEXT: s_mov_b32 m0, -1		; GFX7LESS-NEXT: s_mov_b32 m0, -1
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: ds_min_rtn_u32 v0, v1, v0		; GFX7LESS-NEXT: ds_min_rtn_u32 v0, v1, v0
; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7LESS-NEXT: buffer_wbinvl1		; GFX7LESS-NEXT: buffer_wbinvl1
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_mov_b32 s2, -1		; GFX7LESS-NEXT: s_mov_b32 s2, -1
; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: umin_i32_varying:		; GFX8-LABEL: umin_i32_varying:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX8-NEXT: v_mov_b32_e32 v2, v0		; GFX8-NEXT: v_mov_b32_e32 v2, v0
; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX8-NEXT: v_mov_b32_e32 v1, -1		; GFX8-NEXT: v_mov_b32_e32 v1, -1
; GFX8-NEXT: s_mov_b64 exec, s[2:3]		; GFX8-NEXT: s_mov_b64 exec, s[2:3]
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: v_mov_b32_e32 v2, -1		; GFX8-NEXT: v_mov_b32_e32 v2, -1
; GFX8-NEXT: s_not_b64 exec, exec		; GFX8-NEXT: s_not_b64 exec, exec
; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1
Show All 34 Lines
; GFX8-NEXT: s_nop 0		; GFX8-NEXT: s_nop 0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: umin_i32_varying:		; GFX9-LABEL: umin_i32_varying:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, s3, v3
; GFX9-NEXT: v_mov_b32_e32 v2, v0		; GFX9-NEXT: v_mov_b32_e32 v2, v0
; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX9-NEXT: v_mov_b32_e32 v1, -1		; GFX9-NEXT: v_mov_b32_e32 v1, -1
; GFX9-NEXT: s_mov_b64 exec, s[2:3]		; GFX9-NEXT: s_mov_b64 exec, s[2:3]
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: v_mov_b32_e32 v2, -1		; GFX9-NEXT: v_mov_b32_e32 v2, -1
; GFX9-NEXT: s_not_b64 exec, exec		; GFX9-NEXT: s_not_b64 exec, exec
; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
Show All 33 Lines
; GFX9-NEXT: s_nop 0		; GFX9-NEXT: s_nop 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: umin_i32_varying:		; GFX1064-LABEL: umin_i32_varying:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1064-NEXT: v_mov_b32_e32 v2, v0		; GFX1064-NEXT: v_mov_b32_e32 v2, v0
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, s3, v4
; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
; GFX1064-NEXT: v_mov_b32_e32 v1, -1		; GFX1064-NEXT: v_mov_b32_e32 v1, -1
; GFX1064-NEXT: s_mov_b64 exec, s[2:3]		; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: v_mov_b32_e32 v2, -1		; GFX1064-NEXT: v_mov_b32_e32 v2, -1
; GFX1064-NEXT: s_not_b64 exec, exec		; GFX1064-NEXT: s_not_b64 exec, exec
; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1		; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
Show All 39 Lines
; GFX1064-NEXT: s_nop 1		; GFX1064-NEXT: s_nop 1
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: umin_i32_varying:		; GFX1032-LABEL: umin_i32_varying:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mov_b32_e32 v2, v0		; GFX1032-NEXT: v_mov_b32_e32 v2, v0
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, s2, 0
; GFX1032-NEXT: s_or_saveexec_b32 s2, -1		; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
; GFX1032-NEXT: v_mov_b32_e32 v1, -1		; GFX1032-NEXT: v_mov_b32_e32 v1, -1
; GFX1032-NEXT: s_mov_b32 exec_lo, s2		; GFX1032-NEXT: s_mov_b32 exec_lo, s2
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: v_mov_b32_e32 v2, -1		; GFX1032-NEXT: v_mov_b32_e32 v2, -1
; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo		; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
; GFX1032-NEXT: s_or_saveexec_b32 s4, -1		; GFX1032-NEXT: s_or_saveexec_b32 s4, -1
; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf		; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
Show All 41 Lines
}		}

define amdgpu_kernel void @umin_i64_constant(i64 addrspace(1)* %out) {		define amdgpu_kernel void @umin_i64_constant(i64 addrspace(1)* %out) {
;		;
;		;
; GFX7LESS-LABEL: umin_i64_constant:		; GFX7LESS-LABEL: umin_i64_constant:
; GFX7LESS: ; %bb.0: ; %entry		; GFX7LESS: ; %bb.0: ; %entry
; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9		; GFX7LESS-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
; GFX7LESS-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX7LESS-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, exec_hi, v0
; GFX7LESS-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s3, v0
; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX7LESS-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX7LESS-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX7LESS-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX7LESS-NEXT: s_cbranch_execz BB24_2		; GFX7LESS-NEXT: s_cbranch_execz BB24_2
; GFX7LESS-NEXT: ; %bb.1:		; GFX7LESS-NEXT: ; %bb.1:
; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX7LESS-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5		; GFX7LESS-NEXT: v_mov_b32_e32 v0, 5
; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0		; GFX7LESS-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000		; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)		; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX7LESS-NEXT: s_endpgm		; GFX7LESS-NEXT: s_endpgm
;		;
; GFX8-LABEL: umin_i64_constant:		; GFX8-LABEL: umin_i64_constant:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX8-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX8-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX8-NEXT: s_cbranch_execz BB24_2		; GFX8-NEXT: s_cbranch_execz BB24_2
; GFX8-NEXT: ; %bb.1:		; GFX8-NEXT: ; %bb.1:
; GFX8-NEXT: v_mov_b32_e32 v0, 5		; GFX8-NEXT: v_mov_b32_e32 v0, 5
; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX8-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX8-NEXT: v_mov_b32_e32 v1, 0		; GFX8-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX8-NEXT: s_mov_b32 s3, 0xf000		; GFX8-NEXT: s_mov_b32 s3, 0xf000
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: umin_i64_constant:		; GFX9-LABEL: umin_i64_constant:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX9-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0		; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0		; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX9-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX9-NEXT: s_cbranch_execz BB24_2		; GFX9-NEXT: s_cbranch_execz BB24_2
; GFX9-NEXT: ; %bb.1:		; GFX9-NEXT: ; %bb.1:
; GFX9-NEXT: v_mov_b32_e32 v0, 5		; GFX9-NEXT: v_mov_b32_e32 v0, 5
; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX9-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX9-NEXT: v_mov_b32_e32 v1, 0		; GFX9-NEXT: v_mov_b32_e32 v1, 0
Show All 15 Lines
; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc		; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
; GFX9-NEXT: s_mov_b32 s3, 0xf000		; GFX9-NEXT: s_mov_b32 s3, 0xf000
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX1064-LABEL: umin_i64_constant:		; GFX1064-LABEL: umin_i64_constant:
; GFX1064: ; %bb.0: ; %entry		; GFX1064: ; %bb.0: ; %entry
; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0		; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0		; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0		; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1064-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc		; GFX1064-NEXT: s_and_saveexec_b64 s[2:3], vcc
; GFX1064-NEXT: s_cbranch_execz BB24_2		; GFX1064-NEXT: s_cbranch_execz BB24_2
; GFX1064-NEXT: ; %bb.1:		; GFX1064-NEXT: ; %bb.1:
; GFX1064-NEXT: v_mov_b32_e32 v0, 5		; GFX1064-NEXT: v_mov_b32_e32 v0, 5
; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1064-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1064-NEXT: v_mov_b32_e32 v1, 0		; GFX1064-NEXT: v_mov_b32_e32 v1, 0
Show All 17 Lines
; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc		; GFX1064-NEXT: v_cndmask_b32_e64 v0, v0, s4, vcc
; GFX1064-NEXT: s_waitcnt lgkmcnt(0)		; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0		; GFX1064-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
; GFX1064-NEXT: s_endpgm		; GFX1064-NEXT: s_endpgm
;		;
; GFX1032-LABEL: umin_i64_constant:		; GFX1032-LABEL: umin_i64_constant:
; GFX1032: ; %bb.0: ; %entry		; GFX1032: ; %bb.0: ; %entry
; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GFX1032-NEXT: v_cmp_ne_u32_e64 s2, 1, 0		; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
; GFX1032-NEXT: ; implicit-def: $vcc_hi		; GFX1032-NEXT: ; implicit-def: $vcc_hi
; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0		; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1		; GFX1032-NEXT: ; implicit-def: $vgpr0_vgpr1
; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo		; GFX1032-NEXT: s_and_saveexec_b32 s2, vcc_lo
; GFX1032-NEXT: s_cbranch_execz BB24_2		; GFX1032-NEXT: s_cbranch_execz BB24_2
; GFX1032-NEXT: ; %bb.1:		; GFX1032-NEXT: ; %bb.1:
; GFX1032-NEXT: v_mov_b32_e32 v0, 5		; GFX1032-NEXT: v_mov_b32_e32 v0, 5
; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo		; GFX1032-NEXT: v_mov_b32_e32 v2, local_var64@abs32@lo
; GFX1032-NEXT: v_mov_b32_e32 v1, 0		; GFX1032-NEXT: v_mov_b32_e32 v1, 0
Show All 26 Lines

llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll

	Show All 13 Lines
	define amdgpu_ps void @add_i32_constant(<4 x i32> inreg %out, <4 x i32> inreg %inout) {			define amdgpu_ps void @add_i32_constant(<4 x i32> inreg %out, <4 x i32> inreg %inout) {
	; GFX7-LABEL: add_i32_constant:			; GFX7-LABEL: add_i32_constant:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_mov_b64 s[10:11], exec			; GFX7-NEXT: s_mov_b64 s[10:11], exec
	; GFX7-NEXT: ; implicit-def: $vgpr0			; GFX7-NEXT: ; implicit-def: $vgpr0
	; GFX7-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX7-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX7-NEXT: s_cbranch_execz BB0_4			; GFX7-NEXT: s_cbranch_execz BB0_4
	; GFX7-NEXT: ; %bb.1:			; GFX7-NEXT: ; %bb.1:
	; GFX7-NEXT: v_cmp_ne_u32_e64 s[12:13], 1, 0			; GFX7-NEXT: s_mov_b64 s[12:13], exec
	; GFX7-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s12, 0			; GFX7-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s12, 0
	; GFX7-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s13, v0			; GFX7-NEXT: v_mbcnt_hi_u32_b32_e32 v0, s13, v0
	; GFX7-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX7-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX7-NEXT: ; implicit-def: $vgpr1			; GFX7-NEXT: ; implicit-def: $vgpr1
	; GFX7-NEXT: s_and_saveexec_b64 s[10:11], vcc			; GFX7-NEXT: s_and_saveexec_b64 s[10:11], vcc
	; GFX7-NEXT: s_cbranch_execz BB0_3			; GFX7-NEXT: s_cbranch_execz BB0_3
	; GFX7-NEXT: ; %bb.2:			; GFX7-NEXT: ; %bb.2:
	; GFX7-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX7-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	Show All 16 Lines
	;			;
	; GFX8-LABEL: add_i32_constant:			; GFX8-LABEL: add_i32_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_mov_b64 s[10:11], exec			; GFX8-NEXT: s_mov_b64 s[10:11], exec
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX8-NEXT: s_cbranch_execz BB0_4			; GFX8-NEXT: s_cbranch_execz BB0_4
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_cmp_ne_u32_e64 s[12:13], 1, 0			; GFX8-NEXT: s_mov_b64 s[12:13], exec
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s12, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s12, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s13, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s13, v0
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr1			; GFX8-NEXT: ; implicit-def: $vgpr1
	; GFX8-NEXT: s_and_saveexec_b64 s[10:11], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[10:11], vcc
	; GFX8-NEXT: s_cbranch_execz BB0_3			; GFX8-NEXT: s_cbranch_execz BB0_3
	; GFX8-NEXT: ; %bb.2:			; GFX8-NEXT: ; %bb.2:
	; GFX8-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX8-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	Show All 16 Lines
	;			;
	; GFX9-LABEL: add_i32_constant:			; GFX9-LABEL: add_i32_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_mov_b64 s[10:11], exec			; GFX9-NEXT: s_mov_b64 s[10:11], exec
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX9-NEXT: s_cbranch_execz BB0_4			; GFX9-NEXT: s_cbranch_execz BB0_4
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_cmp_ne_u32_e64 s[12:13], 1, 0			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s12, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s12, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s13, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s13, v0
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr1			; GFX9-NEXT: ; implicit-def: $vgpr1
	; GFX9-NEXT: s_and_saveexec_b64 s[10:11], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[10:11], vcc
	; GFX9-NEXT: s_cbranch_execz BB0_3			; GFX9-NEXT: s_cbranch_execz BB0_3
	; GFX9-NEXT: ; %bb.2:			; GFX9-NEXT: ; %bb.2:
	; GFX9-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX9-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	Show All 16 Lines
	;			;
	; GFX1064-LABEL: add_i32_constant:			; GFX1064-LABEL: add_i32_constant:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[10:11], exec			; GFX1064-NEXT: s_mov_b64 s[10:11], exec
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX1064-NEXT: s_cbranch_execz BB0_4			; GFX1064-NEXT: s_cbranch_execz BB0_4
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_cmp_ne_u32_e64 s[12:13], 1, 0			; GFX1064-NEXT: s_mov_b64 s[12:13], exec
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s12, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s12, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s13, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s13, v0
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[30:31], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[30:31], vcc
	; GFX1064-NEXT: s_cbranch_execz BB0_3			; GFX1064-NEXT: s_cbranch_execz BB0_3
	; GFX1064-NEXT: ; %bb.2:			; GFX1064-NEXT: ; %bb.2:
	; GFX1064-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX1064-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	Show All 18 Lines
	; GFX1032-LABEL: add_i32_constant:			; GFX1032-LABEL: add_i32_constant:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_mov_b32 s9, exec_lo			; GFX1032-NEXT: s_mov_b32 s9, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi			; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: s_and_saveexec_b32 s8, s9			; GFX1032-NEXT: s_and_saveexec_b32 s8, s9
	; GFX1032-NEXT: s_cbranch_execz BB0_4			; GFX1032-NEXT: s_cbranch_execz BB0_4
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_cmp_ne_u32_e64 s10, 1, 0			; GFX1032-NEXT: s_mov_b32 s10, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vgpr1			; GFX1032-NEXT: ; implicit-def: $vgpr1
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s10, 0			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s10, 0
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: s_and_saveexec_b32 s9, vcc_lo			; GFX1032-NEXT: s_and_saveexec_b32 s9, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB0_3			; GFX1032-NEXT: s_cbranch_execz BB0_3
	; GFX1032-NEXT: ; %bb.2:			; GFX1032-NEXT: ; %bb.2:
	; GFX1032-NEXT: s_bcnt1_i32_b32 s10, s10			; GFX1032-NEXT: s_bcnt1_i32_b32 s10, s10
	; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s10, 5			; GFX1032-NEXT: v_mul_u32_u24_e64 v1, s10, 5
	▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	; GFX8-LABEL: add_i32_varying:			; GFX8-LABEL: add_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_mov_b64 s[10:11], exec			; GFX8-NEXT: s_mov_b64 s[10:11], exec
	; GFX8-NEXT: ; implicit-def: $vgpr3			; GFX8-NEXT: ; implicit-def: $vgpr3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX8-NEXT: s_cbranch_execz BB1_4			; GFX8-NEXT: s_cbranch_execz BB1_4
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX8-NEXT: s_mov_b64 s[10:11], exec
				; GFX8-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[10:11]			; GFX8-NEXT: s_mov_b64 exec, s[12:13]
	; GFX8-NEXT: v_cmp_ne_u32_e64 s[10:11], 1, 0
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	Show All 37 Lines
	; GFX9-LABEL: add_i32_varying:			; GFX9-LABEL: add_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_mov_b64 s[10:11], exec			; GFX9-NEXT: s_mov_b64 s[10:11], exec
	; GFX9-NEXT: ; implicit-def: $vgpr3			; GFX9-NEXT: ; implicit-def: $vgpr3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX9-NEXT: s_cbranch_execz BB1_4			; GFX9-NEXT: s_cbranch_execz BB1_4
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-NEXT: s_mov_b64 s[10:11], exec
				; GFX9-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-NEXT: s_mov_b64 exec, s[12:13]
	; GFX9-NEXT: v_cmp_ne_u32_e64 s[10:11], 1, 0
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	Show All 37 Lines
	; GFX1064-LABEL: add_i32_varying:			; GFX1064-LABEL: add_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[10:11], exec			; GFX1064-NEXT: s_mov_b64 s[10:11], exec
	; GFX1064-NEXT: ; implicit-def: $vgpr4			; GFX1064-NEXT: ; implicit-def: $vgpr4
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0			; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX1064-NEXT: s_cbranch_execz BB1_4			; GFX1064-NEXT: s_cbranch_execz BB1_4
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX1064-NEXT: s_mov_b64 s[10:11], exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[10:11]			; GFX1064-NEXT: s_mov_b64 exec, s[12:13]
	; GFX1064-NEXT: v_cmp_ne_u32_e64 s[10:11], 1, 0
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s10, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s10, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s11, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s11, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v2, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_mov_b32 s9, exec_lo			; GFX1032-NEXT: s_mov_b32 s9, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vgpr4			; GFX1032-NEXT: ; implicit-def: $vgpr4
	; GFX1032-NEXT: ; implicit-def: $vcc_hi			; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0			; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_and_saveexec_b32 s8, s9			; GFX1032-NEXT: s_and_saveexec_b32 s8, s9
	; GFX1032-NEXT: s_cbranch_execz BB1_4			; GFX1032-NEXT: s_cbranch_execz BB1_4
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: s_or_saveexec_b32 s9, -1			; GFX1032-NEXT: s_mov_b32 s9, exec_lo
				; GFX1032-NEXT: s_or_saveexec_b32 s10, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s9			; GFX1032-NEXT: s_mov_b32 exec_lo, s10
	; GFX1032-NEXT: v_cmp_ne_u32_e64 s9, 1, 0
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s9, 0			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s9, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s9, -1			; GFX1032-NEXT: s_or_saveexec_b32 s9, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32)			declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32)
	declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32)			declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32)

	; Show that what the atomic optimization pass will do for raw buffers.			; Show what the atomic optimization pass will do for raw buffers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i32_uniform:			; GCN-LABEL: add_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {			define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0)
	Show All 29 Lines
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 %lane, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 %lane, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_constant:			; GCN-LABEL: sub_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_uniform:			; GCN-LABEL: sub_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {			define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0)
	Show All 35 Lines
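
The CHECK lines for the constant-add tests above all match one lane-election sequence: copy the exec mask, count the active lanes below the current lane (`v_mbcnt_lo`/`v_mbcnt_hi`), let only the lane whose count is zero proceed, and have it issue a single atomic with the value multiplied by the number of active lanes (`s_bcnt1`). A minimal Python model of that arithmetic may help when reading the checks. It is only a sketch: `mbcnt`, `elect_and_combine`, and the sample mask are hypothetical names and values, not part of the patch, and the model ignores everything except the lane bookkeeping.

```python
def mbcnt(mask, lane):
    """Count set bits of `mask` strictly below `lane` (assumed v_mbcnt semantics)."""
    return bin(mask & ((1 << lane) - 1)).count("1")

def elect_and_combine(exec_mask, constant, wave_size=64):
    """Return (lanes allowed to issue the atomic, combined operand)."""
    active = [lane for lane in range(wave_size) if (exec_mask >> lane) & 1]
    # Models v_cmp_eq_u32 vcc, 0, mbcnt followed by s_and_saveexec: only the
    # active lane with no active lanes below it stays enabled.
    leaders = [lane for lane in active if mbcnt(exec_mask, lane) == 0]
    # Models s_bcnt1 (popcount of the mask) followed by v_mul_u32_u24 by the constant.
    combined = bin(exec_mask).count("1") * constant
    return leaders, combined

# Lanes 2, 3 and 5 active, each adding the constant 5 used in the tests.
leaders, combined = elect_and_combine(0b101100, 5)
assert leaders == [2]
assert combined == 15
```

In this model the mask passed in plays the role of the register written by `s_mov_b64 s[...], exec` on the right-hand side of the diff, or by `v_cmp_ne_u32_e64 s[...], 1, 0` on the left-hand side.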

llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32)			declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32, i32)			declare i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32, i32)

	; Show that what the atomic optimization pass will do for struct buffers.			; Show what the atomic optimization pass will do for struct buffers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: add_i32_uniform:			; GCN-LABEL: add_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_add v[[value]]			; GCN: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {			define amdgpu_kernel void @add_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %additive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %additive, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)
	▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 0, i32 %lane, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 1, <4 x i32> %inout, i32 0, i32 %lane, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_constant:			; GCN-LABEL: sub_i32_constant:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5			; GCN: v_mul_u32_u24{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[popcount]], 5
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_constant(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 5, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: sub_i32_uniform:			; GCN-LABEL: sub_i32_uniform:
	; GCN32: v_cmp_ne_u32_e64 s[[exec_lo:[0-9]+]], 1, 0			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: v_cmp_ne_u32_e64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, 1, 0			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0			; GCN: v_mbcnt_lo_u32_b32{{(_e[0-9]+)?}} v[[mbcnt:[0-9]+]], s[[exec_lo]], 0
	; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]			; GCN64: v_mbcnt_hi_u32_b32{{(_e[0-9]+)?}} v[[mbcnt]], s[[exec_hi]], v[[mbcnt]]
	; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]			; GCN: v_cmp_eq_u32{{(_e[0-9]+)?}} vcc{{(_lo)?}}, 0, v[[mbcnt]]
				; GCN: s_and_saveexec_b{{32|64}} s[[exec:\[?[0-9:]+\]?]], vcc
	; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]			; GCN32: s_bcnt1_i32_b32 s[[popcount:[0-9]+]], s[[exec_lo]]
	; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}			; GCN64: s_bcnt1_i32_b64 s[[popcount:[0-9]+]], s{{\[}}[[exec_lo]]:[[exec_hi]]{{\]}}
	; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]			; GCN: s_mul_i32 s[[scalar_value:[0-9]+]], s{{[0-9]+}}, s[[popcount]]
	; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GCN: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
	; GCN: buffer_atomic_sub v[[value]]			; GCN: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {			define amdgpu_kernel void @sub_i32_uniform(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %subitive) {
	entry:			entry:
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 %subitive, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 < %s | FileCheck %s

				declare i32 @llvm.amdgcn.ballot.i32(i1)

				; Test ballot(0)

				define i32 @test0() {
				; CHECK-LABEL: test0:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 0)
				ret i32 %ballot
				}

				; Test ballot(1)

				define i32 @test1() {
				; CHECK-LABEL: test1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_mov_b32_e32 v0, exec_lo
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 1)
				ret i32 %ballot
				}

				; Test ballot of a non-comparison operation

				define i32 @test2(i32 %x) {
				; CHECK-LABEL: test2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: v_cmp_ne_u32_e64 s4, 0, v0
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%trunc = trunc i32 %x to i1
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %trunc)
				ret i32 %ballot
				}

				; Test ballot of comparisons

				define i32 @test3(i32 %x, i32 %y) {
				; CHECK-LABEL: test3:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_cmp_eq_u32_e64 s4, v0, v1
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = icmp eq i32 %x, %y
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}

				define i32 @test4(i32 %x) {
				; CHECK-LABEL: test4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_cmp_lt_i32_e64 s4, 0x62, v0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = icmp sge i32 %x, 99
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}

				define i32 @test5(float %x, float %y) {
				; CHECK-LABEL: test5:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
				; CHECK-NEXT: v_cmp_gt_f32_e64 s4, v0, v1
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = fcmp ogt float %x, %y
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}
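The tests above rely on the semantics stated in the summary: ballot returns one bit per lane, set only where the lane is active and its boolean argument is true. A minimal C++ sketch of that behaviour follows. `ballot_model`, its parameters, and the explicit `exec` mask argument are hypothetical names for illustration; this is a reference model, not code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model (not LLVM code): bit i of the result is set
// only if lane i is active in `exec` and its predicate is true.
template <std::size_t WaveSize>
std::uint64_t ballot_model(const std::array<bool, WaveSize>& pred,
                           std::uint64_t exec) {
    static_assert(WaveSize == 32 || WaveSize == 64, "wave32 or wave64 only");
    // Assumed precondition: a wave32 exec mask has no bits above bit 31.
    assert(WaveSize == 64 || (exec >> 32) == 0);

    std::uint64_t result = 0;
    for (std::size_t lane = 0; lane < WaveSize; ++lane) {
        const bool active = ((exec >> lane) & 1u) != 0;
        if (active && pred[lane])
            result |= std::uint64_t{1} << lane;
    }
    return result;
}
```

Under this model an all-false predicate yields 0 and an all-true predicate yields the exec mask itself, which is what test0 and test1 check (`v_mov_b32_e32 v0, 0` and `v_mov_b32_e32 v0, exec_lo`).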

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck %s

				declare i64 @llvm.amdgcn.ballot.i64(i1)

				; Test ballot(0)

				define i64 @test0() {
				; CHECK-LABEL: test0:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: v_mov_b32_e32 v1, 0
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 0)
				ret i64 %ballot
				}

				; Test ballot(1)

				define i64 @test1() {
				; CHECK-LABEL: test1:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_mov_b32_e32 v0, exec_lo
				; CHECK-NEXT: v_mov_b32_e32 v1, exec_hi
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 1)
				ret i64 %ballot
				}

				; Test ballot of a non-comparison operation

				define i64 @test2(i32 %x) {
				; CHECK-LABEL: test2:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
				; CHECK-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, v0
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%trunc = trunc i32 %x to i1
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
				ret i64 %ballot
				}

				; Test ballot of comparisons

				define i64 @test3(i32 %x, i32 %y) {
				; CHECK-LABEL: test3:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], v0, v1
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = icmp eq i32 %x, %y
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}

				define i64 @test4(i32 %x) {
				; CHECK-LABEL: test4:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: s_movk_i32 s4, 0x62
				; CHECK-NEXT: v_cmp_lt_i32_e64 s[4:5], s4, v0
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = icmp sge i32 %x, 99
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}

				define i64 @test5(float %x, float %y) {
				; CHECK-LABEL: test5:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_cmp_gt_f32_e64 s[4:5], v0, v1
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				%cmp = fcmp ogt float %x, %y
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}
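In test4 of both files the IR compares with `icmp sge i32 %x, 99`, while the checked assembly uses `v_cmp_lt_i32` with `0x62` (98) as the first operand. For signed integers `x >= 99` and `98 < x` are the same predicate, so the two forms agree. The sketch below spot-checks that equivalence; the function names are hypothetical and it illustrates the arithmetic only, not how the backend performs the rewrite.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Source form of the test4 predicate: icmp sge i32 %x, 99
constexpr bool pred_sge(std::int32_t x) { return x >= 99; }

// Form in the checked assembly: v_cmp_lt_i32 with 0x62 (98) as first operand
constexpr bool pred_lt(std::int32_t x) { return 0x62 < x; }

// Spot-check that both forms agree around the boundary and at the extremes.
inline void check_forms_agree() {
    for (std::int32_t x = -1000; x <= 1000; ++x)
        assert(pred_sge(x) == pred_lt(x));
    assert(pred_sge(std::numeric_limits<std::int32_t>::min()) ==
           pred_lt(std::numeric_limits<std::int32_t>::min()));
    assert(pred_sge(std::numeric_limits<std::int32_t>::max()) ==
           pred_lt(std::numeric_limits<std::int32_t>::max()));
}
```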

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[X:%.*]], float 4.000000e+00, i32 2)			; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[X:%.*]], float 4.000000e+00, i32 2)
	; CHECK-NEXT: ret i64 [[RESULT]]			; CHECK-NEXT: ret i64 [[RESULT]]
	;			;
	%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 4.0, float %x, i32 4)			%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 4.0, float %x, i32 4)
	ret i64 %result			ret i64 %result
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
				; llvm.amdgcn.ballot
				; --------------------------------------------------------------------

				declare i64 @llvm.amdgcn.ballot.i64(i1) nounwind readnone convergent
				declare i32 @llvm.amdgcn.ballot.i32(i1) nounwind readnone convergent

				define i64 @ballot_nocombine_64(i1 %i) {
				; CHECK-LABEL: @ballot_nocombine_64(
				; CHECK-NEXT: %b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)
				; CHECK-NEXT: ret i64 %b
				;
				%b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)
				ret i64 %b
				}

				define i64 @ballot_zero_64() {
				; CHECK-LABEL: @ballot_zero_64(
				; CHECK-NEXT: ret i64 0
				;
				%b = call i64 @llvm.amdgcn.ballot.i64(i1 0)
				ret i64 %b
				}

				define i64 @ballot_one_64() {
				; CHECK-LABEL: @ballot_one_64(
				; CHECK-NEXT: %b = call i64 @llvm.read_register.i64(metadata !0) [[CONVERGENT]]
				; CHECK-NEXT: ret i64 %b
				;
				%b = call i64 @llvm.amdgcn.ballot.i64(i1 1)
				ret i64 %b
				}
				arsenm: Wave32 tests also

				define i32 @ballot_nocombine_32(i1 %i) {
				; CHECK-LABEL: @ballot_nocombine_32(
				; CHECK-NEXT: %b = call i32 @llvm.amdgcn.ballot.i32(i1 %i)
				; CHECK-NEXT: ret i32 %b
				;
				%b = call i32 @llvm.amdgcn.ballot.i32(i1 %i)
				ret i32 %b
				}

				define i32 @ballot_zero_32() {
				; CHECK-LABEL: @ballot_zero_32(
				; CHECK-NEXT: ret i32 0
				;
				%b = call i32 @llvm.amdgcn.ballot.i32(i1 0)
				ret i32 %b
				}

				define i32 @ballot_one_32() {
				; CHECK-LABEL: @ballot_one_32(
				; CHECK-NEXT: %b = call i32 @llvm.read_register.i32(metadata !1) [[CONVERGENT]]
				; CHECK-NEXT: ret i32 %b
				;
				%b = call i32 @llvm.amdgcn.ballot.i32(i1 1)
				ret i32 %b
				}

				; --------------------------------------------------------------------
	; llvm.amdgcn.wqm.vote			; llvm.amdgcn.wqm.vote
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i1 @llvm.amdgcn.wqm.vote(i1)			declare i1 @llvm.amdgcn.wqm.vote(i1)

	define float @wqm_vote_true() {			define float @wqm_vote_true() {
	; CHECK-LABEL: @wqm_vote_true(			; CHECK-LABEL: @wqm_vote_true(
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
