This is an archive of the discontinued LLVM Phabricator instance.

Differential D124387

AMDGPU: Fold out readfirstlane between vgpr to vgpr copies
Needs Review · Public

Authored by arsenm on Apr 25 2022, 6:54 AM.

Details

Reviewers: None

Group Reviewers: Restricted Project

Summary

We were already handling the SGPR->VGPR->SGPR case. This extends the fold
to VGPR->SGPR->VGPR cases when exec wasn't modified between the def and
the use. This cleans up a few cases where readfirstlane was used to assert
uniformity but that turned out not to be helpful.

Diff Detail

Event Timeline

arsenm created this revision. Apr 25 2022, 6:54 AM

Herald added a project: Restricted Project. Apr 25 2022, 6:54 AM

Herald added subscribers: hsmhsm, foad, kerbowa and 8 others.

arsenm requested review of this revision. Apr 25 2022, 6:54 AM

Herald added a project: Restricted Project. Apr 25 2022, 6:54 AM

Herald added a subscriber: wdng.

arsenm added a parent revision: D124385: AMDGPU: Special case divergence analysis for wave ID computation. Apr 25 2022, 6:54 AM

Harbormaster completed remote builds in B161155: Diff 424894. Apr 25 2022, 6:55 AM

foad added inline comments. Apr 25 2022, 7:27 AM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	This transformation only makes sense if you know that %0 is uniform. I think @nhaehnle has suggested introducing a "readanylane" pseudo and/or intrinsic for that kind of use case. I'm not sure if there is any existing code that deliberately uses readfirstlane on a non-uniform argument, but if there is then this will break it.

b-sumner added a subscriber: b-sumner. Apr 25 2022, 9:42 AM

b-sumner added inline comments.

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	We use readfirstlane to "elect" a value from the currently active lanes. The argument is likely not uniform, and breaking such code would be problematic.

arsenm added inline comments. Apr 25 2022, 1:11 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	I thought this was wrong at first but don't see where the problem is. If you're reading the value back into a VGPR with the same exec mask at a later point, where is the difference? At the copy to VGPR, you're copying from the same lane.

b-sumner added inline comments. Apr 25 2022, 1:29 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	This use of readfirstlane is broadcasting the value in the elected lane (i.e. the first lane) to all other active lanes.

foad added inline comments. Apr 25 2022, 1:32 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	In the original code, every lane gets the same value in %2. If you remove the readfirstlane, they might get different values (if %0 is non-uniform).
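
The difference described in this comment can be reproduced with a small lane-level model. The sketch below is not LLVM code: it assumes an 8-lane wave, and `readFirstLane`, `copyFromScalar`, `copyFromVector`, `original` and `folded` are hypothetical helpers that mimic V_READFIRSTLANE_B32 followed by a COPY into a VGPR, versus the direct COPY the fold would produce.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 8; // assumption: tiny wave to keep the model readable
using VGPR = std::array<uint32_t, kWaveSize>;

// True if `lane` is enabled in the exec mask.
inline bool isActive(uint32_t exec, int lane) { return (exec >> lane) & 1u; }

// Model of V_READFIRSTLANE_B32: the value held by the lowest active lane.
inline uint32_t readFirstLane(const VGPR &src, uint32_t exec) {
  assert(exec != 0 && "this sketch only models a non-empty exec mask");
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (isActive(exec, lane))
      return src[lane];
  return 0; // not reached when exec != 0
}

// Model of `%2:vgpr = COPY %1:sgpr`: every active lane receives the scalar.
inline VGPR copyFromScalar(uint32_t s, uint32_t exec) {
  VGPR dst{};
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (isActive(exec, lane))
      dst[lane] = s;
  return dst;
}

// Model of `%2:vgpr = COPY %0:vgpr`: every active lane keeps its own value.
inline VGPR copyFromVector(const VGPR &src, uint32_t exec) {
  VGPR dst{};
  for (int lane = 0; lane < kWaveSize; ++lane)
    if (isActive(exec, lane))
      dst[lane] = src[lane];
  return dst;
}

// %2 as computed before the fold (readfirstlane, then broadcast).
inline VGPR original(const VGPR &v0, uint32_t exec) {
  return copyFromScalar(readFirstLane(v0, exec), exec);
}

// %2 as computed after the fold (readfirstlane removed).
inline VGPR folded(const VGPR &v0, uint32_t exec) {
  return copyFromVector(v0, exec);
}
```

With a uniform input the two functions agree; with a divergent input such as lane values 10 through 17, `original` gives every active lane the first lane's value while `folded` leaves each lane with its own.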

arsenm added inline comments. Apr 25 2022, 1:33 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	It's broadcast to a scalar value, but as soon as you have a vector use, it's reduced down to the active lanes again. It's only uniform for scalar uses

arsenm added inline comments. Apr 25 2022, 1:36 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	Oh, I see here. I'm checking the wrong exec def point. I need to check exec at the point the readfirstlane source is defined, not the readfirstlane itself
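
The exec-at-definition problem is the situation in the atomic optimizer tests touched by this diff: the readfirstlane source is written while exec is narrowed to a single lane, and is read after exec has been restored. The following is a simplified model of that pattern, not the generated code. It assumes an 8-lane wave and a stale marker value for lanes that never wrote the register; `atomicAddOfFive`, `Outcome` and `kStale` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWaveSize = 8;                // assumption: tiny wave
constexpr uint32_t kStale = 0xDEADBEEF;     // assumption: marks never-written lanes
using VGPR = std::array<uint32_t, kWaveSize>;

inline bool isActive(uint32_t exec, int lane) { return (exec >> lane) & 1u; }

struct Outcome {
  VGPR correct; // per-lane result with the readfirstlane kept
  VGPR folded;  // per-lane result with the readfirstlane removed
};

// Each active lane wants to add 5 to `counter` and learn the value the
// counter had just before its own addition.
inline Outcome atomicAddOfFive(uint32_t &counter, uint32_t exec) {
  if (exec == 0)
    return Outcome{};

  // Rank of each active lane among the active lanes (what mbcnt computes).
  VGPR laneRank{};
  uint32_t activeCount = 0;
  int firstLane = -1;
  for (int lane = 0; lane < kWaveSize; ++lane) {
    if (!isActive(exec, lane))
      continue;
    if (firstLane < 0)
      firstLane = lane;
    laneRank[lane] = activeCount++;
  }
  assert(firstLane >= 0);

  // exec is narrowed to the first active lane: only that lane issues the
  // atomic and receives the old counter value in v1.
  VGPR v1;
  v1.fill(kStale);
  v1[firstLane] = counter;
  counter += 5 * activeCount;

  // exec is restored here. The readfirstlane broadcasts the elected lane's v1.
  uint32_t s = v1[firstLane];

  Outcome out{};
  for (int lane = 0; lane < kWaveSize; ++lane) {
    if (!isActive(exec, lane))
      continue;
    out.correct[lane] = laneRank[lane] * 5 + s;        // v_mad ..., s0
    out.folded[lane] = laneRank[lane] * 5 + v1[lane];  // v_mad ..., v1
  }
  return out;
}
```

In this model only the elected lane gets the same answer from both variants; every other lane of `folded` reads a register it never wrote.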

foad added inline comments. Apr 25 2022, 1:47 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1858	Even if exec doesn't change throughout the whole program, it's still semantically wrong to remove a readfirstlane like this, unless you can prove independently that the input to the readfirstlane is uniform.
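
This claim can be checked exhaustively on a toy model in which exec never changes. The sketch below assumes a 4-lane wave with one-bit lane values; `isUniform`, `foldPreservesResult` and `checkAllCases` are hypothetical helpers, not LLVM APIs. It enumerates every non-empty exec mask and every value assignment, and counts how often "the fold preserves the result" disagrees with "the input is uniform across active lanes".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4; // assumption: small enough to enumerate every case
using VGPR = std::array<uint32_t, kLanes>;

inline bool isActive(uint32_t exec, int lane) { return (exec >> lane) & 1u; }

// Uniform means the smallest and largest active-lane values coincide.
inline bool isUniform(const VGPR &v, uint32_t exec) {
  uint32_t lo = UINT32_MAX, hi = 0;
  for (int lane = 0; lane < kLanes; ++lane) {
    if (!isActive(exec, lane))
      continue;
    if (v[lane] < lo) lo = v[lane];
    if (v[lane] > hi) hi = v[lane];
  }
  return lo == hi;
}

// Compares, lane by lane, the broadcast of the first active lane (with
// readfirstlane) against each lane's own value (readfirstlane removed).
inline bool foldPreservesResult(const VGPR &v, uint32_t exec) {
  bool haveFirst = false;
  uint32_t broadcast = 0;
  for (int lane = 0; lane < kLanes; ++lane) {
    if (!isActive(exec, lane))
      continue;
    if (!haveFirst) {
      broadcast = v[lane];
      haveFirst = true;
    }
    if (v[lane] != broadcast)
      return false;
  }
  return true;
}

struct Tally {
  int cases;         // (exec, values) pairs examined
  int unsafe;        // pairs where the fold changes some lane's result
  int disagreements; // pairs where "safe" and "uniform" differ
};

inline Tally checkAllCases() {
  Tally t{0, 0, 0};
  for (uint32_t exec = 1; exec < (1u << kLanes); ++exec) {
    for (uint32_t bits = 0; bits < (1u << kLanes); ++bits) {
      VGPR v{};
      for (int lane = 0; lane < kLanes; ++lane)
        v[lane] = (bits >> lane) & 1u;
      bool safe = foldPreservesResult(v, exec);
      ++t.cases;
      if (!safe)
        ++t.unsafe;
      if (safe != isUniform(v, exec))
        ++t.disagreements;
    }
  }
  return t;
}
```

Under these assumptions the two predicates never disagree, so a constant exec mask alone does not make the fold safe; uniformity of the input has to be established separately.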

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp	34 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll	11 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll	9 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll	82 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll	99 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll	6 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll	9 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll	9 lines
llvm/test/CodeGen/AMDGPU/fold-readlane.mir	195 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll	13 lines
llvm/test/CodeGen/AMDGPU/wave-id-computation.ll	3 lines

Diff 424894

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

[1,842 lines not shown] for (auto &MI : make_early_inc_range(*MBB)) {

       // FIXME: We could also be folding things like TargetIndexes.
       if (!FoldingImm && !OpToFold.isReg())
         continue;

       if (OpToFold.isReg() && !OpToFold.getReg().isVirtual())
         continue;

+      if (OpToFold.isReg()) {
+        // Fold vgpr to vgpr copy with an intermediate readfirstlane
+        //
+        // %0:vgpr_32 = COPY $vgpr0
+        // %1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
+        // %2:vgpr_32 = COPY %1
+        //
+        // => %2 = COPY %0
+        //
+        MachineInstr *SrcDef = MRI->getVRegDef(OpToFold.getReg());
+        if (SrcDef && SrcDef->getOpcode() == AMDGPU::V_READFIRSTLANE_B32 &&
+            MRI->hasOneUse(OpToFold.getReg()) &&
+            !TRI->isSGPRReg(*MRI, MI.getOperand(0).getReg())) {
+          // TODO: Should also fold through reg_sequence
+          if (!execMayBeModifiedBeforeUse(MRI, OpToFold.getReg(), SrcDef,
+                                          MI)) {
+            OpToFold.setReg(SrcDef->getOperand(1).getReg());
+            OpToFold.setSubReg(SrcDef->getOperand(1).getSubReg());
+            SrcDef->eraseFromParent();
+            MRI->clearKillFlags(OpToFold.getReg());
+
+            // FIXME: Do we need to make this a convergent move?
+            // If this was an ordinary copy, we need to track the exec
+            // dependency.
+            if (MI.isCopy())
+              MI.addOperand(
+                  MF, MachineOperand::CreateReg(AMDGPU::EXEC, false, true));
+
+            Changed = true;
+            continue;
+          }
+        }
+      }

       // Prevent folding operands backwards in the function. For example,
       // the COPY opcode must not be replaced by 1 in this example:
       //
       // %3 = COPY %vgpr0; VGPR_32:%3
       // ...
       // %vgpr0 = V_MOV_B32_e32 1, implicit %exec
       if (!MI.getOperand(0).getReg().isVirtual())
         continue;
[30 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll

[81 lines not shown]
 ; GCN-NEXT: ; %bb.1:
 ; GCN-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
 ; GCN-NEXT: v_mov_b32_e32 v1, s6
 ; GCN-NEXT: v_mov_b32_e32 v2, 0
 ; GCN-NEXT: buffer_atomic_add v1, v2, s[0:3], 0 idxen glc
 ; GCN-NEXT: .LBB1_2:
 ; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GCN-NEXT: s_waitcnt vmcnt(0)
-; GCN-NEXT: v_readfirstlane_b32 s4, v1
-; GCN-NEXT: v_add_i32_e32 v4, vcc, s4, v0
+; GCN-NEXT: v_add_i32_e32 v4, vcc, v1, v0
 ; GCN-NEXT: s_waitcnt expcnt(0)
 ; GCN-NEXT: v_mov_b32_e32 v0, s0
 ; GCN-NEXT: v_mov_b32_e32 v1, s1
 ; GCN-NEXT: v_mov_b32_e32 v2, s2
 ; GCN-NEXT: v_mov_b32_e32 v3, s3
 ; GCN-NEXT: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen
 ; GCN-NEXT: s_endpgm
 .entry:
[76 lines not shown]
 ; GCN-NEXT: ; %bb.1:
 ; GCN-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
 ; GCN-NEXT: v_mov_b32_e32 v1, s6
 ; GCN-NEXT: v_mov_b32_e32 v2, 0
 ; GCN-NEXT: buffer_atomic_sub v1, v2, s[0:3], 0 idxen glc
 ; GCN-NEXT: .LBB3_2:
 ; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GCN-NEXT: s_waitcnt vmcnt(0)
-; GCN-NEXT: v_readfirstlane_b32 s4, v1
-; GCN-NEXT: v_sub_i32_e32 v4, vcc, s4, v0
+; GCN-NEXT: v_sub_i32_e32 v4, vcc, v1, v0
 ; GCN-NEXT: s_waitcnt expcnt(0)
 ; GCN-NEXT: v_mov_b32_e32 v0, s0
 ; GCN-NEXT: v_mov_b32_e32 v1, s1
 ; GCN-NEXT: v_mov_b32_e32 v2, s2
 ; GCN-NEXT: v_mov_b32_e32 v3, s3
 ; GCN-NEXT: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen
 ; GCN-NEXT: s_endpgm
 .entry:
[80 lines not shown]
 ; GCN-NEXT: ; %bb.1:
 ; GCN-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
 ; GCN-NEXT: s_and_b32 s6, s6, 1
 ; GCN-NEXT: v_mov_b32_e32 v1, s6
 ; GCN-NEXT: v_mov_b32_e32 v2, 0
 ; GCN-NEXT: buffer_atomic_xor v1, v2, s[0:3], 0 idxen glc
 ; GCN-NEXT: .LBB5_2:
 ; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
-; GCN-NEXT: s_waitcnt vmcnt(0)
-; GCN-NEXT: v_readfirstlane_b32 s4, v1
 ; GCN-NEXT: v_and_b32_e32 v0, 1, v0
-; GCN-NEXT: v_xor_b32_e32 v4, s4, v0
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: v_xor_b32_e32 v4, v1, v0
 ; GCN-NEXT: s_waitcnt expcnt(0)
 ; GCN-NEXT: v_mov_b32_e32 v0, s0
 ; GCN-NEXT: v_mov_b32_e32 v1, s1
 ; GCN-NEXT: v_mov_b32_e32 v2, s2
 ; GCN-NEXT: v_mov_b32_e32 v3, s3
 ; GCN-NEXT: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen
 ; GCN-NEXT: s_endpgm
 .entry:
 %a = call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
 call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> %arg, <4 x i32> %arg, i32 %a, i32 0, i32 0, i32 0)
 ret void
 }

llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll

[27 lines not shown]
 ; GFX6-NEXT: s_bcnt1_i32_b64 s0, s[2:3]
 ; GFX6-NEXT: s_mul_i32 s0, s0, 5
 ; GFX6-NEXT: v_mov_b32_e32 v1, s0
 ; GFX6-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX6-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
 ; GFX6-NEXT: .LBB0_2:
 ; GFX6-NEXT: s_or_b64 exec, exec, s[6:7]
 ; GFX6-NEXT: s_waitcnt vmcnt(0)
-; GFX6-NEXT: v_readfirstlane_b32 s0, v1
+; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, v1
 ; GFX6-NEXT: s_mov_b32 s7, 0xf000
-; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, s0
 ; GFX6-NEXT: s_mov_b32 s6, -1
 ; GFX6-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
 ; GFX6-NEXT: s_endpgm
 ;
 ; GFX8-LABEL: add_i32_constant:
 ; GFX8: ; %bb.0: ; %entry
 ; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
[9 lines not shown]
 ; GFX8-NEXT: s_bcnt1_i32_b64 s0, s[6:7]
 ; GFX8-NEXT: s_mul_i32 s0, s0, 5
 ; GFX8-NEXT: v_mov_b32_e32 v1, s0
 ; GFX8-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX8-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
 ; GFX8-NEXT: .LBB0_2:
 ; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_readfirstlane_b32 s0, v1
-; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, s0
+; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, v1
 ; GFX8-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX8-NEXT: v_mov_b32_e32 v0, s2
 ; GFX8-NEXT: v_mov_b32_e32 v1, s3
 ; GFX8-NEXT: flat_store_dword v[0:1], v2
 ; GFX8-NEXT: s_endpgm
 ;
 ; GFX9-LABEL: add_i32_constant:
 ; GFX9: ; %bb.0: ; %entry
[10 lines not shown]
 ; GFX9-NEXT: s_bcnt1_i32_b64 s0, s[6:7]
 ; GFX9-NEXT: s_mul_i32 s0, s0, 5
 ; GFX9-NEXT: v_mov_b32_e32 v1, s0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
 ; GFX9-NEXT: .LBB0_2:
 ; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: v_readfirstlane_b32 s0, v1
-; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s0
+; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, v1
 ; GFX9-NEXT: v_mov_b32_e32 v1, 0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: global_store_dword v1, v0, s[2:3]
 ; GFX9-NEXT: s_endpgm
 ;
 ; GFX10W64-LABEL: add_i32_constant:
 ; GFX10W64: ; %bb.0: ; %entry
 ; GFX10W64-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
[1,306 lines not shown]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll

[32 lines not shown]
 ; GFX7LESS-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
 ; GFX7LESS-NEXT: s_waitcnt vmcnt(0)
 ; GFX7LESS-NEXT: buffer_wbinvl1
 ; GFX7LESS-NEXT: .LBB0_2:
 ; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
 ; GFX7LESS-NEXT: s_mov_b32 s2, -1
-; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v1
-; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, s4
+; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, v1
 ; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
 ; GFX7LESS-NEXT: s_endpgm
 ;
 ; GFX89-LABEL: add_i32_constant:
 ; GFX89: ; %bb.0: ; %entry
 ; GFX89-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
 ; GFX89-NEXT: s_mov_b64 s[6:7], exec
 ; GFX89-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
[12 lines not shown]
 ; GFX89-NEXT: s_mov_b32 s9, s3
 ; GFX89-NEXT: v_mov_b32_e32 v1, s2
 ; GFX89-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX89-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
 ; GFX89-NEXT: s_waitcnt vmcnt(0)
 ; GFX89-NEXT: buffer_wbinvl1_vol
 ; GFX89-NEXT: .LBB0_2:
 ; GFX89-NEXT: s_or_b64 exec, exec, s[4:5]
-; GFX89-NEXT: v_readfirstlane_b32 s4, v1
 ; GFX89-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX89-NEXT: s_mov_b32 s3, 0xf000
 ; GFX89-NEXT: s_mov_b32 s2, -1
-; GFX89-NEXT: v_mad_u32_u24 v0, v0, 5, s4
+; GFX89-NEXT: v_mad_u32_u24 v0, v0, 5, v1
 ; GFX89-NEXT: buffer_store_dword v0, off, s[0:3], 0
 ; GFX89-NEXT: s_endpgm
 ;
 ; GFX1064-LABEL: add_i32_constant:
 ; GFX1064: ; %bb.0: ; %entry
 ; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
 ; GFX1064-NEXT: s_mov_b64 s[6:7], exec
 ; GFX1064-NEXT: ; implicit-def: $vgpr1
[538 lines not shown]
 ; GFX7LESS-NEXT: s_waitcnt vmcnt(0)
 ; GFX7LESS-NEXT: buffer_wbinvl1
 ; GFX7LESS-NEXT: .LBB3_2:
 ; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
 ; GFX7LESS-NEXT: s_mov_b32 s2, -1
 ; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v0
-; GFX7LESS-NEXT: v_readfirstlane_b32 s5, v1
+; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
 ; GFX7LESS-NEXT: s_waitcnt expcnt(0)
-; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
 ; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2
-; GFX7LESS-NEXT: v_mov_b32_e32 v2, s5
 ; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s4, v0
-; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc
+; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
 ; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
 ; GFX7LESS-NEXT: s_endpgm
 ;
 ; GFX89-LABEL: add_i64_constant:
 ; GFX89: ; %bb.0: ; %entry
 ; GFX89-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
 ; GFX89-NEXT: s_mov_b64 s[6:7], exec
 ; GFX89-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
[14 lines not shown]
 ; GFX89-NEXT: v_mov_b32_e32 v1, 0
 ; GFX89-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
 ; GFX89-NEXT: buffer_atomic_add_x2 v[0:1], off, s[8:11], 0 glc
 ; GFX89-NEXT: s_waitcnt vmcnt(0)
 ; GFX89-NEXT: buffer_wbinvl1_vol
 ; GFX89-NEXT: .LBB3_2:
 ; GFX89-NEXT: s_or_b64 exec, exec, s[4:5]
 ; GFX89-NEXT: s_waitcnt lgkmcnt(0)
-; GFX89-NEXT: v_readfirstlane_b32 s2, v0
-; GFX89-NEXT: v_readfirstlane_b32 s3, v1
-; GFX89-NEXT: v_mov_b32_e32 v0, s2
-; GFX89-NEXT: v_mov_b32_e32 v1, s3
 ; GFX89-NEXT: v_mad_u64_u32 v[0:1], s[2:3], v2, 5, v[0:1]
 ; GFX89-NEXT: s_mov_b32 s3, 0xf000
 ; GFX89-NEXT: s_mov_b32 s2, -1
 ; GFX89-NEXT: s_nop 2
 ; GFX89-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
 ; GFX89-NEXT: s_endpgm
 ;
 ; GFX1064-LABEL: add_i64_constant:
[106 lines not shown]
 ; GFX7LESS-NEXT: s_waitcnt vmcnt(0)
 ; GFX7LESS-NEXT: buffer_wbinvl1
 ; GFX7LESS-NEXT: .LBB4_2:
 ; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]
 ; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000
 ; GFX7LESS-NEXT: s_mov_b32 s6, -1
 ; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0
-; GFX7LESS-NEXT: v_readfirstlane_b32 s3, v1
 ; GFX7LESS-NEXT: s_waitcnt expcnt(0)
 ; GFX7LESS-NEXT: v_mul_lo_u32 v0, s1, v2
-; GFX7LESS-NEXT: v_mul_hi_u32 v1, s0, v2
+; GFX7LESS-NEXT: v_mul_hi_u32 v3, s0, v2
 ; GFX7LESS-NEXT: v_mul_lo_u32 v2, s0, v2
-; GFX7LESS-NEXT: v_add_i32_e32 v1, vcc, v1, v0
-; GFX7LESS-NEXT: v_mov_b32_e32 v3, s3
+; GFX7LESS-NEXT: v_add_i32_e32 v3, vcc, v3, v0
 ; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s2, v2
-; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc
+; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
 ; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
 ; GFX7LESS-NEXT: s_endpgm
 ;
 ; GFX8-LABEL: add_i64_uniform:
 ; GFX8: ; %bb.0: ; %entry
 ; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
 ; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
 ; GFX8-NEXT: s_mov_b64 s[8:9], exec
[19 lines not shown]
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
 ; GFX8-NEXT: buffer_wbinvl1_vol
 ; GFX8-NEXT: .LBB4_2:
 ; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
 ; GFX8-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX8-NEXT: v_mul_lo_u32 v4, s1, v2
 ; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s0, v2, 0
 ; GFX8-NEXT: v_readfirstlane_b32 s0, v0
-; GFX8-NEXT: v_readfirstlane_b32 s1, v1
-; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4
-; GFX8-NEXT: v_mov_b32_e32 v3, s1
-; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2
 ; GFX8-NEXT: s_mov_b32 s7, 0xf000
+; GFX8-NEXT: v_add_u32_e32 v3, vcc, v3, v4
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2
 ; GFX8-NEXT: s_mov_b32 s6, -1
-; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
 ; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
 ; GFX8-NEXT: s_endpgm
 ;
 ; GFX9-LABEL: add_i64_uniform:
 ; GFX9: ; %bb.0: ; %entry
 ; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
 ; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
 ; GFX9-NEXT: s_mov_b64 s[8:9], exec
[21 lines not shown]
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: buffer_wbinvl1_vol
 ; GFX9-NEXT: .LBB4_2:
 ; GFX9-NEXT: s_or_b64 exec, exec, s[0:1]
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2
 ; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0
 ; GFX9-NEXT: v_readfirstlane_b32 s0, v0
-; GFX9-NEXT: v_readfirstlane_b32 s1, v1
-; GFX9-NEXT: v_add_u32_e32 v1, v3, v4
-; GFX9-NEXT: v_mov_b32_e32 v3, s1
-; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2
 ; GFX9-NEXT: s_mov_b32 s7, 0xf000
+; GFX9-NEXT: v_add_u32_e32 v3, v3, v4
+; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2
 ; GFX9-NEXT: s_mov_b32 s6, -1
-; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v3, v1, vcc
+; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc
 ; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
 ; GFX9-NEXT: s_endpgm
 ;
 ; GFX1064-LABEL: add_i64_uniform:
 ; GFX1064: ; %bb.0: ; %entry
 ; GFX1064-NEXT: s_clause 0x1
 ; GFX1064-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
 ; GFX1064-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	▲ Show 20 Lines • Show All 811 Lines • ▼ Show 20 Lines
	; GFX7LESS-NEXT: s_waitcnt vmcnt(0)			; GFX7LESS-NEXT: s_waitcnt vmcnt(0)
	; GFX7LESS-NEXT: buffer_wbinvl1			; GFX7LESS-NEXT: buffer_wbinvl1
	; GFX7LESS-NEXT: .LBB9_2:			; GFX7LESS-NEXT: .LBB9_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s5, v1			; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX7LESS-NEXT: s_waitcnt expcnt(0)			; GFX7LESS-NEXT: s_waitcnt expcnt(0)
	; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
	; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX7LESS-NEXT: v_mov_b32_e32 v2, s5
	; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s4, v0			; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s4, v0
	; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc			; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: sub_i64_constant:			; GFX8-LABEL: sub_i64_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[6:7], exec			; GFX8-NEXT: s_mov_b64 s[6:7], exec
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	Show All 14 Lines
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_atomic_sub_x2 v[0:1], off, s[8:11], 0 glc			; GFX8-NEXT: buffer_atomic_sub_x2 v[0:1], off, s[8:11], 0 glc
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: .LBB9_2:			; GFX8-NEXT: .LBB9_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: v_readfirstlane_b32 s4, v0			; GFX8-NEXT: v_readfirstlane_b32 s4, v0
	; GFX8-NEXT: v_readfirstlane_b32 s5, v1
	; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX8-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX8-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX8-NEXT: v_mov_b32_e32 v2, s5
	; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s4, v0			; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s4, v0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc			; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: sub_i64_constant:			; GFX9-LABEL: sub_i64_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[6:7], exec			; GFX9-NEXT: s_mov_b64 s[6:7], exec
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[14 lines not shown]
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_atomic_sub_x2 v[0:1], off, s[8:11], 0 glc			; GFX9-NEXT: buffer_atomic_sub_x2 v[0:1], off, s[8:11], 0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: .LBB9_2:			; GFX9-NEXT: .LBB9_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: v_readfirstlane_b32 s4, v0			; GFX9-NEXT: v_readfirstlane_b32 s4, v0
	; GFX9-NEXT: v_readfirstlane_b32 s5, v1
	; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX9-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX9-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX9-NEXT: v_mov_b32_e32 v2, s5
	; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s4, v0			; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s4, v0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc			; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i64_constant:			; GFX1064-LABEL: sub_i64_constant:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX1064-NEXT: s_mov_b64 s[6:7], exec			; GFX1064-NEXT: s_mov_b64 s[6:7], exec
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[108 lines not shown]
	; GFX7LESS-NEXT: s_waitcnt vmcnt(0)			; GFX7LESS-NEXT: s_waitcnt vmcnt(0)
	; GFX7LESS-NEXT: buffer_wbinvl1			; GFX7LESS-NEXT: buffer_wbinvl1
	; GFX7LESS-NEXT: .LBB10_2:			; GFX7LESS-NEXT: .LBB10_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s6, -1			; GFX7LESS-NEXT: s_mov_b32 s6, -1
	; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s3, v1
	; GFX7LESS-NEXT: s_waitcnt expcnt(0)			; GFX7LESS-NEXT: s_waitcnt expcnt(0)
	; GFX7LESS-NEXT: v_mul_lo_u32 v0, s1, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v0, s1, v2
	; GFX7LESS-NEXT: v_mul_hi_u32 v1, s0, v2			; GFX7LESS-NEXT: v_mul_hi_u32 v3, s0, v2
	; GFX7LESS-NEXT: v_mul_lo_u32 v2, s0, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v2, s0, v2
	; GFX7LESS-NEXT: v_add_i32_e32 v1, vcc, v1, v0			; GFX7LESS-NEXT: v_add_i32_e32 v3, vcc, v3, v0
	; GFX7LESS-NEXT: v_mov_b32_e32 v3, s3
	; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v2			; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v2
	; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc			; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: sub_i64_uniform:			; GFX8-LABEL: sub_i64_uniform:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX8-NEXT: s_mov_b64 s[8:9], exec			; GFX8-NEXT: s_mov_b64 s[8:9], exec
	[19 lines not shown]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: .LBB10_2:			; GFX8-NEXT: .LBB10_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mul_lo_u32 v4, s1, v2			; GFX8-NEXT: v_mul_lo_u32 v4, s1, v2
	; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s0, v2, 0			; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s0, v2, 0
	; GFX8-NEXT: v_readfirstlane_b32 s0, v0			; GFX8-NEXT: v_readfirstlane_b32 s0, v0
	; GFX8-NEXT: v_readfirstlane_b32 s1, v1
	; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4
	; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2
	; GFX8-NEXT: s_mov_b32 s7, 0xf000			; GFX8-NEXT: s_mov_b32 s7, 0xf000
				; GFX8-NEXT: v_add_u32_e32 v3, vcc, v3, v4
				; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2
	; GFX8-NEXT: s_mov_b32 s6, -1			; GFX8-NEXT: s_mov_b32 s6, -1
	; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc			; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: sub_i64_uniform:			; GFX9-LABEL: sub_i64_uniform:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; GFX9-NEXT: s_mov_b64 s[8:9], exec			; GFX9-NEXT: s_mov_b64 s[8:9], exec
	[21 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: .LBB10_2:			; GFX9-NEXT: .LBB10_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[0:1]			; GFX9-NEXT: s_or_b64 exec, exec, s[0:1]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2			; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2
	; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0			; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0
	; GFX9-NEXT: v_readfirstlane_b32 s0, v0			; GFX9-NEXT: v_readfirstlane_b32 s0, v0
	; GFX9-NEXT: v_readfirstlane_b32 s1, v1
	; GFX9-NEXT: v_add_u32_e32 v1, v3, v4
	; GFX9-NEXT: v_mov_b32_e32 v3, s1
	; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2
	; GFX9-NEXT: s_mov_b32 s7, 0xf000			; GFX9-NEXT: s_mov_b32 s7, 0xf000
				; GFX9-NEXT: v_add_u32_e32 v3, v3, v4
				; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v3, v1, vcc			; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i64_uniform:			; GFX1064-LABEL: sub_i64_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_clause 0x1			; GFX1064-NEXT: s_clause 0x1
	; GFX1064-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX1064-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; GFX1064-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	[162 lines not shown]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

	[31 lines not shown]
	; GFX7LESS-NEXT: v_mov_b32_e32 v2, s2			; GFX7LESS-NEXT: v_mov_b32_e32 v2, s2
	; GFX7LESS-NEXT: s_mov_b32 m0, -1			; GFX7LESS-NEXT: s_mov_b32 m0, -1
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX7LESS-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: .LBB0_2:			; GFX7LESS-NEXT: .LBB0_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v1			; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: v_mad_u32_u24 v0, v0, 5, s2
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_constant:			; GFX8-LABEL: add_i32_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec			; GFX8-NEXT: s_mov_b64 s[2:3], exec
	[10 lines not shown]
	; GFX8-NEXT: v_mov_b32_e32 v2, s2			; GFX8-NEXT: v_mov_b32_e32 v2, s2
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX8-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: .LBB0_2:			; GFX8-NEXT: .LBB0_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s2, v1			; GFX8-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX8-NEXT: v_mad_u32_u24 v0, v0, 5, s2
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_constant:			; GFX9-LABEL: add_i32_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec			; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr1			; GFX9-NEXT: ; implicit-def: $vgpr1
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX9-NEXT: s_cbranch_execz .LBB0_2			; GFX9-NEXT: s_cbranch_execz .LBB0_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: s_bcnt1_i32_b64 s2, s[2:3]			; GFX9-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
	; GFX9-NEXT: s_mul_i32 s2, s2, 5			; GFX9-NEXT: s_mul_i32 s2, s2, 5
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX9-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: .LBB0_2:			; GFX9-NEXT: .LBB0_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s2, v1			; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s2
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_constant:			; GFX1064-LABEL: add_i32_constant:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec			; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	[637 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 m0, -1			; GFX7LESS-NEXT: s_mov_b32 m0, -1
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]			; GFX7LESS-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: .LBB4_2:			; GFX7LESS-NEXT: .LBB4_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v1			; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
	; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: v_mov_b32_e32 v2, s4
	; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s2, v0			; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s2, v0
	; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc			; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i64_constant:			; GFX8-LABEL: add_i64_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[4:5], exec			; GFX8-NEXT: s_mov_b64 s[4:5], exec
	[10 lines not shown]
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]			; GFX8-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: .LBB4_2:			; GFX8-NEXT: .LBB4_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_readfirstlane_b32 s3, v1
	; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[2:3], v2, 5, v[0:1]			; GFX8-NEXT: v_mad_u64_u32 v[0:1], s[2:3], v2, 5, v[0:1]
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_nop 2			; GFX8-NEXT: s_nop 2
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i64_constant:			; GFX9-LABEL: add_i64_constant:
	[12 lines not shown]
	; GFX9-NEXT: v_mov_b32_e32 v0, s4			; GFX9-NEXT: v_mov_b32_e32 v0, s4
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]			; GFX9-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: .LBB4_2:			; GFX9-NEXT: .LBB4_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_readfirstlane_b32 s3, v1
	; GFX9-NEXT: v_mov_b32_e32 v0, s2
	; GFX9-NEXT: v_mov_b32_e32 v1, s3
	; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[2:3], v2, 5, v[0:1]			; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[2:3], v2, 5, v[0:1]
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_nop 2			; GFX9-NEXT: s_nop 2
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i64_constant:			; GFX1064-LABEL: add_i64_constant:
	[94 lines not shown]
	; GFX7LESS-NEXT: .LBB5_2:			; GFX7LESS-NEXT: .LBB5_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s6, -1			; GFX7LESS-NEXT: s_mov_b32 s6, -1
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: s_mov_b32 s4, s0			; GFX7LESS-NEXT: s_mov_b32 s4, s0
	; GFX7LESS-NEXT: s_mov_b32 s5, s1			; GFX7LESS-NEXT: s_mov_b32 s5, s1
	; GFX7LESS-NEXT: v_readfirstlane_b32 s0, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s0, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s1, v1
	; GFX7LESS-NEXT: v_mul_lo_u32 v0, s3, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v0, s3, v2
	; GFX7LESS-NEXT: v_mul_hi_u32 v1, s2, v2			; GFX7LESS-NEXT: v_mul_hi_u32 v3, s2, v2
	; GFX7LESS-NEXT: v_mul_lo_u32 v2, s2, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v2, s2, v2
	; GFX7LESS-NEXT: v_add_i32_e32 v1, vcc, v1, v0			; GFX7LESS-NEXT: v_add_i32_e32 v3, vcc, v3, v0
	; GFX7LESS-NEXT: v_mov_b32_e32 v3, s1
	; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s0, v2			; GFX7LESS-NEXT: v_add_i32_e32 v0, vcc, s0, v2
	; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc			; GFX7LESS-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i64_uniform:			; GFX8-LABEL: add_i64_uniform:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[6:7], exec			; GFX8-NEXT: s_mov_b64 s[6:7], exec
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[12 lines not shown]
	; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1			; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]			; GFX8-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: .LBB5_2:			; GFX8-NEXT: .LBB5_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s4, s0
	; GFX8-NEXT: s_mov_b32 s5, s1
	; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2			; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2
	; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0			; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0
				; GFX8-NEXT: s_mov_b32 s4, s0
	; GFX8-NEXT: v_readfirstlane_b32 s0, v0			; GFX8-NEXT: v_readfirstlane_b32 s0, v0
	; GFX8-NEXT: v_readfirstlane_b32 s1, v1			; GFX8-NEXT: v_add_u32_e32 v3, vcc, v3, v4
	; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4
	; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, s0, v2
	; GFX8-NEXT: s_mov_b32 s7, 0xf000			; GFX8-NEXT: s_mov_b32 s7, 0xf000
	; GFX8-NEXT: s_mov_b32 s6, -1			; GFX8-NEXT: s_mov_b32 s6, -1
	; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v3, v1, vcc			; GFX8-NEXT: s_mov_b32 s5, s1
				; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i64_uniform:			; GFX9-LABEL: add_i64_uniform:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[6:7], exec			; GFX9-NEXT: s_mov_b64 s[6:7], exec
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[16 lines not shown]
	; GFX9-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]			; GFX9-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: .LBB5_2:			; GFX9-NEXT: .LBB5_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2			; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2
	; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0			; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0
	; GFX9-NEXT: s_mov_b32 s4, s0			; GFX9-NEXT: s_mov_b32 s4, s0
	; GFX9-NEXT: s_mov_b32 s5, s1
	; GFX9-NEXT: v_readfirstlane_b32 s0, v0			; GFX9-NEXT: v_readfirstlane_b32 s0, v0
	; GFX9-NEXT: v_readfirstlane_b32 s1, v1			; GFX9-NEXT: v_add_u32_e32 v3, v3, v4
	; GFX9-NEXT: v_add_u32_e32 v1, v3, v4
	; GFX9-NEXT: v_mov_b32_e32 v3, s1
	; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2			; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2
	; GFX9-NEXT: s_mov_b32 s7, 0xf000			; GFX9-NEXT: s_mov_b32 s7, 0xf000
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v3, v1, vcc			; GFX9-NEXT: s_mov_b32 s5, s1
				; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i64_uniform:			; GFX1064-LABEL: add_i64_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX1064-NEXT: s_mov_b64 s[6:7], exec			; GFX1064-NEXT: s_mov_b64 s[6:7], exec
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[881 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 m0, -1			; GFX7LESS-NEXT: s_mov_b32 m0, -1
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]			; GFX7LESS-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: .LBB11_2:			; GFX7LESS-NEXT: .LBB11_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s2, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s4, v1			; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX7LESS-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
	; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX7LESS-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: v_mov_b32_e32 v2, s4
	; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v0			; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s2, v0
	; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc			; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: sub_i64_constant:			; GFX8-LABEL: sub_i64_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[4:5], exec			; GFX8-NEXT: s_mov_b64 s[4:5], exec
	[11 lines not shown]
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]			; GFX8-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: .LBB11_2:			; GFX8-NEXT: .LBB11_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_readfirstlane_b32 s3, v1
	; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX8-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX8-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX8-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX8-NEXT: v_mov_b32_e32 v2, s3
	; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0			; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0
	; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v2, v1, vcc			; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: sub_i64_constant:			; GFX9-LABEL: sub_i64_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	[11 lines not shown]
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]			; GFX9-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: .LBB11_2:			; GFX9-NEXT: .LBB11_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_readfirstlane_b32 s3, v1
	; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX9-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX9-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX9-NEXT: v_mul_hi_u32_u24_e32 v3, 5, v2
	; GFX9-NEXT: v_mov_b32_e32 v2, s3
	; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s2, v0			; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s2, v0
	; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v2, v1, vcc			; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i64_constant:			; GFX1064-LABEL: sub_i64_constant:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	[98 lines not shown]
	; GFX7LESS-NEXT: .LBB12_2:			; GFX7LESS-NEXT: .LBB12_2:
	; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX7LESS-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s7, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s6, -1			; GFX7LESS-NEXT: s_mov_b32 s6, -1
	; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)			; GFX7LESS-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7LESS-NEXT: s_mov_b32 s4, s0			; GFX7LESS-NEXT: s_mov_b32 s4, s0
	; GFX7LESS-NEXT: s_mov_b32 s5, s1			; GFX7LESS-NEXT: s_mov_b32 s5, s1
	; GFX7LESS-NEXT: v_readfirstlane_b32 s0, v0			; GFX7LESS-NEXT: v_readfirstlane_b32 s0, v0
	; GFX7LESS-NEXT: v_readfirstlane_b32 s1, v1
	; GFX7LESS-NEXT: v_mul_lo_u32 v0, s3, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v0, s3, v2
	; GFX7LESS-NEXT: v_mul_hi_u32 v1, s2, v2			; GFX7LESS-NEXT: v_mul_hi_u32 v3, s2, v2
	; GFX7LESS-NEXT: v_mul_lo_u32 v2, s2, v2			; GFX7LESS-NEXT: v_mul_lo_u32 v2, s2, v2
	; GFX7LESS-NEXT: v_add_i32_e32 v1, vcc, v1, v0			; GFX7LESS-NEXT: v_add_i32_e32 v3, vcc, v3, v0
	; GFX7LESS-NEXT: v_mov_b32_e32 v3, s1
	; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s0, v2			; GFX7LESS-NEXT: v_sub_i32_e32 v0, vcc, s0, v2
	; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc			; GFX7LESS-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX7LESS-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: sub_i64_uniform:			; GFX8-LABEL: sub_i64_uniform:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[6:7], exec			; GFX8-NEXT: s_mov_b64 s[6:7], exec
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	[12 lines not shown]
	; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1			; GFX8-NEXT: v_add_u32_e32 v1, vcc, s6, v1
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]			; GFX8-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: .LBB12_2:			; GFX8-NEXT: .LBB12_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s4, s0
	; GFX8-NEXT: s_mov_b32 s5, s1
	; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2			; GFX8-NEXT: v_mul_lo_u32 v4, s3, v2
	; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[0:1], s2, v2, 0			; GFX8-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0
				; GFX8-NEXT: s_mov_b32 s4, s0
	; GFX8-NEXT: v_readfirstlane_b32 s0, v0			; GFX8-NEXT: v_readfirstlane_b32 s0, v0
	; GFX8-NEXT: v_readfirstlane_b32 s1, v1			; GFX8-NEXT: v_add_u32_e32 v3, vcc, v3, v4
	; GFX8-NEXT: v_add_u32_e32 v1, vcc, v3, v4
	; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2			; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s0, v2
	; GFX8-NEXT: s_mov_b32 s7, 0xf000			; GFX8-NEXT: s_mov_b32 s7, 0xf000
	; GFX8-NEXT: s_mov_b32 s6, -1			; GFX8-NEXT: s_mov_b32 s6, -1
	; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v3, v1, vcc			; GFX8-NEXT: s_mov_b32 s5, s1
				; GFX8-NEXT: v_subb_u32_e32 v1, vcc, v1, v3, vcc
	; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX8-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: sub_i64_uniform:			; GFX9-LABEL: sub_i64_uniform:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[6:7], exec			; GFX9-NEXT: s_mov_b64 s[6:7], exec
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
	Show All 16 Lines
	; GFX9-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]			; GFX9-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: .LBB12_2:			; GFX9-NEXT: .LBB12_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2			; GFX9-NEXT: v_mul_lo_u32 v4, s3, v2
	; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0			; GFX9-NEXT: v_mad_u64_u32 v[2:3], s[2:3], s2, v2, 0
	; GFX9-NEXT: s_mov_b32 s4, s0			; GFX9-NEXT: s_mov_b32 s4, s0
	; GFX9-NEXT: s_mov_b32 s5, s1
	; GFX9-NEXT: v_readfirstlane_b32 s0, v0			; GFX9-NEXT: v_readfirstlane_b32 s0, v0
	; GFX9-NEXT: v_readfirstlane_b32 s1, v1			; GFX9-NEXT: v_add_u32_e32 v3, v3, v4
	; GFX9-NEXT: v_add_u32_e32 v1, v3, v4
	; GFX9-NEXT: v_mov_b32_e32 v3, s1
	; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2			; GFX9-NEXT: v_sub_co_u32_e32 v0, vcc, s0, v2
	; GFX9-NEXT: s_mov_b32 s7, 0xf000			; GFX9-NEXT: s_mov_b32 s7, 0xf000
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v3, v1, vcc			; GFX9-NEXT: s_mov_b32 s5, s1
				; GFX9-NEXT: v_subb_co_u32_e32 v1, vcc, v1, v3, vcc
	; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0			; GFX9-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i64_uniform:			; GFX1064-LABEL: sub_i64_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX1064-NEXT: s_mov_b64 s[6:7], exec			; GFX1064-NEXT: s_mov_b64 s[6:7], exec
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0

llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll

	Show All 28 Lines
	; GFX7-NEXT: ; %bb.2:			; GFX7-NEXT: ; %bb.2:
	; GFX7-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX7-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	; GFX7-NEXT: s_mul_i32 s12, s12, 5			; GFX7-NEXT: s_mul_i32 s12, s12, 5
	; GFX7-NEXT: v_mov_b32_e32 v1, s12			; GFX7-NEXT: v_mov_b32_e32 v1, s12
	; GFX7-NEXT: buffer_atomic_add v1, off, s[4:7], 0 glc			; GFX7-NEXT: buffer_atomic_add v1, off, s[4:7], 0 glc
	; GFX7-NEXT: .LBB0_3:			; GFX7-NEXT: .LBB0_3:
	; GFX7-NEXT: s_or_b64 exec, exec, s[10:11]			; GFX7-NEXT: s_or_b64 exec, exec, s[10:11]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readfirstlane_b32 s4, v1			; GFX7-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX7-NEXT: v_mad_u32_u24 v0, v0, 5, s4
	; GFX7-NEXT: .LBB0_4: ; %Flow			; GFX7-NEXT: .LBB0_4: ; %Flow
	; GFX7-NEXT: s_or_b64 exec, exec, s[8:9]			; GFX7-NEXT: s_or_b64 exec, exec, s[8:9]
	; GFX7-NEXT: s_wqm_b64 s[4:5], -1			; GFX7-NEXT: s_wqm_b64 s[4:5], -1
	; GFX7-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GFX7-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GFX7-NEXT: s_cbranch_vccnz .LBB0_6			; GFX7-NEXT: s_cbranch_vccnz .LBB0_6
	; GFX7-NEXT: ; %bb.5: ; %if			; GFX7-NEXT: ; %bb.5: ; %if
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: .LBB0_6: ; %UnifiedReturnBlock			; GFX7-NEXT: .LBB0_6: ; %UnifiedReturnBlock
	Show All 16 Lines
	; GFX89-NEXT: ; %bb.2:			; GFX89-NEXT: ; %bb.2:
	; GFX89-NEXT: s_bcnt1_i32_b64 s12, s[12:13]			; GFX89-NEXT: s_bcnt1_i32_b64 s12, s[12:13]
	; GFX89-NEXT: s_mul_i32 s12, s12, 5			; GFX89-NEXT: s_mul_i32 s12, s12, 5
	; GFX89-NEXT: v_mov_b32_e32 v1, s12			; GFX89-NEXT: v_mov_b32_e32 v1, s12
	; GFX89-NEXT: buffer_atomic_add v1, off, s[4:7], 0 glc			; GFX89-NEXT: buffer_atomic_add v1, off, s[4:7], 0 glc
	; GFX89-NEXT: .LBB0_3:			; GFX89-NEXT: .LBB0_3:
	; GFX89-NEXT: s_or_b64 exec, exec, s[10:11]			; GFX89-NEXT: s_or_b64 exec, exec, s[10:11]
	; GFX89-NEXT: s_waitcnt vmcnt(0)			; GFX89-NEXT: s_waitcnt vmcnt(0)
	; GFX89-NEXT: v_readfirstlane_b32 s4, v1			; GFX89-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX89-NEXT: v_mad_u32_u24 v0, v0, 5, s4
	; GFX89-NEXT: .LBB0_4: ; %Flow			; GFX89-NEXT: .LBB0_4: ; %Flow
	; GFX89-NEXT: s_or_b64 exec, exec, s[8:9]			; GFX89-NEXT: s_or_b64 exec, exec, s[8:9]
	; GFX89-NEXT: s_wqm_b64 s[4:5], -1			; GFX89-NEXT: s_wqm_b64 s[4:5], -1
	; GFX89-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GFX89-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GFX89-NEXT: s_cbranch_vccnz .LBB0_6			; GFX89-NEXT: s_cbranch_vccnz .LBB0_6
	; GFX89-NEXT: ; %bb.5: ; %if			; GFX89-NEXT: ; %bb.5: ; %if
	; GFX89-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX89-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX89-NEXT: .LBB0_6: ; %UnifiedReturnBlock			; GFX89-NEXT: .LBB0_6: ; %UnifiedReturnBlock

llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll

	Show All 26 Lines
	; GFX6-NEXT: s_bcnt1_i32_b64 s0, s[2:3]			; GFX6-NEXT: s_bcnt1_i32_b64 s0, s[2:3]
	; GFX6-NEXT: s_mul_i32 s0, s0, 5			; GFX6-NEXT: s_mul_i32 s0, s0, 5
	; GFX6-NEXT: v_mov_b32_e32 v1, s0			; GFX6-NEXT: v_mov_b32_e32 v1, s0
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc			; GFX6-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
	; GFX6-NEXT: .LBB0_2:			; GFX6-NEXT: .LBB0_2:
	; GFX6-NEXT: s_or_b64 exec, exec, s[6:7]			; GFX6-NEXT: s_or_b64 exec, exec, s[6:7]
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readfirstlane_b32 s0, v1			; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX6-NEXT: s_mov_b32 s7, 0xf000			; GFX6-NEXT: s_mov_b32 s7, 0xf000
	; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, s0
	; GFX6-NEXT: s_mov_b32 s6, -1			; GFX6-NEXT: s_mov_b32 s6, -1
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_constant:			; GFX8-LABEL: add_i32_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
	Show All 9 Lines
	; GFX8-NEXT: s_bcnt1_i32_b64 s0, s[6:7]			; GFX8-NEXT: s_bcnt1_i32_b64 s0, s[6:7]
	; GFX8-NEXT: s_mul_i32 s0, s0, 5			; GFX8-NEXT: s_mul_i32 s0, s0, 5
	; GFX8-NEXT: v_mov_b32_e32 v1, s0			; GFX8-NEXT: v_mov_b32_e32 v1, s0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc			; GFX8-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
	; GFX8-NEXT: .LBB0_2:			; GFX8-NEXT: .LBB0_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s0, v1			; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, v1
	; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, s0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_constant:			; GFX9-LABEL: add_i32_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	Show All 10 Lines
	; GFX9-NEXT: s_bcnt1_i32_b64 s0, s[6:7]			; GFX9-NEXT: s_bcnt1_i32_b64 s0, s[6:7]
	; GFX9-NEXT: s_mul_i32 s0, s0, 5			; GFX9-NEXT: s_mul_i32 s0, s0, 5
	; GFX9-NEXT: v_mov_b32_e32 v1, s0			; GFX9-NEXT: v_mov_b32_e32 v1, s0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc			; GFX9-NEXT: buffer_atomic_add v1, off, s[8:11], 0 glc
	; GFX9-NEXT: .LBB0_2:			; GFX9-NEXT: .LBB0_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s0, v1			; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_store_dword v1, v0, s[2:3]			; GFX9-NEXT: global_store_dword v1, v0, s[2:3]
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10W64-LABEL: add_i32_constant:			; GFX10W64-LABEL: add_i32_constant:
	; GFX10W64: ; %bb.0: ; %entry			; GFX10W64: ; %bb.0: ; %entry
	; GFX10W64-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX10W64-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24

llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll

	Show All 27 Lines
	; GFX6-NEXT: s_mul_i32 s0, s0, 5			; GFX6-NEXT: s_mul_i32 s0, s0, 5
	; GFX6-NEXT: v_mov_b32_e32 v1, s0			; GFX6-NEXT: v_mov_b32_e32 v1, s0
	; GFX6-NEXT: v_mov_b32_e32 v2, 0			; GFX6-NEXT: v_mov_b32_e32 v2, 0
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc			; GFX6-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc
	; GFX6-NEXT: .LBB0_2:			; GFX6-NEXT: .LBB0_2:
	; GFX6-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX6-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: v_readfirstlane_b32 s0, v1			; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX6-NEXT: s_mov_b32 s7, 0xf000			; GFX6-NEXT: s_mov_b32 s7, 0xf000
	; GFX6-NEXT: v_mad_u32_u24 v0, v0, 5, s0
	; GFX6-NEXT: s_mov_b32 s6, -1			; GFX6-NEXT: s_mov_b32 s6, -1
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX6-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_constant:			; GFX8-LABEL: add_i32_constant:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
	Show All 10 Lines
	; GFX8-NEXT: s_mul_i32 s0, s0, 5			; GFX8-NEXT: s_mul_i32 s0, s0, 5
	; GFX8-NEXT: v_mov_b32_e32 v1, s0			; GFX8-NEXT: v_mov_b32_e32 v1, s0
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc			; GFX8-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc
	; GFX8-NEXT: .LBB0_2:			; GFX8-NEXT: .LBB0_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s0, v1			; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, v1
	; GFX8-NEXT: v_mad_u32_u24 v2, v0, 5, s0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_constant:			; GFX9-LABEL: add_i32_constant:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	Show All 11 Lines
	; GFX9-NEXT: s_mul_i32 s0, s0, 5			; GFX9-NEXT: s_mul_i32 s0, s0, 5
	; GFX9-NEXT: v_mov_b32_e32 v1, s0			; GFX9-NEXT: v_mov_b32_e32 v1, s0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc			; GFX9-NEXT: buffer_atomic_add v1, v2, s[8:11], 0 idxen glc
	; GFX9-NEXT: .LBB0_2:			; GFX9-NEXT: .LBB0_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s0, v1			; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, v1
	; GFX9-NEXT: v_mad_u32_u24 v0, v0, 5, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_store_dword v1, v0, s[2:3]			; GFX9-NEXT: global_store_dword v1, v0, s[2:3]
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10W64-LABEL: add_i32_constant:			; GFX10W64-LABEL: add_i32_constant:
	; GFX10W64: ; %bb.0: ; %entry			; GFX10W64: ; %bb.0: ; %entry
	; GFX10W64-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; GFX10W64-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24

llvm/test/CodeGen/AMDGPU/fold-readlane.mir

# RUN: llc -march=amdgcn -run-pass si-fold-operands -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s		# RUN: llc -march=amdgcn -mcpu=gfx908 -run-pass si-fold-operands -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

# GCN-LABEL: name: fold-imm-readfirstlane{{$}}		# GCN-LABEL: name: fold-imm-readfirstlane{{$}}
# GCN: %1:sreg_32_xm0 = S_MOV_B32 123		# GCN: %1:sreg_32_xm0 = S_MOV_B32 123
---		---
name: fold-imm-readfirstlane		name: fold-imm-readfirstlane
tracksRegLiveness: true		tracksRegLiveness: true
body: |		body: |
bb.0:		bb.0:
▲ Show 20 Lines • Show All 373 Lines • ▼ Show 20 Lines	bb.0:
%0:sreg_32_xm0 = COPY $sgpr10		%0:sreg_32_xm0 = COPY $sgpr10
%1:sreg_32_xm0 = COPY $sgpr11		%1:sreg_32_xm0 = COPY $sgpr11
%2:vgpr_32 = COPY %0		%2:vgpr_32 = COPY %0
%3:vgpr_32 = COPY %1		%3:vgpr_32 = COPY %1
%4:vreg_64 = REG_SEQUENCE %2:vgpr_32, %subreg.sub0, killed %3:vgpr_32, %subreg.sub1		%4:vreg_64 = REG_SEQUENCE %2:vgpr_32, %subreg.sub0, killed %3:vgpr_32, %subreg.sub1
%5:sgpr_32 = V_READFIRSTLANE_B32 %4.sub0:vreg_64, implicit $exec		%5:sgpr_32 = V_READFIRSTLANE_B32 %4.sub0:vreg_64, implicit $exec
%6:sgpr_32 = V_READFIRSTLANE_B32 %4.sub1:vreg_64, implicit $exec		%6:sgpr_32 = V_READFIRSTLANE_B32 %4.sub1:vreg_64, implicit $exec
...		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %2:vgpr_32 = COPY %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit %2
		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		%2:vgpr_32 = COPY %1
		S_NOP 0, implicit %2
		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg_kill{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: S_NOP 0, implicit %0
		# GCN-NEXT: %2:vgpr_32 = COPY %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit %2
		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg_kill
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		S_NOP 0, implicit killed %0
		%2:vgpr_32 = COPY killed %1
		S_NOP 0, implicit %2
		...

		# Make sure we don't delete def for the other user.
		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg_multi_use{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit %1
		# GCN-NEXT: %2:vgpr_32 = COPY killed %1
		# GCN-NEXT: S_NOP 0, implicit %2
		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg_multi_use
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		S_NOP 0, implicit %1
		%2:vgpr_32 = COPY killed %1
		S_NOP 0, implicit %2
		...

		# GCN-LABEL: name: copy_undef_virtreg_sgpr{{$}}
		# GCN: %0:vgpr_32 = COPY undef %1:sgpr_32
		---
		name: copy_undef_virtreg_sgpr
		tracksRegLiveness: true
		body: |
		bb.0:
		%0:vgpr_32 = COPY undef %1:sgpr_32
		S_NOP 0, implicit %0
		...

		# GCN-LABEL: name: fold_readfirstlane_physreg_into_copy_to_vgpr_virtreg{{$}}
		# GCN: %1:vgpr_32 = COPY $vgpr0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit %1
		---
		name: fold_readfirstlane_physreg_into_copy_to_vgpr_virtreg
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 $vgpr0, implicit $exec
		%2:vgpr_32 = COPY %1
		S_NOP 0, implicit %2
		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_physreg{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: $vgpr0 = COPY %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit $vgpr0
		---
		name: fold_readfirstlane_into_copy_to_vgpr_physreg
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		$vgpr0 = COPY %1
		S_NOP 0, implicit $vgpr0
		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg_execdef{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit-def $exec
		# GCN-NEXT: %2:vgpr_32 = COPY %1
		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg_execdef
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		S_NOP 0, implicit-def $exec
		%2:vgpr_32 = COPY %1
		S_NOP 0, implicit %2
		...

		# GCN-LABEL: name: no_fold_readfirstlane_into_copy_to_sgpr_virtreg{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		# GCN-NEXT: %2:sgpr_32 = COPY %1
		---
		name: no_fold_readfirstlane_into_copy_to_sgpr_virtreg
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		%2:sgpr_32 = COPY %1
		S_NOP 0, implicit %2
		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_agpr_virtreg{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %2:agpr_32 = COPY %0, implicit $exec
		# GCN-NEXT: S_NOP 0, implicit %2
		---
		name: fold_readfirstlane_into_copy_to_agpr_virtreg
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0
		%0:vgpr_32 = COPY $vgpr0
		%1:sreg_32_xm0 = V_READFIRSTLANE_B32 %0, implicit $exec
		%2:agpr_32 = COPY %1
		S_NOP 0, implicit %2
		...

		# TODO: Should be able to handle this
		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg_reg_sequence{{$}}
		# GCN: %0:vgpr_32 = COPY $vgpr0
		# GCN-NEXT: %1:vgpr_32 = COPY $vgpr1
		# GCN-NEXT: %2:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %0, implicit $exec
		# GCN-NEXT: %3:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %1, implicit $exec
		# GCN-NEXT: %4:vreg_64 = REG_SEQUENCE %2, %subreg.sub0, %3, %subreg.sub1
		# GCN-NEXT: %5:vreg_64 = COPY %4
		# GCN-NEXT: S_NOP 0, implicit %5

		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg_reg_sequence
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0, $vgpr1
		%0:vgpr_32 = COPY $vgpr0
		%1:vgpr_32 = COPY $vgpr1
		%2:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %0, implicit $exec
		%3:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %1, implicit $exec
		%4:vreg_64 = REG_SEQUENCE %2, %subreg.sub0, %3, %subreg.sub1
		%5:vreg_64 = COPY %4
		S_NOP 0, implicit %5
		...

		# GCN-LABEL: name: fold_readfirstlane_into_copy_to_vgpr_virtreg_reg_sequence_extract{{$}}
		# GCN: %0:vreg_64 = COPY $vgpr0_vgpr1
		# GCN-NEXT: %3:sreg_32_xm0 = V_READFIRSTLANE_B32 %0.sub0, implicit $exec
		# GCN-NEXT: %4:sreg_32_xm0 = V_READFIRSTLANE_B32 %0.sub1, implicit $exec
		# GCN-NEXT: %5:sreg_64 = REG_SEQUENCE %3, %subreg.sub0, %4, %subreg.sub1
		# GCN-NEXT: %6:vreg_64 = COPY %5
		# GCN-NEXT: S_NOP 0, implicit %6
		---
		name: fold_readfirstlane_into_copy_to_vgpr_virtreg_reg_sequence_extract
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: $vgpr0_vgpr1
		%0:vreg_64 = COPY $vgpr0_vgpr1
		%1:vgpr_32 = COPY %0.sub0
		%2:vgpr_32 = COPY %0.sub1
		%3:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %1, implicit $exec
		%4:sreg_32_xm0 = V_READFIRSTLANE_B32 killed %2, implicit $exec
		%5:sreg_64 = REG_SEQUENCE %3, %subreg.sub0, %4, %subreg.sub1
		%6:vreg_64 = COPY %5
		S_NOP 0, implicit %6
		...
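
The new MIR tests above suggest when the fold fires: the COPY must read the result of a V_READFIRSTLANE_B32, the COPY destination must be a VGPR or AGPR rather than an SGPR, the readfirstlane result must have no other user, and nothing between the two instructions may define $exec. The following is a minimal sketch of that legality check, written against a toy instruction model and not against the real SIFoldOperands or MachineInstr APIs. The names `Inst`, `RegBank`, `countUses`, and `canFoldReadFirstLaneIntoCopy` are hypothetical, and the model covers neither the REG_SEQUENCE cases nor the uniformity concern raised in review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register banks; the real pass queries register classes.
enum class RegBank { SGPR, VGPR, AGPR };

// Hypothetical, simplified stand-in for one instruction in a basic block.
struct Inst {
  bool IsReadFirstLane = false;    // models V_READFIRSTLANE_B32
  bool IsCopy = false;             // models COPY
  bool ModifiesExec = false;       // models an (implicit) def of $exec
  int Def = -1;                    // virtual register defined, -1 if none
  std::vector<int> Uses;           // virtual registers read
  RegBank DefBank = RegBank::SGPR; // bank of the defined register
};

// Count how many times Reg is read anywhere in the block.
static int countUses(const std::vector<Inst> &Block, int Reg) {
  int N = 0;
  for (const Inst &I : Block)
    for (int U : I.Uses)
      if (U == Reg)
        ++N;
  return N;
}

// Returns true if the COPY at CopyIdx could read the readfirstlane's vector
// source directly, under the conditions the tests appear to exercise.
bool canFoldReadFirstLaneIntoCopy(const std::vector<Inst> &Block,
                                  std::size_t RflIdx, std::size_t CopyIdx) {
  assert(RflIdx < CopyIdx && CopyIdx < Block.size());
  const Inst &Rfl = Block[RflIdx];
  const Inst &Copy = Block[CopyIdx];

  if (!Rfl.IsReadFirstLane || !Copy.IsCopy)
    return false;
  // The copy must read exactly the readfirstlane result.
  if (Copy.Uses.size() != 1 || Copy.Uses[0] != Rfl.Def)
    return false;
  // Copies into an SGPR are left alone (no_fold_..._to_sgpr_virtreg).
  if (Copy.DefBank == RegBank::SGPR)
    return false;
  // Another user keeps the readfirstlane alive (..._multi_use).
  if (countUses(Block, Rfl.Def) != 1)
    return false;
  // A def of $exec in between changes the active lanes (..._execdef).
  for (std::size_t I = RflIdx + 1; I < CopyIdx; ++I)
    if (Block[I].ModifiesExec)
      return false;
  return true;
}
```

In this model the `_kill` test still folds, because the intervening S_NOP reads the VGPR source rather than the readfirstlane result, while the `_multi_use` and `_execdef` tests are rejected by the use-count check and the exec scan respectively.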

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s

	declare i32 @llvm.amdgcn.readfirstlane(i32) #0			declare i32 @llvm.amdgcn.readfirstlane(i32) #0

	; CHECK-LABEL: {{^}}test_readfirstlane:			; CHECK-LABEL: {{^}}test_readfirstlane:
	; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2			; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
	define void @test_readfirstlane(i32 addrspace(1)* %out, i32 %src) #1 {			define void @test_readfirstlane(i32 addrspace(1)* %out, i32 %src) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)
				call void asm sideeffect "; use $0","s"(i32 %readfirstlane)
				ret void
				}

				; The readfirstlane is copied right back to a VGPR, so this is
				; eliminated.
				; CHECK-LABEL: {{^}}test_readfirstlane_copyback_v:
				; CHECK: s_waitcnt
				; CHECK-NEXT: flat_store_dword
				; CHECK-NEXT: s_waitcnt
				; CHECK-NEXT: s_setpc_b64
				define void @test_readfirstlane_copyback_v(i32 addrspace(1)* %out, i32 %src) #1 {
				%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_imm:			; CHECK-LABEL: {{^}}test_readfirstlane_imm:
	; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32			; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32
	; CHECK-NOT: [[SGPR_VAL]]			; CHECK-NOT: [[SGPR_VAL]]
	; CHECK: ; use [[SGPR_VAL]]			; CHECK: ; use [[SGPR_VAL]]

llvm/test/CodeGen/AMDGPU/wave-id-computation.ll

	Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_and_b32 s4, s4, 0xffff			; CHECK-NEXT: s_and_b32 s4, s4, 0xffff
	; CHECK-NEXT: s_mul_i32 s12, s12, s4			; CHECK-NEXT: s_mul_i32 s12, s12, s4
	; CHECK-NEXT: v_add_u32_e32 v0, s12, v0			; CHECK-NEXT: v_add_u32_e32 v0, s12, v0
	; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0			; CHECK-NEXT: v_ashrrev_i32_e32 v1, 31, v0
	; CHECK-NEXT: v_lshrrev_b32_e32 v1, 26, v1			; CHECK-NEXT: v_lshrrev_b32_e32 v1, 26, v1
	; CHECK-NEXT: v_add_u32_e32 v0, v0, v1			; CHECK-NEXT: v_add_u32_e32 v0, v0, v1
	; CHECK-NEXT: v_ashrrev_i32_e32 v0, 6, v0			; CHECK-NEXT: v_ashrrev_i32_e32 v0, 6, v0
	; CHECK-NEXT: v_readfirstlane_b32 s4, v0			; CHECK-NEXT: ; kill: def $vgpr0 killed $vgpr0 killed $exec
	; CHECK-NEXT: v_mov_b32_e32 v0, s4
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%i = tail call i32 @llvm.amdgcn.workgroup.id.x()			%i = tail call i32 @llvm.amdgcn.workgroup.id.x()
	%i1 = tail call align 4 dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()			%i1 = tail call align 4 dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
	%i2 = getelementptr i8, i8 addrspace(4)* %i1, i64 4			%i2 = getelementptr i8, i8 addrspace(4)* %i1, i64 4
	%i3 = bitcast i8 addrspace(4)* %i2 to i16 addrspace(4)*			%i3 = bitcast i8 addrspace(4)* %i2 to i16 addrspace(4)*
	%i4 = load i16, i16 addrspace(4)* %i3, align 4, !range !0, !invariant.load !1			%i4 = load i16, i16 addrspace(4)* %i3, align 4, !range !0, !invariant.load !1
	%i8 = zext i16 %i4 to i32			%i8 = zext i16 %i4 to i32
