This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
Closed · Public

Authored by mbrkusanin on Jul 6 2020, 4:52 AM.


Details

Reviewers

foad
arsenm

Commits

rGce23e54162ed: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot

Summary

Select ballot intrinsic for GlobalISel.
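For background, the llvm.amdgcn.ballot intrinsic returns a wave-sized bitmask (i32 for wave32, i64 for wave64). The mask holds the intrinsic's i1 argument for each active lane and zero for each inactive lane. The code below is a minimal scalar model of that behavior. It is plain C++, not LLVM code, and the function names ballot and ballotConstant are hypothetical. It shows why the tests below expect ballot(false) to become 0 and ballot(true) to become a copy of EXEC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of llvm.amdgcn.ballot (illustration only, not LLVM code).
// exec: mask of active lanes. cond[i]: the i1 argument as seen by lane i.
// Bit i of the result is set iff lane i is active and its argument is true;
// inactive lanes always contribute zero.
uint64_t ballot(uint64_t exec, const std::vector<bool> &cond) {
  assert(cond.size() <= 64 && "a wave has at most 64 lanes");
  uint64_t mask = 0;
  for (std::size_t lane = 0; lane < cond.size(); ++lane) {
    const bool active = (exec >> lane) & 1;
    if (active && cond[lane])
      mask |= uint64_t(1) << lane;
  }
  return mask;
}

// A wave-uniform constant argument, as in the constant_false/constant_true
// tests: false yields 0, true yields the active-lane mask (exec restricted
// to the first waveSize lanes).
uint64_t ballotConstant(uint64_t exec, unsigned waveSize, bool value) {
  return ballot(exec, std::vector<bool>(waveSize, value));
}
```

Inactive lanes always contribute zero, so ballot(true) is exactly the active-lane mask. The selector can therefore copy EXEC (EXEC_LO in wave32) instead of emitting a compare.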

Diff Detail

Event Timeline

mbrkusanin created this revision. Jul 6 2020, 4:52 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others. Jul 6 2020, 4:52 AM

Harbormaster failed remote builds in B63004: Diff 275666! Jul 6 2020, 6:16 AM

arsenm requested changes to this revision. Jul 6 2020, 6:34 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1054–1059	You want getConstantVRegVal instead of looking through a copy and checking for G_CONSTANT
1062	This doesn't make any sense; there's no reason to ever use the VOP3-encoded form of v_mov_b32. It's not a 64-bit move. This also returns a scalar value.
1065	Register
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4167	This returns an SGPR value

This revision now requires changes to proceed. Jul 6 2020, 6:34 AM

arsenm added inline comments. Jul 6 2020, 6:34 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
9	Can you give the tests more descriptive names, like constant_false, constant_true? Also, the function returns should use SGPRs, so switch to shader calling conventions?

Addressed comments.
Also renamed and updated tests for SDag. Let me know if you would rather have this as a separate patch.

arsenm added inline comments. Jul 7 2020, 6:04 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054	I think you want just the regular getConstantVRegVal. I don't think you're getting much from the look-through.
1059	This would need to be an S_MOV_B64 for wave 64?
1064	This should be unreachable code (however, the verifier doesn't check intrinsic operand types so I guess you have to leave this)
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13	This can be one s_mov_b64
24–25	One s_mov_b64

Also renamed and updated SDag tests.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054	Unfortunately regular version fails to produce the value.
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13	It can, but SIFoldOperands will not let that happen. Starting from:
    %10:sreg_64 = S_MOV_B64 0
    %3:sreg_32 = COPY %10.sub0:sreg_64
    %4:sreg_32 = COPY %10.sub1:sreg_64
plus some instructions that use %3 and %4 but will eventually be removed, SIFoldOperands produces:
    %10:sreg_64 = S_MOV_B64 0
    %3:sreg_32 = S_MOV_B32 0
    %4:sreg_32 = S_MOV_B32 0
    ...
This makes the first instruction dead, so in the end we're left with two S_MOV_B32. For the exec example below, AMDGPU::sub0_sub1 seems to do the trick, but I don't see anything similar for immediate operands. Alternatively, we could produce
    v_cmp_ne_u32_e64 s[0:1], 0, 0
if for whatever reason that is preferable to
    s_mov_b32 s0, 0
    s_mov_b32 s1, 0
Anyway, this is not an issue with selecting ballot. The following example has the same issue:
    define amdgpu_cs i64 @si_fold_constants_i64() {
      %x = add i64 0, 0
      ret i64 %x
    }
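The folding sequence described in this reply can be reproduced in a toy model. This is a hypothetical simplification for illustration, not SIFoldOperands itself; the Inst struct and both passes are invented here. Copies of the sub0/sub1 halves of a 64-bit immediate move are rewritten as 32-bit immediate moves. The 64-bit move then has no users and is deleted, leaving two S_MOV_B32.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy instruction (hypothetical; not LLVM's MachineInstr).
struct Inst {
  enum Kind { MovB64, MovB32, CopySub } kind;
  int def;          // virtual register written
  int64_t imm = 0;  // immediate operand of MovB64 / MovB32
  int src = -1;     // source register of CopySub
  int sub = 0;      // CopySub: 0 = sub0 (low 32 bits), 1 = sub1 (high 32 bits)
};

// Rewrite "COPY %src.subN" of an immediate 64-bit move into a 32-bit
// immediate move of the corresponding half.
void foldSubregCopies(std::vector<Inst> &prog) {
  for (Inst &I : prog) {
    if (I.kind != Inst::CopySub)
      continue;
    for (const Inst &D : prog) {
      if (D.kind == Inst::MovB64 && D.def == I.src) {
        const uint64_t bits = static_cast<uint64_t>(D.imm);
        const uint32_t half = I.sub == 0 ? uint32_t(bits) : uint32_t(bits >> 32);
        I = Inst{Inst::MovB32, I.def, static_cast<int64_t>(half)};
        break;
      }
    }
  }
}

// Drop instructions whose result is neither read by a CopySub nor live out.
void removeDeadDefs(std::vector<Inst> &prog, const std::vector<int> &liveOut) {
  std::vector<Inst> kept;
  for (const Inst &I : prog) {
    bool used = false;
    for (const Inst &U : prog)
      if (U.kind == Inst::CopySub && U.src == I.def)
        used = true;
    for (int reg : liveOut)
      if (reg == I.def)
        used = true;
    if (used)
      kept.push_back(I);
  }
  prog = kept;
}
```

In this model the 64-bit move survives only while a copy still reads it. Folding both copies removes its last users, which matches the behavior described in the reply.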

Harbormaster completed remote builds in B63726: Diff 276992. Jul 10 2020, 5:32 AM

arsenm accepted this revision. Jul 10 2020, 10:34 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054	I think something probably went wrong here because we don't try to do anything resembling optimization during or after RegBankSelect. When we do that, we can probably remove a lot of these.
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13	I guess this is another bug to solve. Can you file that somewhere? We shouldn't be trying to work around it in the selector.
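The getConstantVRegVal discussion above depends on whether the ballot operand is defined directly by a G_CONSTANT or by a COPY of one. The thread does not show the MIR. This toy model therefore assumes that the constant reaches the intrinsic through a COPY, as the first revision's copy look-through suggests. The names constantValue and constantValueLookThrough are hypothetical and only loosely mirror getConstantVRegVal and getConstantVRegValWithLookThrough.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Toy def table: each virtual register is defined by a constant or a COPY.
struct Def {
  bool isConstant;
  int64_t value;  // meaningful when isConstant
  int copySrc;    // meaningful when !isConstant
};
using DefTable = std::map<int, Def>;

// Plain lookup: succeeds only if the register is defined directly by a constant.
std::optional<int64_t> constantValue(const DefTable &defs, int reg) {
  auto it = defs.find(reg);
  if (it == defs.end() || !it->second.isConstant)
    return std::nullopt;
  return it->second.value;
}

// Look-through lookup: follows COPY chains until it reaches a constant.
std::optional<int64_t> constantValueLookThrough(const DefTable &defs, int reg) {
  // Bounded walk so a malformed (cyclic) table cannot loop forever.
  for (int hops = 0; hops < 16; ++hops) {
    auto it = defs.find(reg);
    if (it == defs.end())
      return std::nullopt;
    if (it->second.isConstant)
      return it->second.value;
    reg = it->second.copySrc;
  }
  return std::nullopt;
}
```

In the test, %1 holds i1 true, stored as -1 (the value selectBallot compares against), and %2 is a COPY of %1. Only the look-through variant finds the constant through %2.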

This revision is now accepted and ready to land. Jul 10 2020, 10:34 AM

Closed by commit rGce23e54162ed: [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot (authored by mbrkusanin). Jul 13 2020, 3:17 AM

This revision was automatically updated to reflect the committed changes.

mbrkusanin marked an inline comment as done.

Revision Contents

Path                                                              Size
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h                1 line
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp              35 lines
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp                 8 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll     77 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll     73 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll                64 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll                68 lines

Diff 276992

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h

@@ ... @@ private:
   bool selectG_BUILD_VECTOR_TRUNC(MachineInstr &I) const;
   bool selectG_PTR_ADD(MachineInstr &I) const;
   bool selectG_IMPLICIT_DEF(MachineInstr &I) const;
   bool selectG_INSERT(MachineInstr &I) const;

   bool selectInterpP1F16(MachineInstr &MI) const;
   bool selectDivScale(MachineInstr &MI) const;
   bool selectIntrinsicIcmp(MachineInstr &MI) const;
+  bool selectBallot(MachineInstr &I) const;
   bool selectG_INTRINSIC(MachineInstr &I) const;

   bool selectEndCfIntrinsic(MachineInstr &MI) const;
   bool selectDSOrderedIntrinsic(MachineInstr &MI, Intrinsic::ID IID) const;
   bool selectDSGWSIntrinsic(MachineInstr &MI, Intrinsic::ID IID) const;
   bool selectDSAppendConsume(MachineInstr &MI, bool IsAppend) const;

   bool selectImageIntrinsic(MachineInstr &MI,

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

@@ ... @@ bool AMDGPUInstructionSelector::selectG_INTRINSIC(MachineInstr &I) const {
   case Intrinsic::amdgcn_softwqm:
     return constrainCopyLikeIntrin(I, AMDGPU::SOFT_WQM);
   case Intrinsic::amdgcn_wwm:
     return constrainCopyLikeIntrin(I, AMDGPU::WWM);
   case Intrinsic::amdgcn_div_scale:
     return selectDivScale(I);
   case Intrinsic::amdgcn_icmp:
     return selectIntrinsicIcmp(I);
+  case Intrinsic::amdgcn_ballot:
+    return selectBallot(I);
   default:
     return selectImpl(I, *CoverageInfo);
   }
 }

 static int getV_CMPOpcode(CmpInst::Predicate P, unsigned Size) {
   if (Size != 32 && Size != 64)
     return -1;
@@ ... @@ MachineInstr *ICmp = BuildMI(*BB, &I, DL, TII.get(Opcode), Dst)
                            .add(I.getOperand(3));
   RBI.constrainGenericRegister(ICmp->getOperand(0).getReg(), *TRI.getBoolRC(),
                                *MRI);
   bool Ret = constrainSelectedInstRegOperands(*ICmp, TII, TRI, RBI);
   I.eraseFromParent();
   return Ret;
 }

+bool AMDGPUInstructionSelector::selectBallot(MachineInstr &I) const {
+  MachineBasicBlock *BB = I.getParent();
+  const DebugLoc &DL = I.getDebugLoc();
+  Register DstReg = I.getOperand(0).getReg();
+  const unsigned Size = MRI->getType(DstReg).getSizeInBits();
+  const bool Is64 = Size == 64;
+
+  if (Size != STI.getWavefrontSize())
+    return false;
+
+  Optional<ValueAndVReg> Arg =
+      getConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI, true);
+
+  if (Arg.hasValue()) {
+    const int64_t Value = Arg.getValue().Value;
+    if (Value == 0) {
+      unsigned Opcode = Is64 ? AMDGPU::S_MOV_B64 : AMDGPU::S_MOV_B32;
+      BuildMI(*BB, &I, DL, TII.get(Opcode), DstReg).addImm(0);
+    } else if (Value == -1) { // all ones
+      Register SrcReg = (Size == 64) ? AMDGPU::EXEC : AMDGPU::EXEC_LO;
+      const unsigned SubReg = Is64 ? AMDGPU::sub0_sub1 : AMDGPU::sub0;
+      BuildMI(*BB, &I, DL, TII.get(AMDGPU::COPY), DstReg).addReg(SrcReg, 0, SubReg);
+    } else
+      return false;
+  } else {
+    Register SrcReg = I.getOperand(2).getReg();
+    BuildMI(*BB, &I, DL, TII.get(AMDGPU::COPY), DstReg).addReg(SrcReg);
+  }
+
+  I.eraseFromParent();
+  return true;
+}
+
 bool AMDGPUInstructionSelector::selectEndCfIntrinsic(MachineInstr &MI) const {
   // FIXME: Manually selecting to avoid dealiing with the SReg_1 trick
   // SelectionDAG uses for wave32 vs wave64.
   MachineBasicBlock *BB = MI.getParent();
   BuildMI(*BB, &MI, MI.getDebugLoc(), TII.get(AMDGPU::SI_END_CF))
       .add(MI.getOperand(1));

   Register Reg = MI.getOperand(1).getReg();
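The branches of selectBallot can be restated without any LLVM types. The sketch below is a hypothetical summary and not part of the patch. For each case it returns a short description of the instruction the diff above would build. It glosses over details such as the subregister index used when copying EXEC.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Mirrors the branches of selectBallot in plain C++ (illustrative only).
// resultBits: width of the ballot result; waveSize: 32 or 64.
// constArg: the argument's value if it is a known constant (i1 true reads
// back as -1), otherwise std::nullopt.
std::string describeBallotSelection(unsigned resultBits, unsigned waveSize,
                                    std::optional<int64_t> constArg) {
  if (resultBits != waveSize)
    return "fail";  // the result width must match the wavefront size
  const bool is64 = resultBits == 64;
  if (!constArg)
    return "COPY src";  // non-constant input: already a lane mask, just copy it
  if (*constArg == 0)
    return is64 ? "S_MOV_B64 0" : "S_MOV_B32 0";
  if (*constArg == -1)
    return is64 ? "COPY exec" : "COPY exec_lo";  // all ones: the active lanes
  return "fail";
}
```

The wave-size check comes first. This is why the i32 form is only exercised with +wavefrontsize32 and the i64 form with the default wave64 target in the tests.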

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

@@ ... @@ case Intrinsic::amdgcn_writelane: {
       assert(OpdMapper.getVRegs(2).empty());
       assert(OpdMapper.getVRegs(3).empty());

       substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
       constrainOpWithReadfirstlane(MI, MRI, 2); // Source value
       constrainOpWithReadfirstlane(MI, MRI, 3); // Index
       return;
     }
+    case Intrinsic::amdgcn_ballot:
     case Intrinsic::amdgcn_interp_p1:
     case Intrinsic::amdgcn_interp_p2:
     case Intrinsic::amdgcn_interp_mov:
     case Intrinsic::amdgcn_interp_p1_f16:
     case Intrinsic::amdgcn_interp_p2_f16: {
       applyDefaultMapping(OpdMapper);

       // Readlane for m0 value, which is always the last operand.
@@ ... @@ case Intrinsic::amdgcn_interp_p2_f16: {
       for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
         OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);

       // Must be SGPR, but we must take whatever the original bank is and fix it
       // later.
       OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
       break;
     }
+    case Intrinsic::amdgcn_ballot: {
+      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
+      unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
+      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
+      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
+      break;
+    }
     }
     break;
   }
   case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
   case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE: {
     auto IntrID = MI.getIntrinsicID();
     const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
     assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
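The new mapping puts operand 0 (the result) in the SGPR bank and operand 2 (the i1 input) in the VCC bank, which matches arsenm's note that ballot returns an SGPR value. Here is a toy illustration of why, assuming all lanes are active. The input is one boolean per lane, and those booleans pack into a wave-sized lane mask. The result is that same packed mask in every lane, so it is wave-uniform and fits a scalar register. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// VCC-bank view (toy): one i1 per lane, packed as one bit per lane.
uint64_t packLaneBits(const std::vector<bool> &perLane) {
  uint64_t mask = 0;
  for (std::size_t lane = 0; lane < perLane.size(); ++lane)
    if (perLane[lane])
      mask |= uint64_t(1) << lane;
  return mask;
}

// What each lane observes as the ballot result (all lanes assumed active):
// the identical packed mask, i.e. a uniform value suited to an SGPR.
std::vector<uint64_t> perLaneResult(const std::vector<bool> &perLane) {
  return std::vector<uint64_t>(perLane.size(), packLaneBits(perLane));
}
```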

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -global-isel < %s | FileCheck %s

				declare i32 @llvm.amdgcn.ballot.i32(i1)

				; Test ballot(0)

				define amdgpu_cs i32 @constant_false() {
				; CHECK-LABEL: constant_false:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: ; return to shader part epilog
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 0)
				ret i32 %ballot
				}

				; Test ballot(1)

				define amdgpu_cs i32 @constant_true() {
				; CHECK-LABEL: constant_true:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_mov_b32 s0, exec_lo
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: ; return to shader part epilog
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 1)
				ret i32 %ballot
				}

				; Test ballot of a non-comparison operation

				define amdgpu_cs i32 @non_compare(i32 %x) {
				; CHECK-LABEL: non_compare:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: v_cmp_ne_u32_e64 s0, 0, v0
				; CHECK-NEXT: ; return to shader part epilog
				%trunc = trunc i32 %x to i1
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %trunc)
				ret i32 %ballot
				}

				; Test ballot of comparisons

				define amdgpu_cs i32 @compare_ints(i32 %x, i32 %y) {
				; CHECK-LABEL: compare_ints:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_cmp_eq_u32_e64 s0, v0, v1
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = icmp eq i32 %x, %y
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}

				define amdgpu_cs i32 @compare_int_with_constant(i32 %x) {
				; CHECK-LABEL: compare_int_with_constant:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_cmp_le_i32_e64 s0, 0x63, v0
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = icmp sge i32 %x, 99
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}

				define amdgpu_cs i32 @compare_floats(float %x, float %y) {
				; CHECK-LABEL: compare_floats:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_cmp_gt_f32_e64 s0, v0, v1
				; CHECK-NEXT: ; implicit-def: $vcc_hi
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = fcmp ogt float %x, %y
				%ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
				ret i32 %ballot
				}

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -global-isel < %s | FileCheck %s

				declare i64 @llvm.amdgcn.ballot.i64(i1)

				; Test ballot(0)

				define amdgpu_cs i64 @constant_false() {
				; CHECK-LABEL: constant_false:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: s_mov_b32 s1, 0
				; CHECK-NEXT: ; return to shader part epilog
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 0)
				ret i64 %ballot
				}

				; Test ballot(1)

				define amdgpu_cs i64 @constant_true() {
				; CHECK-LABEL: constant_true:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_mov_b64 s[0:1], exec
				; CHECK-NEXT: ; return to shader part epilog
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 1)
				ret i64 %ballot
				}

				; Test ballot of a non-comparison operation

				define amdgpu_cs i64 @non_compare(i32 %x) {
				; CHECK-LABEL: non_compare:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
				; CHECK-NEXT: v_cmp_ne_u32_e64 s[0:1], 0, v0
				; CHECK-NEXT: ; return to shader part epilog
				%trunc = trunc i32 %x to i1
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
				ret i64 %ballot
				}

				; Test ballot of comparisons

				define amdgpu_cs i64 @compare_ints(i32 %x, i32 %y) {
				; CHECK-LABEL: compare_ints:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_cmp_eq_u32_e64 s[0:1], v0, v1
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = icmp eq i32 %x, %y
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}

				define amdgpu_cs i64 @compare_int_with_constant(i32 %x) {
				; CHECK-LABEL: compare_int_with_constant:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_mov_b32_e32 v1, 0x63
				; CHECK-NEXT: v_cmp_ge_i32_e64 s[0:1], v0, v1
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = icmp sge i32 %x, 99
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}

				define amdgpu_cs i64 @compare_floats(float %x, float %y) {
				; CHECK-LABEL: compare_floats:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: v_cmp_gt_f32_e64 s[0:1], v0, v1
				; CHECK-NEXT: ; return to shader part epilog
				%cmp = fcmp ogt float %x, %y
				%ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
				ret i64 %ballot
				}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i32.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 < %s | FileCheck %s

 declare i32 @llvm.amdgcn.ballot.i32(i1)

 ; Test ballot(0)

-define i32 @test0() {
-; CHECK-LABEL: test0:
+define amdgpu_cs i32 @constant_false() {
+; CHECK-LABEL: constant_false:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: v_mov_b32_e32 v0, 0
+; CHECK-NEXT: s_mov_b32 s0, 0
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: ; return to shader part epilog
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 0)
 ret i32 %ballot
 }

 ; Test ballot(1)

-define i32 @test1() {
-; CHECK-LABEL: test1:
+define amdgpu_cs i32 @constant_true() {
+; CHECK-LABEL: constant_true:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: v_mov_b32_e32 v0, exec_lo
+; CHECK-NEXT: s_mov_b32 s0, exec_lo
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: ; return to shader part epilog
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 1)
 ret i32 %ballot
 }

 ; Test ballot of a non-comparison operation

-define i32 @test2(i32 %x) {
-; CHECK-LABEL: test2:
+define amdgpu_cs i32 @non_compare(i32 %x) {
+; CHECK-LABEL: non_compare:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
 ; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: v_cmp_ne_u32_e64 s4, 0, v0
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: v_cmp_ne_u32_e64 s0, 0, v0
+; CHECK-NEXT: ; return to shader part epilog
 %trunc = trunc i32 %x to i1
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %trunc)
 ret i32 %ballot
 }

 ; Test ballot of comparisons

-define i32 @test3(i32 %x, i32 %y) {
-; CHECK-LABEL: test3:
+define amdgpu_cs i32 @compare_ints(i32 %x, i32 %y) {
+; CHECK-LABEL: compare_ints:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: v_cmp_eq_u32_e64 s4, v0, v1
+; CHECK-NEXT: v_cmp_eq_u32_e64 s0, v0, v1
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = icmp eq i32 %x, %y
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
 ret i32 %ballot
 }

-define i32 @test4(i32 %x) {
-; CHECK-LABEL: test4:
+define amdgpu_cs i32 @compare_int_with_constant(i32 %x) {
+; CHECK-LABEL: compare_int_with_constant:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: v_cmp_lt_i32_e64 s4, 0x62, v0
+; CHECK-NEXT: v_cmp_lt_i32_e64 s0, 0x62, v0
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = icmp sge i32 %x, 99
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
 ret i32 %ballot
 }

-define i32 @test5(float %x, float %y) {
-; CHECK-LABEL: test5:
+define amdgpu_cs i32 @compare_floats(float %x, float %y) {
+; CHECK-LABEL: compare_floats:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_waitcnt_vscnt null, 0x0
-; CHECK-NEXT: v_cmp_gt_f32_e64 s4, v0, v1
+; CHECK-NEXT: v_cmp_gt_f32_e64 s0, v0, v1
 ; CHECK-NEXT: ; implicit-def: $vcc_hi
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = fcmp ogt float %x, %y
 %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 %cmp)
 ret i32 %ballot
 }
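The compare_int_with_constant checks encode icmp sge i32 %x, 99 in different but equivalent ways. GlobalISel emits v_cmp_le_i32_e64 s0, 0x63, v0 (99 <= x) for wave32, and v_cmp_ge_i32_e64 s[0:1], v0, v1 with v1 = 0x63 for wave64. SelectionDAG emits v_cmp_lt_i32_e64 with 0x62 (98 < x). Below is a brute-force sanity check of that equivalence over a range of inputs plus the int32 extremes; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The forms of `icmp sge i32 %x, 99` that appear in the CHECK lines.
bool sgeIR(int32_t x) { return x >= 99; }    // icmp sge / v_cmp_ge_i32 v0, 0x63
bool cmpLe99(int32_t x) { return 99 <= x; }  // v_cmp_le_i32 0x63, v0
bool cmpLt98(int32_t x) { return 98 < x; }   // v_cmp_lt_i32 0x62, v0

// True if every encoding gives the same answer for x.
bool encodingsAgree(int32_t x) {
  return sgeIR(x) == cmpLe99(x) && sgeIR(x) == cmpLt98(x);
}
```

The 98 < x form relies on 99 - 1 not overflowing. Rewriting x >= C as x > C - 1 in the same way would be invalid for C = INT32_MIN.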

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ballot.i64.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck %s

 declare i64 @llvm.amdgcn.ballot.i64(i1)

 ; Test ballot(0)

-define i64 @test0() {
-; CHECK-LABEL: test0:
+define amdgpu_cs i64 @constant_false() {
+; CHECK-LABEL: constant_false:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_mov_b32_e32 v0, 0
-; CHECK-NEXT: v_mov_b32_e32 v1, 0
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: s_mov_b32 s1, 0
+; CHECK-NEXT: ; return to shader part epilog
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 0)
 ret i64 %ballot
 }

 ; Test ballot(1)

-define i64 @test1() {
-; CHECK-LABEL: test1:
+define amdgpu_cs i64 @constant_true() {
+; CHECK-LABEL: constant_true:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_mov_b32_e32 v0, exec_lo
-; CHECK-NEXT: v_mov_b32_e32 v1, exec_hi
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: s_mov_b32 s0, exec_lo
+; CHECK-NEXT: s_mov_b32 s1, exec_hi
+; CHECK-NEXT: ; return to shader part epilog
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 1)
 ret i64 %ballot
 }

 ; Test ballot of a non-comparison operation

-define i64 @test2(i32 %x) {
-; CHECK-LABEL: test2:
+define amdgpu_cs i64 @non_compare(i32 %x) {
+; CHECK-LABEL: non_compare:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
-; CHECK-NEXT: v_cmp_ne_u32_e64 s[4:5], 0, v0
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: v_mov_b32_e32 v1, s5
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: v_cmp_ne_u32_e64 s[0:1], 0, v0
+; CHECK-NEXT: ; return to shader part epilog
 %trunc = trunc i32 %x to i1
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %trunc)
 ret i64 %ballot
 }

 ; Test ballot of comparisons

-define i64 @test3(i32 %x, i32 %y) {
-; CHECK-LABEL: test3:
+define amdgpu_cs i64 @compare_ints(i32 %x, i32 %y) {
+; CHECK-LABEL: compare_ints:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], v0, v1
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: v_mov_b32_e32 v1, s5
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: v_cmp_eq_u32_e64 s[0:1], v0, v1
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = icmp eq i32 %x, %y
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
 ret i64 %ballot
 }

-define i64 @test4(i32 %x) {
-; CHECK-LABEL: test4:
+define amdgpu_cs i64 @compare_int_with_constant(i32 %x) {
+; CHECK-LABEL: compare_int_with_constant:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: s_movk_i32 s4, 0x62
-; CHECK-NEXT: v_cmp_lt_i32_e64 s[4:5], s4, v0
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: v_mov_b32_e32 v1, s5
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: s_movk_i32 s0, 0x62
+; CHECK-NEXT: v_cmp_lt_i32_e64 s[0:1], s0, v0
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = icmp sge i32 %x, 99
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
 ret i64 %ballot
 }

-define i64 @test5(float %x, float %y) {
-; CHECK-LABEL: test5:
+define amdgpu_cs i64 @compare_floats(float %x, float %y) {
+; CHECK-LABEL: compare_floats:
 ; CHECK: ; %bb.0:
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_cmp_gt_f32_e64 s[4:5], v0, v1
-; CHECK-NEXT: v_mov_b32_e32 v0, s4
-; CHECK-NEXT: v_mov_b32_e32 v1, s5
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; CHECK-NEXT: v_cmp_gt_f32_e64 s[0:1], v0, v1
+; CHECK-NEXT: ; return to shader part epilog
 %cmp = fcmp ogt float %x, %y
 %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cmp)
 ret i64 %ballot
 }
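In the wave64 constant_true checks, SelectionDAG writes the result as two 32-bit moves (s_mov_b32 s0, exec_lo and s_mov_b32 s1, exec_hi). The GlobalISel test instead expects a single s_mov_b64 s[0:1], exec. Both produce the same 64-bit value: exec_lo and exec_hi are the low and high halves of exec, and s0 is the low half of the s[0:1] pair. A trivial sketch of that split and rejoin, using hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit halves of a 64-bit mask (sub0 = low, sub1 = high).
struct Halves {
  uint32_t lo;
  uint32_t hi;
};

// exec_lo / exec_hi view of a 64-bit mask.
Halves split(uint64_t mask) {
  return {static_cast<uint32_t>(mask), static_cast<uint32_t>(mask >> 32)};
}

// Reassemble an s[0:1]-style pair: s0 holds the low half, s1 the high half.
uint64_t join(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}
```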