
Details

Reviewers

arsenm
nhaehnle

Commits

rG2114fc3bcba7: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
rL316426: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic

Diff Detail

Repository: rL LLVM

Event Timeline

mareko created this revision. Oct 4 2017, 8:04 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Oct 4 2017, 8:04 AM

arsenm added inline comments. Oct 4 2017, 10:40 AM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	speculatable + convergent is a strange combination. I think this is correct, but we need to clarify exactly what it means. We have had problems with hoisting some convergent operations, so we should probably fix the LangRef to clarify how these interact.
lib/Target/AMDGPU/SIInstructions.td
1209 ↗	(On Diff #117673)	You should be able to put this directly in the instruction definition pattern
test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗	(On Diff #117673)	Separate instcombine test and -enable-var-scope

mareko added inline comments. Oct 4 2017, 11:46 AM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	I don't know if IntrConvergent is correct. The instruction operates on an SGPR pair (and therefore ignores control flow, I think). Should I remove IntrConvergent from there?
test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗	(On Diff #117673)	Why -enable-var-scope?

arsenm added inline comments. Oct 4 2017, 1:42 PM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	I don't think I know enough about WQM to be sure.
test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗	(On Diff #117673)	Because it clears the FileCheck variables at each CHECK-LABEL, so they don't accidentally match in another function. It should be the default eventually, but right now there are a lot of broken tests.

mareko added inline comments. Oct 4 2017, 3:42 PM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	For each group of 4 threads, if any of the threads passes true to wqm.vote, the function returns true for all 4 threads. For example, in hypothetical wave8:

    00000000 -> 00000000
    01000000 -> 11110000
    00000011 -> 00001111
    00101000 -> 11111111
    10110111 -> 11111111
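The grouping rule in the comment above can be modelled directly on a lane mask. The following is a minimal C++ sketch, not a description of the hardware: it assumes wave64, that lane i maps to bit i of the mask (so the leftmost character of each wave8 string is lane 0), and that lanes 4k..4k+3 form quad k. The names `quadVote` and `fromLanes` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-quad OR that S_WQM_B64 applies to a lane mask.
// Bit i of `mask` is lane i; lanes 4k..4k+3 form quad k (16 quads in wave64).
uint64_t quadVote(uint64_t mask) {
  uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = uint64_t{0xF} << (4 * quad);
    // If any lane of the quad is set, set all four lanes of that quad.
    if (mask & quadBits)
      result |= quadBits;
  }
  return result;
}

// Hypothetical helper: parse a lane string such as "01000000", where the
// leftmost character is lane 0. Characters past lane 63 are ignored.
uint64_t fromLanes(const char *lanes) {
  uint64_t mask = 0;
  for (unsigned i = 0; i < 64 && lanes[i] != '\0'; ++i)
    if (lanes[i] == '1')
      mask |= uint64_t{1} << i;
  return mask;
}
```

With these helpers, each wave8 example above becomes a check of the form `quadVote(fromLanes("01000000")) == fromLanes("11110000")`.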

arsenm added inline comments. Oct 4 2017, 5:43 PM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	Does it depend on exec? What do inactive lanes report?

mareko added inline comments. Oct 5 2017, 10:47 AM

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	Inactive lanes should report 1 if there are active lanes with 1.

Apart from Matt's comments, this looks good to me.

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #117673)	I believe convergent is needed here. This is similar to the ballot instructions. Consider:

    %result = call i1 @llvm.amdgcn.wqm.vote(i1 %vote)
    if (cond) {
      // use %result
    }

Transforming this to:

    if (cond) {
      %result = call i1 @llvm.amdgcn.wqm.vote(i1 %vote)
      // use %result
    }

is incorrect: if only one thread of a quad has `cond == true`, but only one of its neighbors has `vote == true`, then the transformation changes the value of `%result`.
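To make the sinking hazard concrete, the C++ sketch below uses a per-quad OR model and compares the value lane 0 observes in each placement. It assumes, as a simplification, that lanes disabled in EXEC contribute 0 to the mask fed to S_WQM_B64 (as when the i1 comes from a VOPC compare); all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-quad OR over a 64-lane mask (lanes 4k..4k+3 form quad k).
uint64_t quadVote(uint64_t mask) {
  uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = uint64_t{0xF} << (4 * quad);
    if (mask & quadBits)
      result |= quadBits;
  }
  return result;
}

// Original placement: the vote runs before the branch with EXEC = `exec`.
// Lanes outside EXEC are assumed to contribute 0 to the input mask.
uint64_t voteBeforeBranch(uint64_t vote, uint64_t exec) {
  return quadVote(vote & exec);
}

// Sunk placement: only lanes with cond == true are active at the call,
// so neighbors that skipped the branch no longer contribute their vote.
uint64_t voteInsideBranch(uint64_t vote, uint64_t exec, uint64_t cond) {
  return quadVote(vote & exec & cond);
}
```

With one quad active (`exec = 0xF`), lane 0 taking the branch (`cond = 0x1`) and lane 1 voting true (`vote = 0x2`), lane 0 sees true before sinking and false after it.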

Address feedback.

Harbormaster completed remote builds in B11143: Diff 118935. Oct 13 2017, 10:01 AM

arsenm added inline comments. Oct 13 2017, 6:27 PM

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
1563 ↗	(On Diff #118935)	Probably should check undef too

also test undef.

fix the undef test.

cwabbott added a subscriber: cwabbott. Oct 16 2017, 12:39 PM

cwabbott added inline comments.

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #118935)	I don't think that this should be Speculatable. Consider something like:

    bool cond = gl_LocalInvocationID & 1; // or some other non-uniform condition
    if (cond) {
      value = @llvm.amdgcn.wqm.vote(!cond);
    }

`value` will always be false, because every active thread has `cond == true` and therefore passes false, but it would return true if we hoisted it out of the if-statement, since the threads with `cond == false` would then pass true. Speculatable allows such a transformation to happen. (Btw, for the vote intrinsics, we currently have a nasty workaround in Mesa involving inline asm for exactly this reason, but from my reading of the LangRef and experiments with actual optimizations, this shouldn't be necessary as long as we don't set speculatable.)

Also, even though it's implemented with the WQM instruction, it might not be best to call it amdgcn.wqm.vote. There's already a completely different concept of WQM used by e.g. llvm.amdgcn.wqm, and it's best not to confuse those two things. For example, someone might think that this intrinsic implicitly enables WQM, when it really doesn't. Maybe llvm.amdgcn.quad.vote would be a better name?
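The hoisting case can be checked the same way. This C++ sketch assumes wave64, `cond` true in the odd lanes (mirroring `gl_LocalInvocationID & 1`), and the simplification that inactive lanes contribute 0 to the input mask. The function names are hypothetical and the model is an illustration, not the compiler's semantics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-quad OR over a 64-lane mask (lanes 4k..4k+3 form quad k).
uint64_t quadVote(uint64_t mask) {
  uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = uint64_t{0xF} << (4 * quad);
    if (mask & quadBits)
      result |= quadBits;
  }
  return result;
}

// Call inside `if (cond)`: only lanes with cond == true are active,
// and each of them passes !cond == false.
uint64_t voteInsideIf(uint64_t exec, uint64_t cond) {
  uint64_t active = exec & cond;
  uint64_t arg = ~cond & active;  // always 0: no active lane votes true
  return quadVote(arg);
}

// Hoisted call: every lane is active, so lanes with cond == false vote true.
uint64_t voteHoisted(uint64_t exec, uint64_t cond) {
  uint64_t arg = ~cond & exec;
  return quadVote(arg);
}
```

With `cond` set in every odd lane, each quad contains an even lane that votes true once the call is hoisted, so the odd lanes observe true instead of false.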

Should the name just be WQM since it seems to map directly to the WQM instruction?

My understanding is that the existing llvm.amdgcn.wqm() intrinsic annotates "expression trees" for the WQM pass and might not insert any instructions at the call site. This new wqm.vote intrinsic does translate to S_WQM_B{wavesize} directly.

I agree the naming of the intrinsics is a bit unfortunate, but it is what it is, and it's not the end of the world. Please remove speculatable. Other than that, LGTM.

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗	(On Diff #118935)	Hmm, agreed that speculatable should probably be removed, at least for now. Unfortunately, the nasty Mesa hack is necessary because LLVM IR currently does not have a `coconvergent` or `anticonvergent` attribute. The problem is examples like:

    if (cond) {
      mask = ballot(other_cond);
    } else {
      mask = ballot(other_cond);
    }
    // now use readlane intrinsics

Changing that to:

    mask = ballot(other_cond);
    // now use readlane intrinsics

is perfectly legal as far as LLVM IR is concerned. Speculatable doesn't play into it, because both branches literally have the same sequence of instructions.
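The following C++ sketch illustrates why merging the two identical ballots changes the observed mask even though nothing is speculated. It models `ballot` as the set of active lanes whose predicate is true, an assumption made for illustration; `ballot`, `maskSeenBranched` and `maskSeenMerged` are hypothetical names, not Mesa or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model: ballot returns the active lanes whose predicate is true.
uint64_t ballot(uint64_t exec, uint64_t pred) { return exec & pred; }

// Mask seen by `lane` when the same ballot appears in both branches:
// each branch executes with only its own lanes active.
uint64_t maskSeenBranched(uint64_t exec, uint64_t cond, uint64_t otherCond,
                          unsigned lane) {
  bool takesThen = (cond >> lane) & 1;
  uint64_t branchExec = takesThen ? (exec & cond) : (exec & ~cond);
  return ballot(branchExec, otherCond);
}

// Mask seen by every lane after the two ballots are merged above the branch.
uint64_t maskSeenMerged(uint64_t exec, uint64_t otherCond) {
  return ballot(exec, otherCond);
}
```

For example, with lanes 0-3 active, lanes 0 and 1 taking the then-branch, and `other_cond` true in lanes 0 and 2, lane 0 sees `0x1` in the branched form but `0x5` after merging.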

This revision is now accepted and ready to land. Oct 23 2017, 6:59 AM

Closed by commit rL316426: AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic (authored by mareko). Oct 24 2017, 3:27 AM

This revision was automatically updated to reflect the committed changes.

Diff 120031

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

 // Copies the source value to the destination value, with the guarantee that
 // the source value is computed as if the entire program were executed in WQM.
 def int_amdgcn_wqm : Intrinsic<[llvm_any_ty],
   [LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable]
 >;

+// Return true if at least one thread within the pixel quad passes true into
+// the function.
+def int_amdgcn_wqm_vote : Intrinsic<[llvm_i1_ty],
+  [llvm_i1_ty], [IntrNoMem, IntrConvergent]
+>;
+
 // Copies the active channels of the source value to the destination value,
 // with the guarantee that the source value is computed as if the entire
 // program were executed in Whole Wavefront Mode, i.e. with all channels
 // enabled, with a few exceptions: - Phi nodes with require WWM return an
 // undefined value.
 def int_amdgcn_wwm : Intrinsic<[llvm_any_ty],
   [LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable]
 >;

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

 let Defs = [SCC] in {
 ...
 def S_NOT_B32 : SOP1_32 <"s_not_b32",
   [(set i32:$sdst, (not i32:$src0))]
 >;

 def S_NOT_B64 : SOP1_64 <"s_not_b64",
   [(set i64:$sdst, (not i64:$src0))]
 >;
 def S_WQM_B32 : SOP1_32 <"s_wqm_b32">;
-def S_WQM_B64 : SOP1_64 <"s_wqm_b64">;
+def S_WQM_B64 : SOP1_64 <"s_wqm_b64",
+  [(set i1:$sdst, (int_amdgcn_wqm_vote i1:$src0))]
+>;
 } // End Defs = [SCC]


 def S_BREV_B32 : SOP1_32 <"s_brev_b32",
   [(set i32:$sdst, (bitreverse i32:$src0))]
 >;
 def S_BREV_B64 : SOP1_64 <"s_brev_b64">;

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

 if (match(Src1, m_Zero()) &&
 ...
       ConstantInt::get(CC->getType(), SrcPred) };
     CallInst *NewCall = Builder.CreateCall(NewF, Args);
     NewCall->takeName(II);
     return replaceInstUsesWith(*II, NewCall);
   }

   break;
 }
+case Intrinsic::amdgcn_wqm_vote: {
+  // wqm_vote is identity when the argument is constant.
+  if (!isa<Constant>(II->getArgOperand(0)))
+    break;
+
+  return replaceInstUsesWith(*II, II->getArgOperand(0));
+}
 case Intrinsic::stackrestore: {
   // If the save is right next to the restore, remove the restore. This can
   // happen when variable allocas are DCE'd.
   if (IntrinsicInst *SS = dyn_cast<IntrinsicInst>(II->getArgOperand(0))) {
     if (SS->getIntrinsicID() == Intrinsic::stacksave) {
       if (&*++SS->getIterator() == II)
         return eraseInstFromFunction(CI);
     }
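The InstCombine fold above returns the constant argument unchanged. A constant is uniform, so every active lane passes the same value and every active lane's quad contains at least one lane that voted it. The C++ sketch below checks that property on lane masks under a simplified model (inactive lanes contribute 0 to the input mask). It covers `true` and `false` only, not `undef`, and `quadVote` and `foldMatchesModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-quad OR over a 64-lane mask (lanes 4k..4k+3 form quad k).
uint64_t quadVote(uint64_t mask) {
  uint64_t result = 0;
  for (unsigned quad = 0; quad < 16; ++quad) {
    uint64_t quadBits = uint64_t{0xF} << (4 * quad);
    if (mask & quadBits)
      result |= quadBits;
  }
  return result;
}

// For one EXEC mask, checks that folding wqm.vote(C) to C agrees with the
// lane-mask model on every active lane.
bool foldMatchesModel(uint64_t exec, bool constant) {
  uint64_t input = constant ? exec : 0;        // all active lanes pass `constant`
  uint64_t observed = quadVote(input) & exec;  // only active lanes read the result
  uint64_t folded = constant ? exec : 0;       // value after the fold
  return observed == folded;
}
```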

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.vote.ll

				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=CHECK %s

				;CHECK-LABEL: {{^}}ret:
				;CHECK: v_cmp_eq_u32_e32 [[CMP:[^,]+]], v0, v1
				;CHECK: s_wqm_b64 [[WQM:[^,]+]], [[CMP]]
				;CHECK: v_cndmask_b32_e64 v0, 0, 1.0, [[WQM]]
				define amdgpu_ps float @ret(i32 %v0, i32 %v1) #1 {
				main_body:
				%c = icmp eq i32 %v0, %v1
				%w = call i1 @llvm.amdgcn.wqm.vote(i1 %c)
				%r = select i1 %w, float 1.0, float 0.0
				ret float %r
				}

				;CHECK-LABEL: {{^}}true:
				;CHECK: s_wqm_b64
				define amdgpu_ps float @true() #1 {
				main_body:
				%w = call i1 @llvm.amdgcn.wqm.vote(i1 true)
				%r = select i1 %w, float 1.0, float 0.0
				ret float %r
				}

				;CHECK-LABEL: {{^}}false:
				;CHECK: s_wqm_b64
				define amdgpu_ps float @false() #1 {
				main_body:
				%w = call i1 @llvm.amdgcn.wqm.vote(i1 false)
				%r = select i1 %w, float 1.0, float 0.0
				ret float %r
				}

				;CHECK-LABEL: {{^}}kill:
				;CHECK: v_cmp_eq_u32_e32 [[CMP:[^,]+]], v0, v1
				;CHECK: s_wqm_b64 [[WQM:[^,]+]], [[CMP]]
				;FIXME: This could just be: s_and_b64 exec, exec, [[WQM]]
				;CHECK: v_cndmask_b32_e64 [[KILL:[^,]+]], -1.0, 1.0, [[WQM]]
				;CHECK: v_cmpx_le_f32_e32 {{[^,]+}}, 0, [[KILL]]
				;CHECK: s_endpgm
				define amdgpu_ps void @kill(i32 %v0, i32 %v1) #1 {
				main_body:
				%c = icmp eq i32 %v0, %v1
				%w = call i1 @llvm.amdgcn.wqm.vote(i1 %c)
				%r = select i1 %w, float 1.0, float -1.0
				call void @llvm.AMDGPU.kill(float %r)
				ret void
				}

				declare void @llvm.AMDGPU.kill(float) #1
				declare i1 @llvm.amdgcn.wqm.vote(i1)

				attributes #1 = { nounwind }
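The FIXME in the `kill` test suggests that the `v_cndmask_b32`/`v_cmpx_le_f32` pair could be replaced by `s_and_b64 exec, exec, [[WQM]]`. The C++ sketch below models the emitted pair lane by lane (select 1.0 or -1.0 from the WQM mask, then keep a lane in EXEC only if 0 <= value, with inactive lanes staying disabled) and compares the result with `exec & wqm`. It is an illustrative model under those assumptions, not backend code; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Lane-by-lane model of: v_cndmask (1.0 if the WQM bit is set, else -1.0),
// then v_cmpx_le_f32 0, value, which keeps a lane active only if 0 <= value.
uint64_t execAfterKill(uint64_t exec, uint64_t wqm) {
  uint64_t newExec = 0;
  for (unsigned lane = 0; lane < 64; ++lane) {
    if (!((exec >> lane) & 1))
      continue;  // inactive lanes stay disabled
    float value = ((wqm >> lane) & 1) ? 1.0f : -1.0f;
    if (0.0f <= value)
      newExec |= uint64_t{1} << lane;
  }
  return newExec;
}

// The shorter sequence suggested by the FIXME.
uint64_t execAfterAnd(uint64_t exec, uint64_t wqm) { return exec & wqm; }
```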

llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

 ; CHECK-LABEL: @fcmp_constant_to_rhs_olt(
 ; CHECK: %result = call i64 @llvm.amdgcn.fcmp.f32(float %x, float 4.000000e+00, i32 2)
 define i64 @fcmp_constant_to_rhs_olt(float %x) {
   %result = call i64 @llvm.amdgcn.fcmp.f32(float 4.0, float %x, i32 4)
   ret i64 %result
 }

+; --------------------------------------------------------------------
+; llvm.amdgcn.wqm.vote
+; --------------------------------------------------------------------
+
+declare i1 @llvm.amdgcn.wqm.vote(i1)
+
+; CHECK-LABEL: @wqm_vote_true(
+; CHECK: ret float 1.000000e+00
+define float @wqm_vote_true() {
+main_body:
+  %w = call i1 @llvm.amdgcn.wqm.vote(i1 true)
+  %r = select i1 %w, float 1.0, float 0.0
+  ret float %r
+}
+
+; CHECK-LABEL: @wqm_vote_false(
+; CHECK: ret float 0.000000e+00
+define float @wqm_vote_false() {
+main_body:
+  %w = call i1 @llvm.amdgcn.wqm.vote(i1 false)
+  %r = select i1 %w, float 1.0, float 0.0
+  ret float %r
+}
+
+; CHECK-LABEL: @wqm_vote_undef(
+; CHECK: ret float 0.000000e+00
+define float @wqm_vote_undef() {
+main_body:
+  %w = call i1 @llvm.amdgcn.wqm.vote(i1 undef)
+  %r = select i1 %w, float 1.0, float 0.0
+  ret float %r
+}
+
 ; CHECK: attributes #5 = { convergent }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 120031

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.vote.ll

llvm/trunk/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
