This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
ClosedPublic

Authored by mareko on Oct 4 2017, 8:04 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

mareko created this revision.Oct 4 2017, 8:04 AM
arsenm added inline comments.Oct 4 2017, 10:40 AM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

speculatable + convergent is a strange combination. I think this is correct but I think we need to clarify what this means exactly. We have some problems from hoisting some convergent operations, so we should probably fix the langref to clarify how these interact.

lib/Target/AMDGPU/SIInstructions.td
1209 ↗(On Diff #117673)

You should be able to put this directly in the instruction definition pattern

test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗(On Diff #117673)

Separate instcombine test and -enable-var-scope

mareko added inline comments.Oct 4 2017, 11:46 AM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

I don't know if IntrConvergent is correct. The instruction operates on an SGPR pair (and therefore ignores control flow I think). Should I remove IntrConvergent from there?

test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗(On Diff #117673)

Why -enable-var-scope?

arsenm added inline comments.Oct 4 2017, 1:42 PM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

I don't think I know enough about wqm to be sure

test/CodeGen/AMDGPU/wqm.vote.ll
1 ↗(On Diff #117673)

Because it clears the filecheck variables at a check-label so they don't accidentally match in another function. It should be the default eventually, but right now there are a lot of broken tests.

mareko added inline comments.Oct 4 2017, 3:42 PM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

For each group of 4 threads, if any of the threads passes true to wqm.vote, the function returns true for all 4 threads. For example, in hypothetical wave8:

00000000 -> 00000000
01000000 -> 11110000
00000011 -> 00001111
00101000 -> 11111111
10110111 -> 11111111

arsenm added inline comments.Oct 4 2017, 5:43 PM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

Does it depend on exec? What do inactive lanes report?

mareko added inline comments.Oct 5 2017, 10:47 AM
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

Inactive lanes should report 1 if there are active lanes with 1.

nhaehnle edited edge metadata.Oct 10 2017, 2:28 AM

Apart from Matt's comments, this looks good to me.

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #117673)

I believe convergent is needed here. This is similar to the ballot instructions.

Consider:

%result = call i1 @llvm.amdgcn.wqm.vote(%vote)
if (cond) {
   // use %result
}

Transforming this to:

if (cond) {
  %result = call i1 @llvm.amdgcn.wqm.vote(%vote)
   // use %result
}

is incorrect: if only one thread of a quad has cond == true, but only one of its neighbors has vote == true, then the value of result will be changed by the transformation.

mareko updated this revision to Diff 118935.Oct 13 2017, 10:01 AM

Address feedback.

arsenm added inline comments.Oct 13 2017, 6:27 PM
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
1563 ↗(On Diff #118935)

Probably should check undef too

mareko updated this revision to Diff 119082.Oct 15 2017, 7:47 AM

also test undef.

mareko updated this revision to Diff 119083.Oct 15 2017, 8:09 AM

fix the undef test.

cwabbott added inline comments.
include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #118935)

I don't think that this should be Speculatable. Consider something like:

bool cond = gl_LocalInvocationID & 1; // or some other non-uniform condition
if (cond) {
    value = @llvm.amdgcn.wqm.vote(!cond);
}

value will always be false, because the active threads are the ones with the false condition, but it would return true if we hoisted it out of the if-statement. Speculatable allows such a transformation to happen.

(Btw, for the vote intrinsics, we currently have a nasty workaround in Mesa involving inline asm for exactly this reason, but from my reading of the langref and experiments with actual optimizations, this shouldn't be necessary as long as we don't set speculatable.)

Also, even though it's implemented with the WQM instruction, it might not be best to call it amdgcn.wqm.vote. There's already a completely different concept of WQM used by e.g. llvm.amdgcn.wqm, and it's best not to confuse those two things. For example, someone might think that this intrinsic implicitly enables WQM, when it really doesn't. Maybe llvm.amdgcn.quad.vote would be a better name?

arsenm edited edge metadata.Oct 16 2017, 2:05 PM

Should the name just be WQM since it seems to map directly to the WQM instruction?

My understanding is that the existing llvm.amdgcn.wqm() intrinsic annotates "expression trees" for the WQM pass and might not insert any instructions at the call site. This new wqm.vote intrinsic does translate to S_WQM_B{wavesize} directly.

nhaehnle accepted this revision.Oct 23 2017, 6:59 AM

I agree the naming of the intrinsics is a bit unfortunate, but it is what it is, and it's not the end of the world. Please remove speculatable. Other than that, LGTM.

include/llvm/IR/IntrinsicsAMDGPU.td
753 ↗(On Diff #118935)

Hmm, agreed that speculatable should probably be removed at least for now.

Unfortunately, the nasty Mesa hack is necessary because LLVM IR currently does not have a coconvergent or anticonvergent attribute.

The problem is examples like:

if (cond) {
  mask = ballot(other_cond);
} else {
  mask = ballot(other_cond);
}
// now use readlane intrinsics

Changing that to

mask = ballot(other_cond)
// now use readlane intrinsics

is perfectly legal as far as LLVM IR is concerned. Speculatable doesn't play into it, because both branches literally have the same sequence of instructions.

This revision is now accepted and ready to land.Oct 23 2017, 6:59 AM
This revision was automatically updated to reflect the committed changes.