Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | speculatable + convergent is a strange combination. I think this is correct but I think we need to clarify what this means exactly. We have some problems from hoisting some convergent operations, so we should probably fix the langref to clarify how these interact. |
lib/Target/AMDGPU/SIInstructions.td | ||
1209 ↗ | (On Diff #117673) | You should be able to put this directly in the instruction definition pattern |
test/CodeGen/AMDGPU/wqm.vote.ll | ||
1 ↗ | (On Diff #117673) | Separate instcombine test and -enable-var-scope |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | I don't know if IntrConvergent is correct. The instruction operates on an SGPR pair (and therefore ignores control flow I think). Should I remove IntrConvergent from there? |
test/CodeGen/AMDGPU/wqm.vote.ll | ||
1 ↗ | (On Diff #117673) | Why -enable-var-scope? |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | I don't think I know enough about wqm to be sure |
test/CodeGen/AMDGPU/wqm.vote.ll | ||
1 ↗ | (On Diff #117673) | Because it clears the filecheck variables at a check-label so they don't accidentally match in another function. It should be the default eventually, but right now there are a lot of broken tests. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | For each group of 4 threads, if any of the threads passes true to wqm.vote, the function returns true for all 4 threads. For example, in hypothetical wave8: 00000000 -> 00000000 |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | Does it depend on exec? What do inactive lanes report? |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | Inactive lanes should report 1 if there are active lanes with 1. |
Apart from Matt's comments, this looks good to me.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #117673) | I believe convergent is needed here. This is similar to the ballot instructions. Consider: %result = call i1 @llvm.amdgcn.wqm.vote(%vote) if (cond) { // use %result } Transforming this to: if (cond) { %result = call i1 @llvm.amdgcn.wqm.vote(%vote) // use %result } is incorrect: if only one thread of a quad has cond == true, but only one of its neighbors has vote == true, then the value of result will be changed by the transformation. |
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll | ||
---|---|---|
1563 ↗ | (On Diff #118935) | Probably should check undef too |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #118935) | I don't think that this should be Speculatable. Consider something like: bool cond = gl_LocalInvocationID & 1; // or some other non-uniform condition if (cond) { value = @llvm.amdgcn.wqm.vote(!cond); } value will always be false, because the active threads are the ones with the false condition, but it would return true if we hoisted it out of the if-statement. Speculatable allows such a transformation to happen. (Btw, for the vote intrinsics, we currently have a nasty workaround in Mesa involving inline asm for exactly this reason, but from my reading of the langref and experiments with actual optimizations, this shouldn't be necessary as long as we don't set speculatable.) Also, even though it's implemented with the WQM instruction, it might not be best to call it amdgcn.wqm.vote. There's already a completely different concept of WQM used by e.g. llvm.amdgcn.wqm, and it's best not to confuse those two things. For example, someone might think that this intrinsic implicitly enables WQM, when it really doesn't. Maybe llvm.amdgcn.quad.vote would be a better name? |
My understanding is that the existing llvm.amdgcn.wqm() intrinsic annotates "expression trees" for the WQM pass and might not insert any instructions at the call site. This new wqm.vote intrinsic does translate to S_WQM_B{wavesize} directly.
I agree the naming of the intrinsics is a bit unfortunate, but it is what it is, and it's not the end of the world. Please remove speculatable. Other than that, LGTM.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
753 ↗ | (On Diff #118935) | Hmm, agreed that speculatable should probably be removed at least for now. Unfortunately, the nasty Mesa hack is necessary because LLVM IR currently does not have a coconvergent or anticonvergent attribute. The problem is examples like: if (cond) { mask = ballot(other_cond); } else { mask = ballot(other_cond); } // now use readlane intrinsics Changing that to mask = ballot(other_cond) // now use readlane intrinsics is perfectly legal as far as LLVM IR is concerned. Speculatable doesn't play into it, because both branches literally have the same sequence of instructions. |