Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
757 ↗ | (On Diff #117674) | This should be convergent (and no mem?) |
lib/Target/AMDGPU/AMDGPUInstructions.td | ||
170–175 ↗ | (On Diff #117674) | I'm not sure what this means by NONANS. I think this is just doing the same thing as the existing COND_O* patleafs by accepting the ordered and unspecified compares as ordered. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
3002 ↗ | (On Diff #117674) | I don't think we should have a family of different kill opcodes just for the different compare types. Can we just have SI_KILL with the condition register input? We could then have an optimization pass look for the |
lib/Target/AMDGPU/SIInsertSkips.cpp | ||
204–206 ↗ | (On Diff #117674) | If the condition is an SGPR this will violate the operand constraints, so this should be creating the _e64 version. The problem with that is this pass runs after SIShrinkInstructions, so this won't be optimized down in the common case which is part of why this expansion should probably be done earlier. |
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
3543–3550 ↗ | (On Diff #117674) | Test missing for this part |
test/CodeGen/AMDGPU/kill.ll | ||
1 ↗ | (On Diff #117674) | This should just run llc. The instcombine test should be separate in test/Transforms/InstCombine/AMDGPU Also should use -enable-var-scope for FileCheck |
23 ↗ | (On Diff #117674) | Delete unused arguments (opt -deadarghaX0r should be able to do this for you) |
26 ↗ | (On Diff #117674) | It would be better to put the new intrinsic tests in a new llvm.amdgcn.kill file, and leave the legacy versions in a separate test file |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
757 ↗ | (On Diff #117674) | amdgcn.kill shouldn't be moved across ds.bpermute, ds.swizzle, and image_sample opcodes. Does IntrNoMem or IntrConvergent assure that? |
lib/Target/AMDGPU/AMDGPUInstructions.td | ||
170–175 ↗ | (On Diff #117674) | It means I'm lazy to handle OGE and !OGE separately. I'd still like to handle !OGE in a simple way and not care about NaNs. Alternatively, I can use add 2 patterns, one for COND_OGE and one for COND_UGE, both mapping to SI_KILL_F32_GE_0). |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
3002 ↗ | (On Diff #117674) | I tried to do that but it's too much work. Eventually we'd like all flavors of V_CMPX, but it's not a high priority right now. |
lib/Target/AMDGPU/SIInsertSkips.cpp | ||
204–206 ↗ | (On Diff #117674) | This is not new code. It's the previous code moved by a few lines. |
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
3543–3550 ↗ | (On Diff #117674) | It's the "kill_true" test. |
test/CodeGen/AMDGPU/kill.ll | ||
1 ↗ | (On Diff #117674) | What's the deal with not running -instcombine as part of AMDGCN tests? It seems like it would be convenient everywhere. |
A bunch of inline comments, but also some higher level things that don't really fit anywhere. This is a useful feature, but I don't think we've ever gotten the design of kill just right, because kill is really an implicit control flow intrinsic.
So, for example, if you have
%v = llvm.amdgcn.icmp(...) ; ballot-type instruction kill(...) use %v
then LLVM is free to move the ballot-type instruction to after the kill according to the LLVM IR semantics, even though that is incorrect.
This isn't a problem in practice yet, because the instructions most likely to be affected by this are image sample intrinsics. Those are IntrReadMem, and kill itself has arbitrary side effects today, so sample intrinsics cannot be moved past a kill. Still, it might lead to problems in the future with shaders that use ballot and DPP / reduction intrinsics. So I've been wondering if we couldn't perhaps use kill like this:
br i1 %cond, label %kill_bb, label %cont kill_bb: call noret void @llvm.amdgcn.kill() ret undef cont: ...
or perhaps better:
%kill = call i1 @llvm.amdgcn.ps.kill(%cond) br i1 %kill, label %kill_bb, label %cont kill_bb: call noret void @llvm.amdgcn.kill() ret undef cont: ...
In this second variant, the ps.kill intrinsic would update the live mask and return true for all threads that can exit, i.e. ps.kill would internally do the WQM vote.
The advantage is that the control flow aspect of kill is properly modeled at LLVM IR level and so we can't run into issues with convergent intrinsics moving past it. I'd feel much more comfortable with an approach like that.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
757 ↗ | (On Diff #117674) | If we do stick with the simple intrinsic-based approach, I think we should keep the attributes as they are right now, i.e. keep them identical to AMDGPU.kill. That will give us fewer surprises... |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
3002 ↗ | (On Diff #117674) | I think Matt is right. It would be cleaner and give us the benefit of optimizing more cases. |
test/CodeGen/AMDGPU/kill.ll | ||
1 ↗ | (On Diff #117674) | Fewer moving parts, I think. If instcombine is run on the test as well, there's a higher chance of tests being "broken" (in a false positive way) by random unrelated changes. |
Nicolai, that's a good point, though let's just merge this intrinsic replacement for now.
You're right, on second thought the existing intrinsic has the same problem. So this is still a strict improvement.