This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
ClosedPublic

Authored by mareko on Oct 4 2017, 8:05 AM.

Details

Summary

Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Diff Detail

Repository
rL LLVM

Event Timeline

mareko created this revision.Oct 4 2017, 8:05 AM
arsenm added inline comments.Oct 4 2017, 10:24 AM
include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗(On Diff #117674)

This should be convergent (and no mem?)

lib/Target/AMDGPU/AMDGPUInstructions.td
170–175 ↗(On Diff #117674)

I'm not sure what this means by NONANS. I think this is just doing the same thing as the existing COND_O* patleafs by accepting the ordered and unspecified compares as ordered.

lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗(On Diff #117674)

I don't think we should have a family of different kill opcodes just for the different compare types. Can we just have SI_KILL with the condition register input? We could then have an optimization pass look for the
%cond = V_CMP_*
SI_KILL %cond -> V_CMPX_*_term pattern. We would probably have to introduce the new _term variants of the CMPX instructions.

lib/Target/AMDGPU/SIInsertSkips.cpp
204–206 ↗(On Diff #117674)

If the condition is an SGPR this will violate the operand constraints, so this should be creating the _e64 version. The problem with that is this pass runs after SIShrinkInstructions, so this won't be optimized down in the common case which is part of why this expansion should probably be done earlier.

lib/Transforms/InstCombine/InstCombineCalls.cpp
3543–3550 ↗(On Diff #117674)

Test missing for this part

test/CodeGen/AMDGPU/kill.ll
1 ↗(On Diff #117674)

This should just run llc. The instcombine test should be separate in test/Transforms/InstCombine/AMDGPU

Also should use -enable-var-scope for FileCheck

23 ↗(On Diff #117674)

Delete unused arguments (opt -deadarghaX0r should be able to do this for you)

26 ↗(On Diff #117674)

It would be better to put the new intrinsic tests in a new llvm.amdgcn.kill file, and leave the legacy versions in a separate test file

mareko added inline comments.Oct 4 2017, 3:15 PM
include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗(On Diff #117674)

amdgcn.kill shouldn't be moved across ds.bpermute, ds.swizzle, and image_sample opcodes. Does IntrNoMem or IntrConvergent assure that?

lib/Target/AMDGPU/AMDGPUInstructions.td
170–175 ↗(On Diff #117674)

It means I'm lazy to handle OGE and !OGE separately. I'd still like to handle !OGE in a simple way and not care about NaNs. Alternatively, I can use add 2 patterns, one for COND_OGE and one for COND_UGE, both mapping to SI_KILL_F32_GE_0).

lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗(On Diff #117674)

I tried to do that but it's too much work. Eventually we'd like all flavors of V_CMPX, but it's not a high priority right now.

lib/Target/AMDGPU/SIInsertSkips.cpp
204–206 ↗(On Diff #117674)

This is not new code. It's the previous code moved by a few lines.

lib/Transforms/InstCombine/InstCombineCalls.cpp
3543–3550 ↗(On Diff #117674)

It's the "kill_true" test.

test/CodeGen/AMDGPU/kill.ll
1 ↗(On Diff #117674)

What's the deal with not running -instcombine as part of AMDGCN tests? It seems like it would be convenient everywhere.

nhaehnle edited edge metadata.Oct 10 2017, 3:17 AM

A bunch of inline comments, but also some higher level things that don't really fit anywhere. This is a useful feature, but I don't think we've ever gotten the design of kill just right, because kill is really an implicit control flow intrinsic.

So, for example, if you have

%v = llvm.amdgcn.icmp(...) ; ballot-type instruction
kill(...)
use %v

then LLVM is free to move the ballot-type instruction to after the kill according to the LLVM IR semantics, even though that is incorrect.

This isn't a problem in practice yet, because the instructions most likely to be affected by this are image sample intrinsics. Those are IntrReadMem, and kill itself has arbitrary side effects today, so sample intrinsics cannot be moved past a kill. Still, it might lead to problems in the future with shaders that use ballot and DPP / reduction intrinsics. So I've been wondering if we couldn't perhaps use kill like this:

  br i1 %cond, label %kill_bb, label %cont
kill_bb:
  call noret void @llvm.amdgcn.kill()
  ret undef
cont:
  ...

or perhaps better:

  %kill = call i1 @llvm.amdgcn.ps.kill(%cond)
  br i1 %kill, label %kill_bb, label %cont
kill_bb:
  call noret void @llvm.amdgcn.kill()
  ret undef
cont:
  ...

In this second variant, the ps.kill intrinsic would update the live mask and return true for all threads that can exit, i.e. ps.kill would internally do the WQM vote.

The advantage is that the control flow aspect of kill is properly modeled at LLVM IR level and so we can't run into issues with convergent intrinsics moving past it. I'd feel much more comfortable with an approach like that.

include/llvm/IR/IntrinsicsAMDGPU.td
757 ↗(On Diff #117674)

If we do stick with the simple intrinsic-based approach, I think we should keep the attributes as they are right now, i.e. keep them identical to AMDGPU.kill. That will give us fewer surprises...

lib/Target/AMDGPU/SIISelLowering.cpp
3002 ↗(On Diff #117674)

I think Matt is right. It would be cleaner and give us the benefit of optimizing more cases.

test/CodeGen/AMDGPU/kill.ll
1 ↗(On Diff #117674)

Fewer moving parts, I think. If instcombine is run on the test as well, there's a higher chance of tests being "broken" (in a false positive way) by random unrelated changes.

mareko updated this revision to Diff 118954.Oct 13 2017, 12:22 PM

Address feedback.

Nicolai, that's a good point, though let's just merge this intrinsic replacement for now.

nhaehnle accepted this revision.Oct 23 2017, 2:59 AM

You're right, on second thought the existing intrinsic has the same problem. So this is still a strict improvement.

This revision is now accepted and ready to land.Oct 23 2017, 2:59 AM
This revision was automatically updated to reflect the committed changes.