This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic
ClosedPublic

Authored by rovka on Jun 26 2023, 6:01 AM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Commits
rG7f5d59b38dc4: [AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (#68186)
Summary

The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.

Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch shortcircuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.

Diff Detail

Event Timeline

rovka created this revision.Jun 26 2023, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 6:01 AM
rovka requested review of this revision.Jun 26 2023, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 6:01 AM
arsenm added inline comments.Jun 26 2023, 7:40 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

Probably should avoid touching the generic IRTranslator. Can you just handle this as a legalize on the intrinsic itself? Fundamentally this isn't really any different from emitting a libcall, you'd just need to access the CallLowering from the legalize function

rovka added inline comments.Jun 27 2023, 12:31 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

Yeah, I'm not a fan of this either, but the IRTranslator already splits up aggregates and then errors out somewhere along the way. Even if I somehow fix that error, I think it will be more of a maintenance burden to try to hold on to these aggregates, and if we don't keep them together long enough then we won't know which values are meant to go in SGPRs and which in VGPRs. It's clearer and more straightforward to deal with this early on.

But I do agree that this code doesn't look great. I can think of a couple of ways to make it a bit better:

  1. Move the check into translateKnownIntrinsic, which already goes through a lot of intrinsics. At the moment none of them are target-specific, but that might change in the future anyway (SelectionDAG has at least 2 aarch64 intrinsics in similar generic code, I haven't checked if GlobalISel supports them yet), or
  2. Add an IsCall intrinsic attribute and pass intrinsics with that attribute through CallLowering (either lowerCall directly or a new hook, e.g. lowerIntrinsicAsCall)

Do either of those sound more palatable?

rovka updated this revision to Diff 535332.Jun 28 2023, 4:58 AM

Moved check for amdgpu_cs_chain to translateKnownIntrinsics.

arsenm added inline comments.Jun 28 2023, 5:31 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

Why do you need to preserve the aggregates? What's the error? I would expect you don't care about aggregates to lower this, and that aggregates should just work

rovka added inline comments.Jun 28 2023, 6:03 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

I care about the aggregates because otherwise how would I know which values to put in SGPRs and which ones to put in VGPRs? The semantics of the intrinsic are that the 3rd argument contains the stuff that goes into SGPRs, and the 4th contains the stuff that goes into VGPRs. If I allow these to be broken up before legalization (or whenever we decide to actually lower this), then I'll end up with a G_INTRINSIC_WITH_SIDE_EFFECTS with a whole lot of smaller registers, and I won't know where the ones from the first aggregate end and the ones from the second aggregate begin, aka I won't know which ones I should copy to SGPRs and which ones to VGPRs. Does that make sense?

(The actual error is IIRC here, but I don't think that's the most important point; I could fix that to build a merge, like SelectionDAGBuilder does - but even in DAGISel we lose the merge in the first round of combines, and with it the information on what kind of reg each value should be placed in).

arsenm added inline comments.Jun 29 2023, 11:38 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

I would have assumed this is indicated by using inreg on the call parameters. If that were the case, I suppose you would still need to find a way to preserve that information

rovka updated this revision to Diff 536193.Jun 30 2023, 5:29 AM

Set EXEC. This is very smooth for GlobalISel but a bit roundabout for DAGISel (I'm
not sure if there's a better way to handle it, since DAGISel gives control to
the target pretty late on).

Thanks for all the discussion so far. I'm going on vacation so I'll respond to
any other comments after the 17th of July.

foad added a comment.Jun 30 2023, 7:26 AM

Looks pretty good to me overall. I don't know enough about call lowering to offer an opinion on the alternative ways of implementing this that you have been discussing with @arsenm.

The @llvm.amdgcn.cs.chain intrinsic is essentially a call.

It's a tail call, isn't it? Does it have the same restriction as an LLVM IR tail call, that allocas in the caller cannot be accessed by the callee?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7466

Capitalize variable names.

7475

Is this something you could check in the IR verifier? Failing an assertion here is not a nice diagnostic.

7478

Not sure what "it will be handled differently" is telling me?

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
965
1331–1332
1336

This is TRI->getExec()

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
3372–3373

Should not need these lines. We already have access to Subtarget.

3383–3384
3628

TRI->getExec(), if you move the definition of TRI up from 33 lines below.

It's a tail call, isn't it? Does it have the same restriction as an LLVM IR tail call, that allocas in the caller cannot be accessed by the callee?

Yes, that's right.

It would be nice to have a simple test case that shows the final assembly.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7475

Could also make this a report_fatal_error

rovka updated this revision to Diff 544678.Jul 27 2023, 3:13 AM

Address review comments.
I've opted for having more checks in the verifier (D156409).

rovka retitled this revision from [AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (WIP) to [AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic.
rovka edited the summary of this revision. (Show Details)
rovka updated this revision to Diff 549309.Aug 11 2023, 2:31 AM

Rebase.

arsenm added inline comments.Aug 17 2023, 11:49 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

An alternative scheme would be you could just have an immarg parameter that indicates which operand index the VGPRs (start from

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7450

unchecked dyn_cast in assert, should use cast or precheck with isa

rovka added inline comments.Aug 21 2023, 4:09 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2540

That might work, but then splitting the arguments for this intrinsic would go through a different code path than splitting regular function parameters / call arguments, so I would feel less confident about the generated code. Mismatches would not be fun to debug/fix.

rovka updated this revision to Diff 551974.Aug 21 2023, 4:09 AM

Fix cast derpness.

This revision was not accepted when it landed; it landed in state Needs Review.Nov 6 2023, 3:30 AM
This revision was automatically updated to reflect the committed changes.
rovka added a comment.Nov 6 2023, 3:32 AM

The review for this was completed in GitHub: https://github.com/llvm/llvm-project/pull/68186