This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets.
ClosedPublic

Authored by kosarev on Jul 4 2022, 11:45 AM.

Details

Summary

Saves some add instructions on a couple Rage 2 shaders and is also a
prerequisite for a coming-soon change matching (register + immediate)
offsets.

Diff Detail

Event Timeline

kosarev created this revision.Jul 4 2022, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2022, 11:45 AM
kosarev requested review of this revision.Jul 4 2022, 11:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2022, 11:45 AM

D128836 covers the case for Global ISel.

foad added inline comments.Jul 5 2022, 1:48 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
36

When this patch and D128836 have both landed, can you add GISEL checks here please.

38

Please don't use unnamed values in tests. (You can run the IR through opt -instnamer to name them automatically.)

40

Do I understand correctly: in SDag this gep ends up as (add %2, @0) (not (add @0, %2)) because @0 is a constant so it gets canonicalized to the RHS?

kosarev updated this revision to Diff 442293.Jul 5 2022, 6:51 AM

Rebased on top of D128171 and updated the test to use named values.

kosarev marked 2 inline comments as done.Jul 5 2022, 6:59 AM
kosarev added inline comments.
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
36

Done; the GCN checks now work for both the SDAG and GISEL runs.

40

That's right, for commutative nodes the combiner tries to make whatever seems a constant be the second operand.

foad accepted this revision.Jul 5 2022, 7:04 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jul 5 2022, 7:04 AM
arsenm added inline comments.Jul 5 2022, 7:05 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

Should use explicit global-isel=0 for dag test

34–35

Should test the operands

45

Doesn't test the 32-bit constant case

kosarev marked an inline comment as done.Jul 5 2022, 7:14 AM
kosarev added inline comments.
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

Why would we want to check that? Looks like there's nothing specific to 32-bit constants in the change?

arsenm added inline comments.Jul 5 2022, 7:15 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

You have the call to Expand32BitAddress

kosarev added inline comments.Jul 5 2022, 7:57 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

Yes, which replicates the existing and already-tested logic. Will it eliminate the need for the test if I move the couples of SelectSMRDOffset() and Expand32BitAddress() calls into a separate function, something like SelectSMRDBaseOffset()? We would need it for matching (register + immediate) offsets anyway.

Feels like that'd be a test case for what is actually not a special case.

arsenm added inline comments.Jul 5 2022, 8:17 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

It's another permutation of a somewhat fragile case (which also works differently in globalisel). There's no reason to skimp on tests

kosarev added inline comments.Jul 6 2022, 6:51 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

The if ((Addr.getValueType() != MVT::i32 || Addr->getFlags().hasNoUnsignedWrap())) bit in AMDGPUDAGToDAGISel::SelectSMRD() means for such a test we would need the nuw flag be set on the ISD::ADD in the offset, which we only do for constant getelementptr inbounds indexes that in turn would always go to the RHS operand of the ISD::ADD, and thus can't help reproducing the case.

D15544 gives some reasoning for raising nuw for non-negative constant offsets. If we think it's worth it, I could try to do the same for non-constants known to be small enough, but it again would help if we do not require cultivating things specific to the constant address space be a prerequisite for this generic change.

kosarev updated this revision to Diff 442591.Jul 6 2022, 8:11 AM

Addressed review notes.

kosarev marked 2 inline comments as done.Jul 6 2022, 8:12 AM
kosarev updated this revision to Diff 443271.Jul 8 2022, 9:43 AM

Update the GISel test for it to be less dependent on code order.

foad added a comment.Jul 11 2022, 2:30 AM

Match complex register SMRD offsets.

Could you improve the commit message? I don't know what "complex" means here. I think it is really about matching (constant base + register offset) in addition to (register base + constant offset).

kosarev updated this revision to Diff 443639.Jul 11 2022, 7:50 AM

Reworded commit message.

kosarev retitled this revision from [AMDGPU][CodeGen] Match complex register SMRD offsets. to [AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets..Jul 11 2022, 7:50 AM
kosarev added a comment.EditedJul 11 2022, 7:55 AM

Right, the 'complex' part came from when I had a hypothesis that it is the size of the operand that matters. Thanks for catching.

foad accepted this revision.Jul 11 2022, 8:39 AM

Still LGTM if Matt is happy.

arsenm added inline comments.Jul 11 2022, 11:21 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

You can just add noinbounds to the getelementptr. It's not a special change and just testing the mechanics for what you already have here

kosarev added inline comments.Jul 11 2022, 12:48 PM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

Sorry, I'm not sure I follow. What is noinbounds and how it should be used with the getelementptr, exactly?

kosarev added inline comments.Jul 13 2022, 11:00 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

@arsenm Ping?

arsenm added inline comments.Jul 13 2022, 11:22 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

You can try this:

%i3 = getelementptr inbounds [4 x <2 x float>],

with the addrspace(6) pointer

kosarev added inline comments.Jul 14 2022, 4:35 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

It has already been explained in https://reviews.llvm.org/D129095#inline-1241075 that that wouldn't work.

On top of that, we don't seem to support addrspace(6) constants in SDAG ISel currently; I get AMDGPU DAG->DAG Pattern Instruction Selection error whenever I try to use one.

Regarding noinbounds, I don't see how your words explain what it is or how it can be useful for triggering the added code. Is that your comment still relevant?

And just for the record, I don't understand the argument about the new Expand32BitAddress() call being a sufficient reason for another test case. We usually don't add tests for higher level code just because it employs more of some lower level functions. It's up to tests specifically for these lower level functions to make sure they can handle their intended uses cases, and as hopefully is obvious from https://reviews.llvm.org/D129381 that eliminates multiple calls to Expand32BitAddress(), this patch doesn't add any new use cases for that function.

What has been changed is that we now support another permutation of operands in SelectSMRD(), but that is already covered with the added tests. addrspace(6) operands are no special in this regard, so adding more tests for them would mean we try to test combinations that are not special implementation-wise, which is again not how we write regression tests. And yes, unnecessary tests can actually be harmful -- they draw maintenance resources and attention and they hide cases that are really important. I believe some of our MC tests illustrate that.

Overall, I feel somewhat confused about your two last comments. If you can reword what you have to suggest in a bit more explicit way, that will likely be a great help. Thanks.

kosarev updated this revision to Diff 444590.Jul 14 2022, 4:40 AM

Updated a comment in the .ll test to not mention complex operands.

kosarev requested review of this revision.Jul 15 2022, 1:43 AM
arsenm accepted this revision.Jul 16 2022, 8:31 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
45

The point isn't to test Expand32BitAddress but to test that it's used on this path. I guess if constants don't work for constant 32-bit that's a separate issue

This revision is now accepted and ready to land.Jul 16 2022, 8:31 AM
This revision was landed with ongoing or failed builds.Jul 18 2022, 3:19 AM
This revision was automatically updated to reflect the committed changes.