This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.
ClosedPublic

Authored by kosarev on Jul 21 2022, 6:07 AM.

Diff Detail

Event Timeline

kosarev created this revision.Jul 21 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2022, 6:07 AM
kosarev requested review of this revision.Jul 21 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2022, 6:07 AM
kosarev added inline comments.Jul 21 2022, 6:15 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

I'm not perfectly sure we do the right here. One concern is that m_ICst() seems to match signed values and another is that we ignore nuw/nsw flags. I tried to add a check for nuw, and that led to numerous test failures, which coupled with that we seem to use this code for non-scalar buffer loads suggests that it's probably a task of its own. If anyone can confirm that these concerns are valid, I'd like to add a TODO explaining them to this patch.

This affects 53 of our test shaders, reducing code size of each by ~100 bytes in average. No shaders with increased code size.

arsenm added inline comments.Jul 21 2022, 6:19 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

m_ICst should just match constants, the sign is in the interpretation.

nuw/nsw only potentially matters in cases where we're extending a 32-bit offset to 64-bits in one of the offsets that don't handle wrapping. I expect to match the DAG behavior for all of these addressing modes

I don't think you should be trying to match constant offsets in either operand here. This is showing we're missing the standard canonicalization to move constants to the RHS for gMIR

kosarev updated this revision to Diff 447715.Jul 26 2022, 8:06 AM

Rework matching offsets.

kosarev added inline comments.Jul 26 2022, 8:08 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

It seems the current approach is to rely on matchers that for commutative operations try both the cases. Updated to use them.

kosarev marked an inline comment as done.Jul 29 2022, 12:51 AM

Ping.

foad added inline comments.Aug 2 2022, 6:29 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

Does anything go wrong if you remove the Offset == 0 check?

arsenm added inline comments.Aug 2 2022, 7:44 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

It's not a current approach, it's a missing feature. I'd rather not handle the commuted case and separately introduce a new combine to canonicalize constants to the RHS

kosarev added inline comments.Aug 2 2022, 9:02 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

I don't mind to look into why we currently do it the way we do, but now that this patch doesn't do anything special to match both the cases, it seems it can be addressed separately and after this is landed?

(And as I side note, I admit I struggle to see how this is not a current approach, considering that this is literally what's being done whenever we aim to match commutative nodes.)

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

The tests still pass (and that is what I would expect from reading relevant code), but we don't have many of them and GISel doesn't seem to be able to cope with the most of our test corpus shaders. Is there any benefit in removing the check?

foad added inline comments.Aug 2 2022, 9:22 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

Is there any benefit in removing the check?

I have not fully understood the patch but it seems like it is more complicated than necessary because you insist that this offset has to be non-zero. For example I assume this is why you had to move the Offset && !SOffset case out of SelectSMRDBaseOffset and into SelectSMRD(?). So I am trying to understand where the requirement comes from. Is it because you want to be sure that if the offset is 0 then we select _SGPR instead of _SGPR_IMM forms of the instruction? If so, then I think this should be handled by ensuring that the patterns for the _SGPR forms have higher priority. (Perhaps this already happens, just because of the order the patterns appear in the .td file?)

kosarev updated this revision to Diff 449622.Aug 3 2022, 3:26 AM

Removed the Offset == 0 check in the GISel matcher.

kosarev added inline comments.Aug 3 2022, 3:47 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

I see. Yes, this makes sense. I've removed the check for the sake of having a chance to catch it if and when we messed up with the matching order.

As of right now the ordering looks correct to me as the SGPR_IMM pattern is already the most complex one and thus I understand doesn't need playing with AddedComplexity.

Regarding the Offset && !SOffset case, since we now use SelectSMRDBaseOffset() to match the SGPR_IMM form and for that form we don't want the invented zero offset to match the IMM part, that code has to be somewhere outside of this function.

foad added inline comments.Aug 3 2022, 7:21 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

Regarding the Offset && !SOffset case, since we now use SelectSMRDBaseOffset() to match the SGPR_IMM form and for that form we don't want the invented zero offset to match the IMM part, that code has to be somewhere outside of this function.

I still don't understand. If the _SGPR_IMM form exists then it is in no way inferior to the _SGPR form, so why not use it? (To put it another way, the _SGPR form is redundant on subtargets that have the _SGPR_IMM form - perhaps we should even use predicates to disable it on those subtargets.)

kosarev added inline comments.Aug 3 2022, 9:13 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

OK, that's another matter. One sort of reasons I can think of is to make the resulting code look more natural and to make life a bit easier reading and preparing tests, etc? So no obvious reasons not to use SGPR either, it seems?

Trying to see a bigger picture, looks like in a better world we would have the same instruction with the non-zero offset part being syntactically optional and only available on certain subtargets. But where we have different instructions doing the same thing we normally use the one that looks least tricky and unexpected in the context?

arsenm added inline comments.Aug 4 2022, 7:30 PM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

I don't know why I thought this was checking both operands; I see now this is looking through a copy for a constant. This is still also a pattern that should not reach the selector, and if it does it's demonstrating a combiner failure. The common case where we end up with a copy of a constant is for a VGPR use, which you don't have here so I don't think you need to worry about the copy here

kosarev added inline comments.Aug 8 2022, 4:07 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

This patch doesn't add the copy matching either; just rewrites it to use mi_match(). Whether it's a combiner failure or a matcher issue as your FIXME above says, it doesn't seem to have anything to do with this patch?

Removing the copy-matching bit leads to some test failures, so looks like it's there for a reason.

Failed Tests (5):
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.barrier.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ds.gws.init.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.s.buffer.load.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir
kosarev added inline comments.Aug 12 2022, 4:29 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

Ping.

foad added a subscriber: qcolombet.Sep 1 2022, 3:13 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

There is some philosophical discussion about this here: https://discourse.llvm.org/t/globalisel-and-commutative-binops-in-pat/58115
in which @qcolombet says "we don’t do a whole lot because the incoming IR is already supposed to be canonical".
So in the interest of making forward progress, perhaps we should drop this hunk from the patch, and instead fix the IR in test_buffer_load_sgpr_plus_imm_offset to be canonical by having the 77 on the RHS of the add. This is what opt -instcombine would do to it.

foad added inline comments.Sep 1 2022, 3:48 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1886–1887

Would you mind updating this comment since there is no Imm argument.

1973–1974

Would you mind updating this comment since there is no Imm argument.

1990

Committing most of this hunk as an NFC refactoring would have made the current patch easier to read.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4922

It feels like the getType check could be an assertion? I can't see how it would fail, if the tablegen patterns are written correctly.

kosarev updated this revision to Diff 457265.EditedSep 1 2022, 7:13 AM

Addressed review feedback and rebased.

kosarev marked 3 inline comments as done.Sep 1 2022, 7:27 AM
kosarev added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
40 ↗(On Diff #446462)

What a useful finding is that discussion. Thanks for pointing out. Updated as suggested.

foad accepted this revision.Sep 1 2022, 7:40 AM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 1 2022, 7:40 AM

I'm going to commit this next Monday, if no objections. Should there be any further feedback on this as people get back to office, hopefully we'll be able to address it post-commit. Thanks for reviewing.