This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Split flat offsets that don't fit in DAG
ClosedPublic

Authored by arsenm on Oct 11 2019, 2:55 PM.

Details

Reviewers
rampitec
vpykhtin
Summary

We handle it this way for some other address spaces.

Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.

Diff Detail

Event Timeline

arsenm created this revision.Oct 11 2019, 2:55 PM

Mostly LGTM, but I wonder about the high level intention here. Is this intended to expose new load/store merging opportunities? If so, is there a test for this? Or is there some part of SIFoldOperands that can now be removed?

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1685

It does seem like an annoying duplication of concerns to implement the 64-bit addition here manually.

arsenm marked an inline comment as done.Oct 15 2019, 1:11 PM

Mostly LGTM, but I wonder about the high level intention here. Is this intended to expose new load/store merging opportunities? If so, is there a test for this? Or is there some part of SIFoldOperands that can now be removed?

This is mostly covered already, by promote-constOffset-to-imm.ll but I did mean to add more target cases for this. The problem I was solving is that SILoadStoreOptimizer tries to do this optimization currently. When D68894 is applied, this would break since SIFoldOperands would now shrink the add pattern it's looking for. The shrunk form would require vcc liveness tracking, so it's easier to just split the offset earlier

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1685

Because we don't try to change SALU operations to VALU based on a VGPR use, this has some test consequences. Some cases are better with the scalar input but I think the VALU output gave a slightly better result

arsenm updated this revision to Diff 225689.Oct 18 2019, 1:31 PM

Remove dead code. Add new test for different offsets

So assume we have several loads from the same base, a normal case with the huge offset. The ideal selection would reuse an already advanced pointer from the previous load. To achieve that an add shall be done to a step increment of the base pointer, say a 2kb increment, or whatever is supported maximum offset. If that is the case then creation if new sdnode will reuse an old one, already advanced. Otherwise it might be better to split even earlier, in the codegen prepare, or separate gep from constant offset to get that ideal result. I doubt a single operation handling will get a good result for a usual sequence of operations.

I.e. ideally we want:

load p
load p+128
load p+256
...
load p+2048-128
p1 = p + 2048
load p1
load p1 + 128
load p1 + 256
...

etc for a 128 byte stride.

I.e. ideally we want:

load p
load p+128
load p+256
...
load p+2048-128
p1 = p + 2048
load p1
load p1 + 128
load p1 + 256
...

etc for a 128 byte stride.

This would be better, but picking the base constant to use is more difficult. I think this is a next step beyond this patch. I'm not sure splitting this in the IR will work out, as the DAG will try to fold the adds of constants pack together

I.e. ideally we want:

load p
load p+128
load p+256
...
load p+2048-128
p1 = p + 2048
load p1
load p1 + 128
load p1 + 256
...

etc for a 128 byte stride.

This would be better, but picking the base constant to use is more difficult. I think this is a next step beyond this patch. I'm not sure splitting this in the IR will work out, as the DAG will try to fold the adds of constants pack together

I think the real problem is treating the base constant as the subtract of the offset that fits. If this was just extracting low bits, in typical usage the base add and constant would end up being the same. I got confused by the signed addressing modes initially, and ended up treating it this way. Next I'll try extract the bits again

rampitec accepted this revision.Oct 18 2019, 9:47 PM

OK, there is definitely work ahead, but LGTM in the interim.

This revision is now accepted and ready to land.Oct 18 2019, 9:47 PM
arsenm closed this revision.Oct 20 2019, 10:33 AM

r375366