We handle it this way for some other address spaces.
Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.
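A minimal sketch of the kind of split being described, assuming a hypothetical 12-bit unsigned immediate field (the real field width and signedness depend on the subtarget and address space); this is an illustration, not the patch's actual DAG-selection code:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Returns {BasePart, ImmPart}: BasePart is folded into the pointer with an
// explicit 64-bit add, and ImmPart becomes the instruction's immediate
// offset. The 12-bit unsigned field and the handling of negative offsets are
// simplifications for illustration only.
std::pair<int64_t, int64_t> splitFlatOffset(int64_t Offset) {
  const int64_t MaxImm = (1 << 12) - 1;
  if (Offset >= 0 && Offset <= MaxImm)
    return {0, Offset};                   // fits as-is, no split needed
  // Put as much as the field allows into the immediate; the remainder goes
  // into the base pointer via a separate 64-bit add.
  int64_t ImmPart = std::min<int64_t>(std::max<int64_t>(Offset, 0), MaxImm);
  return {Offset - ImmPart, ImmPart};
}
```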
Differential D68893
AMDGPU: Split flat offsets that don't fit in DAG (Closed, Public)
Authored by arsenm on Oct 11 2019, 2:55 PM.
Comment Actions
Mostly LGTM, but I wonder about the high-level intention here. Is this intended to expose new load/store merging opportunities? If so, is there a test for this? Or is there some part of SIFoldOperands that can now be removed?
Comment Actions
This is mostly covered already by promote-constOffset-to-imm.ll, but I did mean to add more target cases for this. The problem I was solving is that SILoadStoreOptimizer tries to do this optimization currently. When D68894 is applied, this would break, since SIFoldOperands would now shrink the add pattern it's looking for. The shrunk form would require vcc liveness tracking, so it's easier to just split the offset earlier.
Comment Actions
So assume we have several loads from the same base, a normal case with a huge offset. The ideal selection would reuse an already advanced pointer from the previous load. To achieve that, an add should be done with a step increment of the base pointer, say a 2 KB increment, or whatever the maximum supported offset is. If that is the case, then creating a new SDNode will reuse an old one that is already advanced. Otherwise it might be better to split even earlier, in codegen prepare, or to separate the GEP from the constant offset to get that ideal result. I doubt handling a single operation will get a good result for a usual sequence of operations.

Comment Actions
I.e. ideally we want: load p, etc., for a 128 byte stride.
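A rough sketch of the access pattern in question, with invented offsets: several loads from one base at a large starting offset and a regular 128-byte stride. The ideal selection advances the pointer once and then uses small in-range immediate offsets on each load, rather than materializing a full 64-bit add per access.

```cpp
#include <cstdint>

// Hypothetical source pattern (offsets invented for illustration). The
// desired lowering is roughly:
//   vbase = Base + 0x10000     ; one 64-bit add, reused
//   load vbase offset:0
//   load vbase offset:128
//   load vbase offset:256
//   ...
uint32_t sumStrided(const uint32_t *Base) {
  uint32_t Sum = 0;
  for (int I = 0; I < 16; ++I)
    Sum += Base[(0x10000 + I * 128) / sizeof(uint32_t)];
  return Sum;
}
```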
Comment Actions
This would be better, but picking the base constant to use is more difficult. I think this is a next step beyond this patch. I'm not sure splitting this in the IR will work out, as the DAG will try to fold the adds of constants back together.
Comment Actions
I think the real problem is treating the base constant as the original offset minus the immediate that fits. If this were just extracting the low bits, in typical usage the base add and constant would end up being the same. I got confused by the signed addressing modes initially and ended up treating it this way. Next I'll try extracting the bits again.

This revision is now accepted and ready to land. Oct 18 2019, 9:47 PM
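To make the comparison above concrete, here is a rough sketch (not the patch's code, assuming a hypothetical 12-bit unsigned immediate field and non-negative offsets): computing the base as the offset minus the immediate that fits gives nearby offsets different base constants, while extracting only the low bits gives them the same base, so a single base add can be reused across several loads.

```cpp
#include <cstdint>
#include <utility>

constexpr int64_t MaxImm = 0xFFF;  // hypothetical 12-bit field, illustration only

// "Offset minus the part that fits": nearby offsets get different bases.
//   0x1000 -> {base 0x1, imm 0xFFF}    0x1008 -> {base 0x9, imm 0xFFF}
std::pair<int64_t, int64_t> splitBySubtract(int64_t Offset) {
  int64_t Imm = Offset < MaxImm ? Offset : MaxImm;
  return {Offset - Imm, Imm};
}

// "Just extract the low bits": nearby offsets share one base constant, so
// one base add can serve several loads.
//   0x1000 -> {base 0x1000, imm 0x0}   0x1008 -> {base 0x1000, imm 0x8}
std::pair<int64_t, int64_t> splitByLowBits(int64_t Offset) {
  int64_t Imm = Offset & MaxImm;
  return {Offset - Imm, Imm};
}
```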
Revision Contents
Diff 225689
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
lib/Target/AMDGPU/SIInstrInfo.h
lib/Target/AMDGPU/SIInstrInfo.cpp
test/CodeGen/AMDGPU/cgp-addressing-modes.ll
test/CodeGen/AMDGPU/flat-address-space.ll
test/CodeGen/AMDGPU/global-saddr.ll
test/CodeGen/AMDGPU/global_atomics.ll
test/CodeGen/AMDGPU/global_atomics_i64.ll
test/CodeGen/AMDGPU/offset-split-flat.ll
test/CodeGen/AMDGPU/offset-split-global.ll
test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
test/CodeGen/AMDGPU/store-hi16.ll
Comment Actions
It does seem like an annoying duplication of concerns to implement the 64-bit addition here manually.
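For reference, a minimal sketch of what doing the 64-bit addition "manually" amounts to, on my reading: the sum is assembled from two 32-bit adds, with the carry out of the low half feeding the high half, presumably because the selection code has to express it in terms of 32-bit add and add-with-carry operations. This shows only the arithmetic, not the DAG-building code in the patch.

```cpp
#include <cstdint>

uint64_t add64ViaHalves(uint64_t A, uint64_t B) {
  uint32_t ALo = static_cast<uint32_t>(A), AHi = static_cast<uint32_t>(A >> 32);
  uint32_t BLo = static_cast<uint32_t>(B), BHi = static_cast<uint32_t>(B >> 32);
  uint32_t Lo = ALo + BLo;            // low 32-bit add
  uint32_t Carry = Lo < ALo ? 1 : 0;  // carry out of the low half
  uint32_t Hi = AHi + BHi + Carry;    // high 32-bit add with carry in
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```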