Certain address space dependent optimizations, like SeperateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done. This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using global address space, then it will be lowered to the corresponding global_atomic intrinsic.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
What's the failure mode without your patch? Can you precommit the tests?
TBH I don't understand the concept of checking "legality" here. At the IR level I thought all GEPs were legal.
The point of the pass is to form GEPs that are friendly to matching the addressing modes. If an offset doesn't fit the target addressing mode, there's the potential to produce worse codegen
InferAddressSpaces should have taken care of any cases where addrspacecast is involved, so I think you're solving this problem in the wrong place. I forget exactly why we specifically have a flat atomic version of the intrinsic, but it would be better to handle infering that to global pointers there
Hey Matt, Jay,
Thanks for the comments -- as always, they're very helpful and much appreciated.
Jay --
I precommited the test case via e0b16aaaf997. As you can see, we are producing illegal offsets for the flat_atomic_fadd. This is due to SeparateConstOffset pass modifying the GEP s.t. the offset is negative, which gets translated close to the 16bit unsigned max.
I believe all GEPs are technically legal at this stage, however, negative offsets for FLAT addresses are not supported / legal. Thus, if we produce an addrspace(0) address with a negative offset, we will need to handle it at some point or another. The approach here mimics some existing code in the SeparateConstOffset pass by invoking TTI->isLegalAddressingMode and simply disallows producing such an address.
Matt --
Thanks for pointing out that pass -- it does seem like a more appropriate place for this to be handled. I was able to hack together a solution for this using your approach, but I'll need to spend a bit more time to clean things up a bit. The benefit is we will still be able to perform the SeparateConstOffset optimization.
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp | ||
---|---|---|
915–916 ↗ | (On Diff #448433) | Yes probably. If I were to continue with this approach, I would override & use the Analysis/PtrUseVisitor.h class (as it's already doing exactly what I want), which, by default, flags PtrToInts as escaped. |
Rework approach of fix.
Handle propogating the address space in InferAddressSpaces patch.
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | I wouldn't expect this transform to happen. I would expect to emit the flat instruction for the flat atomic despite the address space |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | Not for this PHI test in particular, but for all these tests in which we lower to a global_atomic, right? |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | Yes. I expect the flat atomic intrinsic to give the flat instruction regardless of address space |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | If we know its AS exactly why not to do it? Especially that we are widely using code specialization with AS checking when flat atomic is unavailable. |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | I thought the whole reason we had these address space specific intrinsics in the first place was because of the painfully divergent behaviors in the instructions |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | Added @b-sumner. There is some divergence between DS and VMEM, I do not recall global vs flat within the same GPU. But then I believe these intrinsics exist to use what the target can offer, so mostly because of the divergence between GPUs itself. |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | Agreed. These intrinsics are used to expose HW capabilities when available, and users will be pleased if we can specialize to a known address space. |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
158 | Hi All - thanks for the thoughts and discussion! Hi Matt -- I took a look, and AtomicLoadFAdd SDNodes with AddressSpace(1) pointer operands have ISel patterns to match to either global_atomic or flat_atomic. However, it appears the prioritization / complexity model in FLATInstructions.td favors global intrinsics over flat atomics when both are feasible, which is why we lower to the global intrinsic here. It seems the consensus is to specialize into globals where possible (as is done here)? If so, the concern I have is that is that this behavior does not occur for global-isel (at least not for this test). The node is lowered to flat (despite the address space inference) and we are seeing the illegal offset in generated code. I wonder if address space specialization in global-isel is something that should be addressed in a separate ticket. |
Does anyone have any concerns about this patch?
I believe arsenm is OOO and is not available for review.
Thanks Stas
Landed upstream via https://reviews.llvm.org/rG20cf170e68de
Not sure why arc decided push via a different diff, but closing this one.
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
2 | Don't need -O3 |
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll | ||
---|---|---|
2 | Nice catch, thanks Matt! Fixed it upstream. |
Don't need -O3