This is an archive of the discontinued LLVM Phabricator instance.

[InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics
ClosedPublic

Authored by jrbyrnes on Jul 28 2022, 1:52 PM.

Details

Summary
Certain address space dependent optimizations, like SeperateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done.

This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using global address space, then it will be lowered to the corresponding global_atomic intrinsic.

Diff Detail

Event Timeline

jrbyrnes created this revision.Jul 28 2022, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 1:52 PM
jrbyrnes requested review of this revision.Jul 28 2022, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 1:52 PM
jrbyrnes updated this revision to Diff 448433.Jul 28 2022, 1:53 PM

Remove unnecessary local var

foad added a comment.Aug 3 2022, 1:49 AM

What's the failure mode without your patch? Can you precommit the tests?

TBH I don't understand the concept of checking "legality" here. At the IR level I thought all GEPs were legal.

arsenm added a comment.Aug 3 2022, 7:27 AM

What's the failure mode without your patch? Can you precommit the tests?

TBH I don't understand the concept of checking "legality" here. At the IR level I thought all GEPs were legal.

The point of the pass is to form GEPs that are friendly to matching the addressing modes. If an offset doesn't fit the target addressing mode, there's the potential to produce worse codegen

arsenm added a comment.Aug 3 2022, 7:29 AM

InferAddressSpaces should have taken care of any cases where addrspacecast is involved, so I think you're solving this problem in the wrong place. I forget exactly why we specifically have a flat atomic version of the intrinsic, but it would be better to handle infering that to global pointers there

arsenm added inline comments.Aug 3 2022, 7:35 AM
llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
915–916 ↗(On Diff #448433)

Doing anything to ptrtoint/inttoptr is probably wrong

llvm/test/CodeGen/AMDGPU/gep-const-address-space.ll
254–255

Should compact the attribute group numbers

jrbyrnes marked an inline comment as done.Aug 3 2022, 3:35 PM

Hey Matt, Jay,

Thanks for the comments -- as always, they're very helpful and much appreciated.

Jay --

I precommited the test case via e0b16aaaf997. As you can see, we are producing illegal offsets for the flat_atomic_fadd. This is due to SeparateConstOffset pass modifying the GEP s.t. the offset is negative, which gets translated close to the 16bit unsigned max.

I believe all GEPs are technically legal at this stage, however, negative offsets for FLAT addresses are not supported / legal. Thus, if we produce an addrspace(0) address with a negative offset, we will need to handle it at some point or another. The approach here mimics some existing code in the SeparateConstOffset pass by invoking TTI->isLegalAddressingMode and simply disallows producing such an address.

Matt --

Thanks for pointing out that pass -- it does seem like a more appropriate place for this to be handled. I was able to hack together a solution for this using your approach, but I'll need to spend a bit more time to clean things up a bit. The benefit is we will still be able to perform the SeparateConstOffset optimization.

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
915–916 ↗(On Diff #448433)

Yes probably. If I were to continue with this approach, I would override & use the Analysis/PtrUseVisitor.h class (as it's already doing exactly what I want), which, by default, flags PtrToInts as escaped.

jrbyrnes updated this revision to Diff 450099.Aug 4 2022, 12:17 PM

Rework approach of fix.

Handle propogating the address space in InferAddressSpaces patch.

jrbyrnes retitled this revision from [SeparateConstOffsetFromGEP] [AMDGPU] Check legality for all uses of transformed GEP to [InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics.Aug 4 2022, 12:19 PM
jrbyrnes edited the summary of this revision. (Show Details)
arsenm added inline comments.Aug 4 2022, 12:21 PM
llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
2

This should only run the IR infer address spaces. CodeGen tests belong in test/CodeGen/AMDGPU

182

target-cpu attribute is redundant with the command line

jrbyrnes updated this revision to Diff 450117.Aug 4 2022, 1:21 PM
jrbyrnes marked 2 inline comments as done.
jrbyrnes edited the summary of this revision. (Show Details)

Move codegen tests to CodeGen, add IR test for InferAddressSpace flat_atomic.

arsenm added inline comments.Aug 4 2022, 1:26 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

I wouldn't expect this transform to happen. I would expect to emit the flat instruction for the flat atomic despite the address space

jrbyrnes added inline comments.Aug 4 2022, 1:38 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

Not for this PHI test in particular, but for all these tests in which we lower to a global_atomic, right?

arsenm added inline comments.Aug 4 2022, 1:40 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

Yes. I expect the flat atomic intrinsic to give the flat instruction regardless of address space

rampitec added inline comments.Aug 4 2022, 1:41 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

If we know its AS exactly why not to do it? Especially that we are widely using code specialization with AS checking when flat atomic is unavailable.

arsenm added inline comments.Aug 4 2022, 1:46 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

I thought the whole reason we had these address space specific intrinsics in the first place was because of the painfully divergent behaviors in the instructions

rampitec added inline comments.Aug 4 2022, 1:51 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

Added @b-sumner. There is some divergence between DS and VMEM, I do not recall global vs flat within the same GPU. But then I believe these intrinsics exist to use what the target can offer, so mostly because of the divergence between GPUs itself.

b-sumner added inline comments.Aug 4 2022, 2:00 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

Agreed. These intrinsics are used to expose HW capabilities when available, and users will be pleased if we can specialize to a known address space.

jrbyrnes added inline comments.Aug 8 2022, 5:47 PM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
158 ↗(On Diff #450117)

Hi All - thanks for the thoughts and discussion!

Hi Matt -- I took a look, and AtomicLoadFAdd SDNodes with AddressSpace(1) pointer operands have ISel patterns to match to either global_atomic or flat_atomic. However, it appears the prioritization / complexity model in FLATInstructions.td favors global intrinsics over flat atomics when both are feasible, which is why we lower to the global intrinsic here.

It seems the consensus is to specialize into globals where possible (as is done here)?

If so, the concern I have is that is that this behavior does not occur for global-isel (at least not for this test). The node is lowered to flat (despite the address space inference) and we are seeing the illegal offset in generated code. I wonder if address space specialization in global-isel is something that should be addressed in a separate ticket.

arsenm added a comment.Aug 8 2022, 5:48 PM

The globalisel behavior should be consistent, but is a separate issue

Does anyone have any concerns about this patch?

I believe arsenm is OOO and is not available for review.

This revision is now accepted and ready to land.Aug 18 2022, 10:12 AM
jrbyrnes closed this revision.Aug 19 2022, 3:48 PM

Thanks Stas

Landed upstream via https://reviews.llvm.org/rG20cf170e68de

Not sure why arc decided push via a different diff, but closing this one.

arsenm added inline comments.Sep 15 2022, 10:23 AM
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
2 ↗(On Diff #450117)

Don't need -O3

jrbyrnes marked an inline comment as done.Sep 15 2022, 3:39 PM
jrbyrnes added inline comments.
llvm/test/CodeGen/AMDGPU/gep-const-offset-address-space.ll
2 ↗(On Diff #450117)

Nice catch, thanks Matt! Fixed it upstream.