This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use resource base for buffer instruction MachineMemOperands
ClosedPublic

Authored by krzysz00 on Apr 12 2023, 4:31 PM.

Details

Summary
  1. Remove the existing code that would encode the constant offsets (if
there were any) on buffer intrinsic operations onto their
MachineMemOperands. As far as I can tell, this use of offset has
no substantial impact on the generated code, especially since the same
reasoning is performed by areMemAccessesTriviallyDisjoint().

  2. When a buffer resource intrinsic takes a pointer argument as the
base resource/descriptor, place that memory argument in the value
field of the MachineMemOperand attached to that intrinsic.

This is more conservative than what would be produced by more typical
LLVM code using GEP, as the Value (for alias analysis purposes)
corresponding to accessing buffer[0] and buffer[1] is the same.
However, the target-specific analysis of disjoint offsets covers a lot
of the simple use cases.

Despite this limitation, the new buffer intrinsics, combined with
LLVM's existing pointer annotations, allow for non-trivial
optimizations, as seen in the new tests, where marking two buffer
descriptors "noalias" allows merging together loads and stores in a
"load from A, modify loaded value, store to B" sequence, which would
not be possible previously.
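A minimal sketch of the kind of pattern this enables. This is illustrative only, not a test from the patch; the declarations follow the new pointer-based (addrspace(8)) buffer intrinsics this series introduces:

```llvm
; Illustrative sketch: with both descriptors marked noalias, alias analysis
; can prove the buffer load and store are independent, so the backend is
; free to reorder and merge the memory operations in a
; "load from A, modify, store to B" sequence.
declare <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8), i32, i32, i32)
declare void @llvm.amdgcn.raw.ptr.buffer.store.v4i32(<4 x i32>, ptr addrspace(8), i32, i32, i32)

define amdgpu_kernel void @load_modify_store(ptr addrspace(8) noalias %a,
                                             ptr addrspace(8) noalias %b) {
  %v = call <4 x i32> @llvm.amdgcn.raw.ptr.buffer.load.v4i32(ptr addrspace(8) %a, i32 0, i32 0, i32 0)
  %w = add <4 x i32> %v, <i32 1, i32 1, i32 1, i32 1>
  call void @llvm.amdgcn.raw.ptr.buffer.store.v4i32(<4 x i32> %w, ptr addrspace(8) %b, i32 0, i32 0, i32 0)
  ret void
}
```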

Depends on D147547

Diff Detail

Event Timeline

krzysz00 created this revision. · Apr 12 2023, 4:31 PM
Herald added a project: Restricted Project. · View Herald Transcript · Apr 12 2023, 4:31 PM
krzysz00 requested review of this revision. · Apr 12 2023, 4:31 PM

I'll note that the only non-trivial test change I'm aware of is @test_alloca over in CodeGen/AMDGPU/wqm.ll. There, the better aliasing semantics allowed some of the writes to the alloca to be sunk towards the bottom of the function, which led to a wqm invocation being removed.

krzysz00 updated this revision to Diff 516970. · Apr 25 2023, 4:09 PM

Rebase, remove dead line

I thought the point of these offsets was to provide a quick way to avoid going through potentially expensive AA queries in the common case where you're comparing constant offsets from a common base address. You wouldn't expect to see codegen differences, but compilation would be slower.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1029

Perhaps this should be wrapped in an isBufferResource helper. I'm worried about introducing more buffer address spaces in the future.
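A sketch of what such a helper might look like. This is hypothetical (the reviewer only suggests the name); it assumes the AMDGPUAS::BUFFER_RESOURCE address-space enumerator used for the addrspace(8) buffer descriptors:

```cpp
// Hypothetical helper: centralizes the address-space check so that
// introducing additional buffer-resource address spaces later only
// requires updating one place.
static bool isBufferResource(unsigned AddrSpace) {
  return AddrSpace == AMDGPUAS::BUFFER_RESOURCE; // currently addrspace(8)
}
```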

From what I worked out looking through the code, we were using offset somewhat incorrectly here; it's meant for handling memory operations that were split in the backend.

That is, if I have

%x = load i128, ptr %a
%b = getelementptr i128, ptr %a, i32 1
%y = load i128, ptr %b

those'll become, and stay, (load (s128) from %ir.a) and (load (s128) from %ir.b).

But, if, as the backend, I can't do a 128-bit load and need to do them as a pair of 64-bit loads, I'll construct the memory operands (load (s64) from %ir.a), (load (s64) from %ir.a + 8), (load (s64) from %ir.b), and (load (s64) from %ir.b + 8) to represent how I've chopped up the loads and that those two loads from the same IR value don't actually alias.
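The splitting described above can be sketched against LLVM's MachineMemOperand API. This is a simplified fragment, not code from this patch; it relies on MachineFunction::getMachineMemOperand's clone-with-offset overload:

```cpp
// Sketch: when legalizing one s128 load into two s64 halves, clone the
// original memory operand twice, folding in the byte offset of each half.
// The recorded offsets let later passes prove the two halves are disjoint.
MachineMemOperand *BaseMMO = LoadMI.memoperands().front();
MachineMemOperand *LoMMO =
    MF.getMachineMemOperand(BaseMMO, /*Offset=*/0, /*Size=*/8);
MachineMemOperand *HiMMO =
    MF.getMachineMemOperand(BaseMMO, /*Offset=*/8, /*Size=*/8);
```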

arsenm accepted this revision. · May 1 2023, 1:53 PM
This revision is now accepted and ready to land. · May 1 2023, 1:53 PM
krzysz00 updated this revision to Diff 519310. · May 3 2023, 5:12 PM

Apply "from %.ir.*" to the legalization tests added in other patches

chapuni added a subscriber: chapuni. · Jun 5 2023, 5:43 PM

Sorry, I made a wrong fix. Could someone fix it?