This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Extend buffer intrinsics with swizzling
ClosedPublic

Authored by piotr on Sep 30 2019, 1:41 AM.

Details

Summary

Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Diff Detail

Repository
rL LLVM

Event Timeline

piotr created this revision.Sep 30 2019, 1:41 AM

I thought this was a property of the resource descriptor? Why do you need to add it to the intrinsic?

include/llvm/IR/IntrinsicsAMDGPU.td
903 ↗(On Diff #222371)

Need to use immarg for all of these

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
522 ↗(On Diff #222371)

Single line comment

arsenm added inline comments.Sep 30 2019, 7:05 AM
include/llvm/IR/IntrinsicsAMDGPU.td
903 ↗(On Diff #222371)

Actually this isn't adding a new argument, but also doesn't have anything to do with the cachepolicy

piotr added a comment.Sep 30 2019, 7:50 AM

I thought this was a property of the resource descriptor? Why do you need to add it to the intrinsic?

Yes, but it is illegal to merge load/store instructions that access swizzled buffers. So far our front-end has been using tbuffer stores in such cases to work around that issue, as tbuffer loads/stores are not being merged in si-load-store-opt. However, since it is often profitable to merge tbuffer loads/stores we want to implement tbuffer load/store merging and selectively prevent merging for instructions that are tagged as operating on a swizzled buffer.

include/llvm/IR/IntrinsicsAMDGPU.td
903 ↗(On Diff #222371)

Yes, the new bit is not really about the cache policy. I should probably rename "cachepolicy" to something more aptly named, or at least rewrite the comment here to make it clear.

The idea behind re-using the cachepolicy operand is to avoid the need to create yet another generation of intrinsics. Also, it makes it backward compatible, the front-ends can enable the "swizzled" bit at their own time.

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
522 ↗(On Diff #222371)

Ah yes, sorry I keep making this error. I will use the C++-style comment and also rename "swizzled" to "Swizzled".

nhaehnle added inline comments.Sep 30 2019, 10:01 AM
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
522 ↗(On Diff #222371)

In the discussion in August there seems to have largely been consensus for moving LLVM towards lowerCamelCase variable naming.

piotr updated this revision to Diff 222779.Oct 2 2019, 2:38 AM

Addressed review comments and rebased.

piotr marked an inline comment as done.Oct 2 2019, 2:40 AM
piotr added inline comments.
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
522 ↗(On Diff #222371)

Yes, there will be a sweeping change modifying all occurrences at the same time.

nhaehnle accepted this revision.Oct 2 2019, 5:19 AM

LGTM

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
522 ↗(On Diff #222371)

Maybe. Maybe not. The point is, using lowerCamelCase on variables today should be fine. Either way, it's no big deal.

This revision is now accepted and ready to land.Oct 2 2019, 5:19 AM
This revision was automatically updated to reflect the committed changes.
kosarev added inline comments.
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
201

@piotr Hi Piotr, is this function intentionally empty? I tried to add a printNamedBit() call here and didn't catch any test failures.

Herald added a project: Restricted Project. · View Herald TranscriptJul 3 2023, 11:27 AM
Herald added a subscriber: kerbowa. · View Herald Transcript
piotr added inline comments.Jul 4 2023, 2:02 AM
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
201

Yes, it is intentionally empty. Th swz bit is information that is used by the compiler to create correct code (whether instruction merging is allowed), but it is not part of the ABI - the instruction would behave identical regardless of swz being present.

kosarev added inline comments.Jul 4 2023, 5:10 AM
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
201

Thanks. D154432 attempts to clean it up a bit.

piotr added inline comments.Jul 4 2023, 5:15 AM
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
201

Thanks!

When D68200 was committed the swz was a separate field, so we needed the print function. Since then, we have moved to the cache_policy structure (CPol) containing different fields, so printSWZ function is no longer needed.