Page MenuHomePhabricator

[AMDGPU] Always use IDX for load/store format intrinsics.
AcceptedPublic

Authored by bnieuwenhuizen on Mar 12 2018, 1:30 PM.

Details

Summary

On Vega, whether the buffer size in the descriptor is interpreted
in bytes or in units of stride depends on whether IDXEN is enabled.

This modifies the format intrinsics to always use IDXEN for GFX9,
so that we can always rely on it being units of stride. Before this
the intrinsics were pretty much unusable as we cannot fill in the
size field without knowing if LLVM will apply the optimization.

I heard people on IRC that they preferred modifying the existing
intrinsic. If you want me to create new ones for the pattern, that
is OK with me.

Diff Detail

Event Timeline

bnieuwenhuizen created this revision.Mar 12 2018, 1:30 PM

We need tests for the store case as well.

I think it's fine to keep the existing intrinsics, but this causes a subtle change (or clarification) of semantics. Could you please add a comment in IntrinsicsAMDGPU.td explaining that buffer.{load,store}.format.* expects a buffer with stride != 0, while the non-format intrinsics expect a buffer with stride == 0?

Added a run fox GFX9 in the store tests and added the comments to the
intrinsics.

I think the unit change is a hint we should use different intrinsics for this

I think the unit change is a hint we should use different intrinsics for this

The current intrinsic is unusable on GFX9 though. Why should we keep it as-is if it's unusable? This patch is reasonable because it picks the most expected behavior for GFX9 while it disallows the least expected and problematic behavior.

nhaehnle accepted this revision.Mar 17 2018, 5:33 AM

I think the unit change is a hint we should use different intrinsics for this

The current intrinsic is unusable on GFX9 though. Why should we keep it as-is if it's unusable? This patch is reasonable because it picks the most expected behavior for GFX9 while it disallows the least expected and problematic behavior.

I agree. We don't have a use for "format" instructions without stride, and we don't have a use for "non-format" instructions with stride, so this change is pragmatic. Arguably, we should remove the "index" parameter from the "non-format" intrinsics, but that's a change requiring a flag-day.

This revision is now accepted and ready to land.Mar 17 2018, 5:33 AM
tpr added a comment.Mar 27 2018, 1:04 AM

I don't have any objection to this change going in, but I think we need more:

  1. LLPC generates indexed atomic ops. This change does not fix that. We have a case where LLVM optimizes an index that is constant 0 to idxen=0, causing a breakage. But probably not all atomic ops are intended to be indexed.
  1. This change doesn't do anything to tbuffer ops.
  1. Longer term, surely it makes sense to represent the full semantic capability of the ISA by having non-indexed and indexed (raw and structured) variants of all of the intrinsics. That could be done compatibly after this change by assuming the formatted buffer ops are structured and introducing raw variants, and assuming the other buffer ops are raw and introducing structured variants.
tpr added a comment.Mar 27 2018, 3:05 AM

Also (in terms of extra changes needed in buffer intrinsics over and above this change):

David Stuttard pointed me at
https://reviews.llvm.org/D30687
in which Tom Stellard pointed out that the way the buffer intrinsics conflate the three offsets (inst_offset, sgpr_offset and vgpr_offset) breaks swizzling, and possibly range checking in swizzling too.

So the suggestion is to have new intrinsics with the three offsets in separate args, in the same way as David's tbuffer intrinsic. Then the new intrinsics would also need to have raw and structured variants.

Indexed atomic ops are a good point. radeonsi and radv should be generating those as well for image atomics on texture buffers.

I doubt we'll ever use *format intrinsics with non-strided buffers, so we will be able to leave those as they are with this change, but for the other buffer intrinsics it looks like we indeed have to go to two sets of intrinsics, with and without stride. But that could be done in a separate change.