This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
ClosedPublic

Authored by nhaehnle on Feb 15 2016, 2:05 PM.

Details

Summary

They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

The situation with the intrinsic attributes for buffer.store.format is
unfortunate. Specifying only IntrReadWriteArgMem will eventually be correct,
but right now LLVM doesn't understand buffer resources as pointers, so might
possibly miscompile something in an optimization pass. On the other hand,
not specifying any attributes at all (which is conservative and correct)
clashes in TableGen pattern generation with the fact that BUFFER_STORE_*
instruction have hasSideEffects = 0 (quite reasonably), and I'm hesitant to
change that since those instructions are already used by compute.

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 48013.Feb 15 2016, 2:05 PM
nhaehnle retitled this revision from to AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
tstellarAMD accepted this revision.Feb 16 2016, 3:57 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Feb 16 2016, 3:57 PM
arsenm added inline comments.Feb 16 2016, 4:34 PM
include/llvm/IR/IntrinsicsAMDGPU.td
183–190 ↗(On Diff #48013)

I don't think it makes sense to directly expose all fields. TFE for example changes the register class of the result. The address fields also seem like they shouldn't be directly exposed and should use normal looking addressing mode matching.

204 ↗(On Diff #48013)

I think this will break because the intrinsic doesn't have any pointer arguments

nhaehnle updated this revision to Diff 48371.Feb 18 2016, 11:11 AM
nhaehnle edited edge metadata.

Removed the tfe argument and use more conservative memory access attributes
by overriding mayLoad and hasSideEffects only for buffer_store_format_*.

About the index and offset arguments: index can't be merged since it uses
the stride in the buffer resource. The various offsets could be merged, but
it seems a lot of fighting with TableGen for little benefit.

If anything, I would try to unify the patterns for load and store before that,
but I couldn't figure out how to do that. The natural idea is to have a
multiclass BufferLoadStorePatterns and use !cast<MUBUF>(name#_OFFSET) etc.
to get the instructions. The problem is that load and store have different
numbers of arguments, and I couldn't figure out how to concat the operand
lists of dag nodes.

Unless there are suggestions for this or other strong objections, I'd rather
commit this as is.

This revision was automatically updated to reflect the committed changes.