This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Lower buffer store and atomic intrinsics manually
ClosedPublic

Authored by mareko on Oct 18 2017, 9:49 AM.

Details

Summary

Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Event Timeline

mareko created this revision.Oct 18 2017, 9:49 AM
nhaehnle added inline comments.Oct 23 2017, 8:58 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

Unfortunately, there's not a lot of documentation on MOVolatile, but I suspect this should not be set at least when GLC == SLC == 0. And I image that that would fix the issue with D39012 as well... (which means the order of patches should be reversed).

arsenm added inline comments.Oct 23 2017, 9:40 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4264–4265

Move to a separate function?

4566–4568

You should not be setting MOVolatile out of nowhere. Adding that defeats what you are trying to accomplish. I also think we aren't setting volatile directly to GLC and the memory legalizer pass is supposed to set GLC.

nhaehnle edited edge metadata.Oct 23 2017, 10:14 AM

Can the same be achieved by implementing getTgtMemIntrinsic?

nhaehnle added inline comments.Oct 23 2017, 10:18 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

You're right, MOVolatile should be unnecessary even with GLC. I was thinking of GLSL writes to coherent buffer objects, but those still need memoryBarrier()s for guaranteed ordering. So I agree, buffer stores should never be MOVolatile.

t-tye added inline comments.Oct 23 2017, 8:44 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

MMO now have atomic memory ordering and memory scope that convey how atomics are required to be coherent. The memory legalizer pass uses this information to set glc bit, generate appropriate watcnt, and cache invalidate instructions.

These are separate from the volatile property which has a different purpose.

So if the goal is to request atomic coherence (release/acquire memory model semantics) shouldn't the MMO memory ordering/scope be set correctly?

nhaehnle added inline comments.Oct 24 2017, 5:37 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

This is mostly about stores, not atomics. (GLSL) buffer stores don't imply any ordering by themselves. As far as GLSL is concerned, some buffer stores (those to "coherent" buffers) can be combined with memoryBarrier builtins, in which case there are some guarantees about ordering wrt other shader invocations, but the stores themselves provide no such guarantee.

Maybe the "coherent" flag can be modeled with with those memory scopes - where are they documented?

In general, I definitely agree that we should use the MMO machinery correctly :)

nhaehnle added inline comments.Oct 24 2017, 5:40 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

That said, it of course makes sense to talk about how to set the MMO for buffer_atomic intrinsics as well. "relaxed" (or "monotonic", in LLVM speak) ordering might actually be sufficient for those for GLSL semantics (again, because GLSL kind of wants you to add explicit memoryBarrier() builtin function calls), but I haven't fully thought this through.

t-tye added inline comments.Oct 24 2017, 6:20 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

For the AMDGPU target the memory model currently implemented is documented in https://llvm.org/docs/AMDGPUUsage.html#memory-model . It does include the LLVM IR fence. Feel free to ping me if you want to discuss what settings you should use to achieve the GLSL memory model semantics as we worked through doing the OpenCL/HSA memory model mapping.

mareko added inline comments.Oct 31 2017, 1:49 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4264–4265

I would if it had more uses.

4566–4568

We could do what amdgcn_atomic_inc/dec intrinsics do: have "i1 volatile" as an intrinsic parameter. In the meantime, I'll just remove MOVolatile.

mareko updated this revision to Diff 121056.Oct 31 2017, 2:04 PM
  • don't set MOVolatile
  • this precedes the buffer store merging patch
t-tye added inline comments.Oct 31 2017, 3:49 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568

Note that using volatile is not the same think as using the atomic memory_order. So if intrinsics are relying on volatile to indicate an atomic operation without setting the memory_order correctly, then that sounds like a bug that would be good to fix:-)

This revision was automatically updated to reflect the committed changes.