Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4264–4265 | Move to a separate function? | |
4566–4568 | You should not be setting MOVolatile out of nowhere. Adding that defeats what you are trying to accomplish. I also don't think we map volatile directly to GLC; the memory legalizer pass is supposed to set GLC. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | You're right, MOVolatile should be unnecessary even with GLC. I was thinking of GLSL writes to coherent buffer objects, but those still need memoryBarrier()s for guaranteed ordering. So I agree, buffer stores should never be MOVolatile. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | MMOs now have an atomic memory ordering and a memory scope that convey how atomics are required to be coherent. The memory legalizer pass uses this information to set the glc bit and to generate the appropriate waitcnt and cache invalidate instructions. These are separate from the volatile property, which has a different purpose. So if the goal is to request atomic coherence (release/acquire memory model semantics), shouldn't the MMO memory ordering/scope be set correctly? |
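
To make the distinction in the comment above concrete, here is a deliberately simplified toy model in C++. It is not the SIMemoryLegalizer code: `ToyOrdering`, `ToyMemOperand`, and `toyNeedsWaitBefore` are hypothetical names invented for this sketch, and the rule "wait before release-or-stronger operations, and be conservative when no memory operand is attached" is an assumption made for illustration.

```cpp
#include <optional>

// Hypothetical, simplified stand-in for an atomic ordering on a memory operand.
enum class ToyOrdering {
  NotAtomic,
  Monotonic,  // "relaxed" in C++ terms
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Hypothetical stand-in for a machine memory operand. The volatile flag is
// stored separately and is intentionally not consulted for ordering decisions.
struct ToyMemOperand {
  ToyOrdering ordering = ToyOrdering::NotAtomic;
  bool isVolatile = false;
};

// Assumed rule for this sketch: with no memory operand we know nothing and
// must wait; otherwise only release-or-stronger orderings require a wait
// before the operation.
bool toyNeedsWaitBefore(const std::optional<ToyMemOperand> &mmo) {
  if (!mmo)
    return true;
  switch (mmo->ordering) {
  case ToyOrdering::Release:
  case ToyOrdering::AcquireRelease:
  case ToyOrdering::SequentiallyConsistent:
    return true;
  default:
    return false;
  }
}
```

In this model a plain buffer store with an attached operand needs no wait, while the same store with no operand at all falls back to the conservative path, which mirrors the motivation stated in the patch summary.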
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | This is mostly about stores, not atomics. (GLSL) buffer stores don't imply any ordering by themselves. As far as GLSL is concerned, some buffer stores (those to "coherent" buffers) can be combined with memoryBarrier builtins, in which case there are some guarantees about ordering wrt other shader invocations, but the stores themselves provide no such guarantee. Maybe the "coherent" flag can be modeled with those memory scopes - where are they documented? In general, I definitely agree that we should use the MMO machinery correctly :) |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | That said, it of course makes sense to talk about how to set the MMO for buffer_atomic intrinsics as well. "relaxed" (or "monotonic", in LLVM speak) ordering might actually be sufficient for those for GLSL semantics (again, because GLSL kind of wants you to add explicit memoryBarrier() builtin function calls), but I haven't fully thought this through. |
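
As a rough analogy for "relaxed operations plus explicit barriers", the same pattern can be written in standard C++ with `std::atomic_thread_fence`. This is only a sketch: C++ fences are not GLSL `memoryBarrier()`, and the names `payload`, `ready`, `writer`, `reader`, and `runOnce` are hypothetical. The point is that the store and the flag update carry no ordering themselves; the fences supply it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for a buffer location and a "data is ready" flag.
static int payload = 0;
static std::atomic<int> ready{0};

// Writer: plain store, explicit release fence, then a relaxed flag store.
void writer(int value) {
  payload = value;
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(1, std::memory_order_relaxed);
}

// Reader: relaxed polling of the flag, explicit acquire fence, then the load.
int reader() {
  while (ready.load(std::memory_order_relaxed) == 0)
    std::this_thread::yield();
  std::atomic_thread_fence(std::memory_order_acquire);
  return payload;
}

// Runs one writer/reader pair and returns the value the reader observed.
int runOnce(int value) {
  payload = 0;
  ready.store(0, std::memory_order_relaxed);
  int seen = -1;
  std::thread t([&seen] { seen = reader(); });
  writer(value);
  t.join();
  return seen;
}
```

Without the two fences, the relaxed flag accesses alone would not make the `payload` write visible to the reader in any guaranteed way, which is the situation described for stores that are not paired with a barrier.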
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | For the AMDGPU target, the memory model currently implemented is documented at https://llvm.org/docs/AMDGPUUsage.html#memory-model. It does include the LLVM IR fence. Feel free to ping me if you want to discuss what settings you should use to achieve the GLSL memory model semantics, as we worked through the OpenCL/HSA memory model mapping. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4566–4568 | Note that using volatile is not the same thing as using the atomic memory_order. So if intrinsics are relying on volatile to indicate an atomic operation without setting the memory_order correctly, then that sounds like a bug that would be good to fix :-) |