diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -8712,12 +8712,17 @@
     work-group since they execute on the same CU. The exception is when in
     tgsplit execution mode as wavefronts of the same work-group can be in
     different CUs and so a ``buffer_inv sc0`` is required which will invalidate
-    the L1 cache is in tgsplit mode.
+    the L1 cache.
 
-  * A ``buffer_inv sc1`` is required to invalidate the L1 cache for coherence
+  * A ``buffer_inv sc0`` is required to invalidate the L1 cache for coherence
     between wavefronts executing in different work-groups as they may be
     executing on different CUs.
 
+  * Atomic read-modify-write instructions implicitly bypass the L1 cache.
+    Therefore, they do not use the sc0 bit for coherence and instead use it to
+    indicate if the instruction returns the original value being updated. They
+    do use sc1 to indicate system or agent scope coherence.
+
   * The scalar memory operations access a scalar L1 cache shared by all wavefronts
     on a group of CUs. The scalar and vector L1 caches are not coherent. However,
     scalar operations are used in a restricted way so do not impact the memory
@@ -8891,8 +8896,6 @@
                                               - generic     sc0=1 sc1=1
      store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
                                - wavefront    - generic
-     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
-                               - wavefront    - generic
      store atomic monotonic    - workgroup    - global   1. buffer/global/flat_store
                                               - generic     sc0=1
      store atomic monotonic    - agent        - global   1. buffer/global/flat_store
@@ -9639,7 +9642,7 @@
                                                               store that is being
                                                               released.
 
-                                                        3. buffer/global/flat_store sc1=1
+                                                        3. buffer/global/flat_store sc1=1
      store atomic release      - system       - global   1. buffer_wbl2 sc0=1 sc1=1
                                               - generic
                                                             - Must happen before
@@ -9694,7 +9697,7 @@
                                                               store that is being
                                                               released.
 
-                                                        2. buffer/global/flat_store
+                                                        3. buffer/global/flat_store
                                                             sc0=1 sc1=1
      atomicrmw    release      - singlethread - global   1. buffer/global/flat_atomic
                                - wavefront    - generic
@@ -10878,7 +10881,7 @@
      ------------------------------------------------------------------------------------
      load atomic  seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    load atomic acquire,
-                                             - generic  except must generated
+                                             - generic  except must generate
                                                          all instructions even
                                                          for OpenCL.*
      load atomic  seq_cst      - workgroup    - global   1. s_waitcnt lgkm/vmcnt(0)
@@ -10963,7 +10966,7 @@
                                                             instructions same as
                                                             corresponding load
                                                             atomic acquire,
-                                                           except must generated
+                                                           except must generate
                                                             all instructions even
                                                             for OpenCL.*
      load atomic  seq_cst      - workgroup    - local    *If TgSplit execution mode,
@@ -10972,7 +10975,7 @@
 
                                                          *Same as corresponding
                                                          load atomic acquire,
-                                                        except must generated
+                                                        except must generate
                                                          all instructions even
                                                          for OpenCL.*
 
@@ -11066,22 +11069,22 @@
                                                             instructions same as
                                                             corresponding load
                                                             atomic acquire,
-                                                           except must generated
+                                                           except must generate
                                                             all instructions even
                                                             for OpenCL.*
      store atomic seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    store atomic release,
-                              - workgroup    - generic  except must generated
+                              - workgroup    - generic  except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
                                - wavefront    - local    atomicrmw acq_rel,
-                              - workgroup    - generic  except must generated
+                              - workgroup    - generic  except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      fence        seq_cst      - singlethread *none*     *Same as corresponding
                                - wavefront               fence acq_rel,
-                              - workgroup               except must generated
+                              - workgroup               except must generate
                                - agent                   all instructions even
                                - system                  for OpenCL.*
      ============ ============ ============== ========== ================================
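The new bullet on atomic read-modify-write instructions amounts to a small bit-selection rule: because RMW atomics bypass the L1 cache, sc0 no longer carries coherence and only records whether the pre-operation value is returned, while sc1 carries the scope. The Python sketch below is illustrative only. The `Scope`, `CachePolicy`, `rmw_cache_policy` and `render` names are hypothetical, and the polarity is an assumption not stated in the patch: it takes sc0=1 to mean "returns the original value", sc1=1 to mean system scope, and sc1=0 to mean agent scope, with workgroup and narrower scopes treated like agent scope.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """LLVM memory sync scopes as named in the memory-model tables."""
    SINGLETHREAD = "singlethread"
    WAVEFRONT = "wavefront"
    WORKGROUP = "workgroup"
    AGENT = "agent"
    SYSTEM = "system"


@dataclass(frozen=True)
class CachePolicy:
    """The two cache-policy bits an instruction may set."""
    sc0: bool
    sc1: bool


def rmw_cache_policy(returns_value: bool, scope: Scope) -> CachePolicy:
    """Hypothetical helper: choose sc0/sc1 for a global/generic atomic RMW.

    RMW atomics implicitly bypass the L1 cache, so sc0 is not a coherence
    bit here; it only says whether the original value is returned.
    ASSUMPTION: sc1=1 selects system scope and sc1=0 agent scope; narrower
    scopes are treated like agent scope.
    """
    return CachePolicy(sc0=returns_value, sc1=(scope is Scope.SYSTEM))


def render(mnemonic: str, policy: CachePolicy) -> str:
    """Append the set bits in the tables' spelling, e.g. "sc1=1"."""
    bits = [f"{name}=1" for name, on in (("sc0", policy.sc0), ("sc1", policy.sc1)) if on]
    return " ".join([mnemonic, *bits])


if __name__ == "__main__":
    # Returning, system-scope RMW: both bits set.
    print(render("buffer/global/flat_atomic", rmw_cache_policy(True, Scope.SYSTEM)))
    # Non-returning, agent-scope RMW: no bits set.
    print(render("buffer/global/flat_atomic", rmw_cache_policy(False, Scope.AGENT)))
```

Under these assumptions a returning system-scope RMW renders as `buffer/global/flat_atomic sc0=1 sc1=1`, and a non-returning agent-scope one sets neither bit. Keeping the "returns a value" input separate from the scope input mirrors the point of the new bullet: for RMW atomics the two bits answer unrelated questions.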