Memory models for gfx90a and gfx940 do not require buffer_wbl2
before the fence for acquire ordering, but we do insert the full
release.
Fixes: SWDEV-386785
Differential D145524
[AMDGPU] Skip buffer_wbl2 before atomic fence acquire
Closed, Public. Authored by rampitec on Mar 7 2023, 12:43 PM.
Event Timeline
t-tye added inline comments.
This revision now requires changes to proceed. Mar 7 2023, 3:23 PM
After offline discussion, the extra waitcnt is needed because this is a fence: an acquire fence needs a waitcnt to ensure that a preceding load atomic that pairs with the fence has completed before the cache is invalidated. The memory model in AMDGPUUsage does show the extra waitcnt for the fence. Previously, the waitcnt was generated as part of the release, which is not required if the fence is just an acquire.
This revision is now accepted and ready to land. Mar 7 2023, 4:23 PM
This revision was landed with ongoing or failed builds. Mar 8 2023, 1:24 AM
Closed by commit rG59162e38590f: [AMDGPU] Skip buffer_wbl2 before atomic fence acquire (authored by rampitec).
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 503134
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll
Why is this waitcnt being added? I do not believe the memory model requires this. An acquire does not require previous memory operations to be complete. It only requires that the load of the location being acquired has completed, followed by invalidating the caches consistent with the scope.
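For readers less familiar with the pattern under discussion, the following is a minimal source-level sketch of a relaxed atomic load paired with an acquire fence, written in portable C++ rather than AMDGPU code. The names `flag`, `payload`, `producer`, `consumer`, and `run_once` are hypothetical and do not come from the patch. The sketch only illustrates the ordering the fence must provide; it says nothing about the exact instructions SIMemoryLegalizer emits.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only; not taken from the patch.
std::atomic<int> flag{0};
int payload = 0;

void producer() {
    payload = 42;                                         // plain store
    std::atomic_thread_fence(std::memory_order_release);  // release fence
    flag.store(1, std::memory_order_relaxed);             // publish
}

int consumer() {
    // Relaxed atomic load that pairs with the acquire fence below.
    while (flag.load(std::memory_order_relaxed) == 0) {
    }
    // Acquire fence: the load above must have completed before later
    // accesses are allowed to observe memory. Per the review discussion,
    // this is why the fence needs a wait before the cache invalidate on
    // the targets in question.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // guaranteed to observe 42
}

// Runs one producer/consumer round and returns what the consumer saw.
int run_once() {
    flag.store(0, std::memory_order_relaxed);
    payload = 0;
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Note that the acquire side never writes anything back. That matches the patch's premise that an acquire-only fence does not need the release-side work (such as buffer_wbl2), only completion of the paired load followed by invalidation.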