This turns an idempotent atomic operation into an atomic load.
Fixes: SWDEV-385135
Differential D144759
[AMDGPU] Implement idempotent atomic lowering
Authored by rampitec on Feb 24 2023, 1:34 PM.
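At the source level, the idea behind the change can be illustrated with C++ `std::atomic`. An RMW whose operand leaves memory unchanged (here `fetch_or(0)`) returns the current value without modifying it, which is the same result an atomic load gives. This is a minimal single-threaded sketch, not AMDGPU code. The function names are hypothetical, and the example uses acquire ordering because the review below restricts the rewrite to RMWs without release semantics.

```cpp
#include <atomic>
#include <cstdint>

// Idempotent RMW: OR-ing with 0 leaves the stored value unchanged,
// but it is still a read-modify-write (it has a store half).
uint32_t idempotent_rmw(std::atomic<uint32_t>& x) {
  return x.fetch_or(0, std::memory_order_acquire);
}

// The replacement: an atomic load with the same (non-release) ordering.
uint32_t as_load(const std::atomic<uint32_t>& x) {
  return x.load(std::memory_order_acquire);
}
```

Both functions observe the same value. What the rewrite drops is the store half of the RMW, which only matters for ordering when that store carries release semantics.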
OK, let's be on the safe side. https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf says that a release fence is needed to preserve load ordering if the RMW is release or stronger. The memory legalizer does not do this by itself, although the only noticeable codegen difference is with seq_cst, and that looks reasonable.

Just discussed it with Tony. This seems somewhat problematic, as it relies on the general lack of other atomic optimizations and on the fact that we cannot really reorder a fence. But we only really need this optimization for relaxed atomics, and we can safely do without a fence for a relaxed or acquire atomic. So let's keep it simple and only do the optimization if the atomicrmw has no release semantics. I will update the patch.

Simplified the patch to skip the optimization for any atomicrmw with release semantics. A monotonic or acquire RMW requires neither a fence nor a cache flush.

In its current form this could probably be a generic optimization. The fence part is questionable, because what is really needed is not a fence but the corresponding cache flush. Also, x86 specifically wants to avoid this for the atomic 'or' operation because it has a better lowering, so making it generic would cause x86 to regress.
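The policy settled on in the comments above can be modeled as two checks: the RMW must be idempotent, and its ordering must not have release semantics. The sketch below is a standalone model, not the patch itself. `RmwOp` and `Ordering` are hypothetical stand-ins for LLVM's `AtomicRMWInst::BinOp` and `AtomicOrdering`. The idempotence rule (add/sub/or/xor with 0, and with -1) roughly mirrors what LLVM's AtomicExpand pass treats as idempotent, and `hasReleaseSemantics` plays the role of LLVM's `isReleaseOrStronger`.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the LLVM IR enums.
enum class RmwOp { Add, Sub, Or, Xor, And, Xchg, Max };
enum class Ordering { Monotonic, Acquire, Release, AcquireRelease, SequentiallyConsistent };

// An RMW is idempotent when it never changes memory:
// x+0, x-0, x|0, x^0 and x&-1 all leave x as it was.
bool isIdempotent(RmwOp op, int64_t operand) {
  switch (op) {
  case RmwOp::Add:
  case RmwOp::Sub:
  case RmwOp::Or:
  case RmwOp::Xor:
    return operand == 0;
  case RmwOp::And:
    return operand == -1;
  default:
    return false;  // Xchg, Max, ... are not treated as idempotent in this model.
  }
}

// Release, acq_rel and seq_cst all carry release semantics.
bool hasReleaseSemantics(Ordering o) {
  return o == Ordering::Release || o == Ordering::AcquireRelease ||
         o == Ordering::SequentiallyConsistent;
}

// Rewrite to a load only for monotonic/acquire RMWs: dropping the store half
// of a release RMW would also drop the cache flush that release requires.
bool canLowerToLoad(RmwOp op, int64_t operand, Ordering o) {
  return isIdempotent(op, operand) && !hasReleaseSemantics(o);
}
```

Keeping the ordering test separate from the idempotence test matches the "keep it simple" outcome of the discussion: no fence or flush is ever inserted, and release-or-stronger RMWs are simply left alone.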
The Order value saved above is not used.
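The inline note asks for the ordering to be read once and the saved value reused. In LLVM the AtomicExpand pass passes idempotent RMWs to the target hook `TargetLowering::lowerIdempotentRMWIntoFencedLoad`. The sketch below does not use LLVM; it models only the shape of that step with hypothetical `AtomicRmw`/`AtomicLoad` structs. It assumes idempotence has already been checked by the caller, and uses the saved `order` both for the release check and for the new load.

```cpp
#include <optional>

// Hypothetical, simplified stand-ins for the IR objects involved.
enum class Ordering { Monotonic, Acquire, Release, AcquireRelease, SequentiallyConsistent };
struct AtomicRmw { Ordering order; unsigned syncScope; unsigned align; };
struct AtomicLoad { Ordering order; unsigned syncScope; unsigned align; };

// Assumes the caller has already established that `rmw` is idempotent.
std::optional<AtomicLoad> toAtomicLoad(const AtomicRmw& rmw) {
  const Ordering order = rmw.order;  // read the ordering once
  if (order == Ordering::Release || order == Ordering::AcquireRelease ||
      order == Ordering::SequentiallyConsistent)
    return std::nullopt;  // keep the RMW: its store half has release semantics
  // Reuse the saved ordering (and the RMW's scope/alignment) for the load.
  return AtomicLoad{order, rmw.syncScope, rmw.align};
}
```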