The Expand-Atomic pass emits a CAS loop for FP operations,
which limits the optimizations offered by the atomic optimizer.
Moving the atomic optimizer before expand-atomics allows
better codegen.
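For readers unfamiliar with the term, a CAS loop implements a read-modify-write by retrying a compare-and-swap until no other thread has changed the value in between. The following is a minimal host-side C++ sketch of that idea for a floating-point add. It is an illustration only, not the IR that the pass emits; the names `floatToBits`, `bitsToFloat` and `atomicFAddViaCAS` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a float as its 32-bit pattern and back.
std::uint32_t floatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical sketch: atomically add `value` to the float stored in `cell`
// using only an integer compare-and-swap. Returns the value seen before the add.
float atomicFAddViaCAS(std::atomic<std::uint32_t> &cell, float value) {
  std::uint32_t expected = cell.load(std::memory_order_relaxed);
  for (;;) {
    float oldValue = bitsToFloat(expected);
    std::uint32_t desired = floatToBits(oldValue + value);
    // On failure, compare_exchange_weak stores the current contents of `cell`
    // into `expected`, so the next iteration recomputes from fresh data.
    if (cell.compare_exchange_weak(expected, desired))
      return oldValue;
  }
}
```

Once an FP atomic has been lowered into a loop of this shape, a later pass sees a compare-and-swap inside a loop rather than a single add, which is the limitation the summary describes.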
Differential D157265
[AMDGPU] Reorder atomic optimizer to avoid CAS loop.
Closed, Public. Authored by pravinjagtap on Aug 7 2023, 2:42 AM.
Event Timeline
pravinjagtap added a parent revision: D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Aug 7 2023, 2:43 AM
So the intention is that you still get a CAS loop, but the whole loop is executed by a single lane, instead of switching into and out of single-lane mode each time around the loop? Makes sense to me.
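The behavior described in the comment above can be pictured with a sequential C++ simulation: the per-lane values are first combined, and then one complete CAS loop runs once on behalf of the whole wavefront. This is a sketch under stated assumptions, not the code generated by the AMDGPU backend; the lanes are modelled as a plain `std::vector<float>`, and `toBits`, `fromBits`, `casLoopFAdd`, `perLaneAtomics` and `singleLaneAtomic` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers: float <-> raw 32-bit pattern.
static std::uint32_t toBits(float f) {
  std::uint32_t b;
  std::memcpy(&b, &f, sizeof b);
  return b;
}

static float fromBits(std::uint32_t b) {
  float f;
  std::memcpy(&f, &b, sizeof f);
  return f;
}

// One complete CAS loop that adds `value` to the float stored in `mem`.
static void casLoopFAdd(std::atomic<std::uint32_t> &mem, float value) {
  std::uint32_t expected = mem.load();
  while (!mem.compare_exchange_weak(expected,
                                    toBits(fromBits(expected) + value))) {
    // The failed exchange refreshed `expected`; the condition recomputes
    // the desired value from it and retries.
  }
}

// Baseline: every active lane runs its own CAS loop.
// Returns how many CAS loops were entered.
unsigned perLaneAtomics(std::atomic<std::uint32_t> &mem,
                        const std::vector<float> &lanes) {
  unsigned loops = 0;
  for (float v : lanes) {
    casLoopFAdd(mem, v);
    ++loops;
  }
  return loops;
}

// Sketch of the optimized shape: reduce across lanes first, then a single
// lane executes the whole CAS loop once.
unsigned singleLaneAtomic(std::atomic<std::uint32_t> &mem,
                          const std::vector<float> &lanes) {
  if (lanes.empty())
    return 0;
  float sum = 0.0f;
  for (float v : lanes)
    sum += v; // stands in for the cross-lane reduction
  casLoopFAdd(mem, sum);
  return 1;
}
```

Because floating-point addition is not associative, reducing first can change the rounding of the final result compared with adding lane by lane; the simulation ignores that and uses values that sum exactly.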
At the moment, yes, that is the current behavior. As an extension, I will create a new patch in which atomic-expand calls simplifyCFG on the relevant blocks if it emits a CAS loop (suggested by @arsenm).
AArch64 deals with this by inserting an extra simplifyCFG pass run, which seems excessive given we're only making local changes
Maybe we should wait until the simplifyCFG patch is ready to go, to avoid regressions.
This revision is now accepted and ready to land. Aug 7 2023, 10:12 AM
pravinjagtap added a child revision: D157388: [AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer. Aug 8 2023, 5:43 AM
pravinjagtap removed a parent revision: D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Aug 17 2023, 5:24 AM
pravinjagtap removed a child revision: D157388: [AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer.
pravinjagtap added a parent revision: D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Aug 17 2023, 8:08 AM
pravinjagtap added a child revision: D157388: [AMDGPU] Support FMin/FMax in AMDGPUAtomicOptimizer. Aug 17 2023, 8:11 AM
pravinjagtap removed a child revision: D157495: [Atomic-Expand] Run SimplifyCFG from Atomic-Expand on CAS loop blocks. Aug 18 2023, 3:38 AM
This revision was landed with ongoing or failed builds. Aug 30 2023, 9:06 AM
Closed by commit rG6ef6c954c6dc: [AMDGPU] Reorder atomic optimizer to avoid CAS loop. (authored by pravinjagtap).
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 551395
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/global-atomic-fadd.f32-no-rtn.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/global-atomic-fadd.f32-rtn.ll
llvm/test/CodeGen/AMDGPU/atomic-optimizer-strict-wqm.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll
llvm/test/CodeGen/AMDGPU/global-atomic-fadd.f32-no-rtn.ll
llvm/test/CodeGen/AMDGPU/global-atomic-fadd.f32-rtn.ll
llvm/test/CodeGen/AMDGPU/global-atomics-fp.ll
llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll
llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
llvm/test/CodeGen/AMDGPU/local-atomics-fp.ll
Don't need all the parentheses.