There are potential benefits in simplifying
the CFG right after the atomic-expand pass,
since that pass changes control flow.
On AMDGPU targets, the atomic-expand pass
lowers global FP atomic operations to a
CAS loop, which is inefficient.
To optimize atomics, the AMDGPU target runs
AMDGPUAtomicOptimizer just before the
atomic-expand pass.
Running AMDGPUAtomicOptimizer and
atomic-expand introduces new control flow,
so running CFG simplification afterwards
allows better codegen.
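
For illustration only, the shape of the CAS loop mentioned above can be
sketched in plain C++. This is a source-level analogy, not the IR that
atomic-expand actually emits; the helper name atomic_fadd_cas is
hypothetical and the memory orderings are assumptions. It shows the
extra loop header and back edge that replace what was a single atomic
operation.

```cpp
#include <atomic>

// Hypothetical helper (not from LLVM): emulates an atomic FP add with a
// compare-and-swap loop and returns the value seen before the add.
float atomic_fadd_cas(std::atomic<float> &target, float value) {
  // Initial load feeds the first CAS attempt.
  float expected = target.load(std::memory_order_relaxed);
  // Loop header plus back edge: retry until the CAS succeeds, i.e. until
  // no other thread modified `target` between the read and the swap.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
    // On failure, compare_exchange_weak refreshes `expected` with the
    // current value, so the next iteration recomputes the sum.
  }
  return expected;
}
```

A single straight-line operation becomes a loop with its own blocks and
branches, which is the kind of new control flow a later CFG
simplification run can tidy up.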
AArch64 deals with this by inserting an extra
simplifyCFG pass run, which seems excessive.
This did not account for requiring and
preserving the dominator tree.