Due to interactions between RegAllocFast and expansion of atomicrmw at -O0,
both ARM and AArch64 backends would emit stores between ldrex and strex,
which clears the exclusive access monitor.
atomicrmw instructions are expanded to loops, where the main MachineBasicBlock
includes an ldrex/strex pair. The block then branches conditionally depending on
whether the atomic operation succeeded. Because of this, the register loaded by
ldrex is live-out, and RegAllocFast therefore spills it. The issue is that the
spill is placed between the ldrex and strex, which invalidates the monitor.
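
To make the failure mode concrete, here is a minimal toy model in C++ (not LLVM code). It assumes, as stated above, that any plain store between ldrex and strex clears the monitor; the function name and the use of opcode strings are purely illustrative.

```cpp
#include <string>
#include <vector>

// Toy model of a local exclusive monitor (illustrative only, not LLVM code).
// Assumption: any plain store ("str") clears the monitor, as described above.
// Returns true if the first strex in the sequence would succeed.
bool strexSucceeds(const std::vector<std::string> &ops) {
    bool monitorSet = false;
    for (const std::string &op : ops) {
        if (op == "ldrex")
            monitorSet = true;   // exclusive load arms the monitor
        else if (op == "str")
            monitorSet = false;  // a spill store clears it
        else if (op == "strex")
            return monitorSet;   // exclusive store succeeds only if still armed
    }
    return false;                // no strex in the sequence
}
```

With this model, a sequence such as ldrex, add, strex succeeds, while the same sequence with a spill (str) inserted after the ldrex never does, so the loop generated at -O0 cannot make progress.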
I tried several ways of fixing this, most of which have problems:
- Add a pass after RegAllocFast which moves the str instructions (spills) to after the strex. For more complex sequences, such as those generated for 64-bit atomics with e.g. nand, this becomes difficult to do.
- Add new pseudo instructions for atomicrmw which are expanded after register allocation. This would involve duplicating all of the loop creation code. A similar approach has been used before for cmpxchg: https://reviews.llvm.org/D16239?id=52861
- Stop RegAllocFast from spilling these registers for these instructions. However, other instructions between ldrex and strex can spill, and it is hard to catch them all.
- Move the location of the spill to after the strex. This is the approach taken.
To spill after strex, I have added a new function to TargetRegisterInfo which returns an appropriate
point to spill at for a given instruction. For all backends except ARM/AArch64 this just returns the
next instruction. For ARM/AArch64 it returns the strex.
- A similar approach could have been applied to calcSpillCost instead, to set a high spill cost between the ldrex and strex.
- Pseudo instructions could have been used instead.
It is also possible that the cmpxchg pseudo instructions could be removed, and the same technique used for them.
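
The spill-point hook described above can be sketched as a self-contained toy model. This is not the code in the patch: `Instr`, `ToyRegisterInfo`, `ToyARMRegisterInfo` and `spillInsertionPoint` are hypothetical names, instructions are modelled as opcode strings, and the hook returns the index at which a spill would be inserted (so "after the strex" is the index one past it), which is my reading of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; not LLVM's MachineInstr.
struct Instr {
    std::string opcode;
};

// Hypothetical default hook: spill immediately after the given instruction.
struct ToyRegisterInfo {
    virtual ~ToyRegisterInfo() = default;
    virtual std::size_t spillInsertionPoint(const std::vector<Instr> &block,
                                            std::size_t def) const {
        assert(def < block.size());
        return def + 1;
    }
};

// Hypothetical ARM/AArch64 override: if the instruction lies inside an
// ldrex..strex region, defer the spill until just after the strex.
struct ToyARMRegisterInfo : ToyRegisterInfo {
    std::size_t spillInsertionPoint(const std::vector<Instr> &block,
                                    std::size_t def) const override {
        assert(def < block.size());
        // Determine whether an ldrex is still "open" at position def.
        bool inExclusiveRegion = false;
        for (std::size_t i = 0; i <= def; ++i) {
            if (block[i].opcode == "ldrex")
                inExclusiveRegion = true;
            else if (block[i].opcode == "strex")
                inExclusiveRegion = false;
        }
        if (!inExclusiveRegion)
            return def + 1;
        // Find the closing strex and spill after it.
        for (std::size_t i = def + 1; i < block.size(); ++i)
            if (block[i].opcode == "strex")
                return i + 1;
        return def + 1; // no strex in this block: fall back to the default
    }
};
```

Because the override checks whether the instruction sits anywhere inside the exclusive region, it also covers spills caused by instructions other than the ldrex itself, which was the weakness of special-casing individual instructions.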