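This is a retread of the issue we had in the past with cmpxchg: the fast register allocator can insert spills in the middle of an ll/sc loop, and a spill store between the load-linked and the store-conditional can clear the exclusive monitor, so the store-conditional keeps failing and the loop never terminates. To avoid this, don't use the early ll/sc expansion for atomicrmw at -O0.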
I didn't spend the time to implement individual instructions for each atomicrmw operation; instead, atomicrmw is just expanded to a loop around the known-good cmpxchg. This is inefficient, but I'm not sure how much we care: these sequences aren't used with LSE or outlined atomics.
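
For readers unfamiliar with the fallback, here is a rough source-level sketch of what "expand atomicrmw to cmpxchg" means. It is an analogy written with std::atomic, not the code in this patch; the helper name atomic_rmw_via_cmpxchg is made up for illustration, and the memory orderings are an assumption rather than what the backend emits.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not LLVM code): performs an atomic read-modify-write
// by retrying a compare-exchange until no other thread has intervened.
template <typename T, typename Op>
T atomic_rmw_via_cmpxchg(std::atomic<T> &target, Op op) {
  // Initial guess at the current value; a stale guess only costs one retry.
  T expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_strong writes the value it observed back
  // into `expected`, so op(expected) is recomputed on every iteration.
  while (!target.compare_exchange_strong(expected, op(expected),
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
  }
  // Like atomicrmw, return the value that was in memory before the update.
  return expected;
}
```

Any atomicrmw operation (add, and, nand, max, ...) fits this shape by changing only `op`, which is why one known-good cmpxchg is enough; the cost is an extra load and compare per attempt compared with a dedicated ll/sc loop.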