As a result, atomic operations wider than the target can handle
lock-free are expanded to __atomic_* libcalls by AtomicExpandPass. This
is part of a broader change to clean up atomics handling in the same way
on all targets.
When ldrex is available, we can support lock-free atomics up to 32
bits. When ldrexd is available, we can support them up to 64 bits.
When neither is available, we can still support up to 32-bit
lock-free atomics on certain platforms, if they provide
kernel-assisted cmpxchg. In that case, we can emit native 32-bit loads
and stores, and emit rmw/cmpxchg via __sync_* libcalls. 64-bit atomics
cannot be supported the same way, because native 64-bit loads and
stores aren't atomic on those targets.
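
The policy above can be sketched roughly as follows. This is an
illustrative model only, not the code in the patch: TargetFeatures,
maxLockFreeBits and lowerAtomic are hypothetical names, and the final
"no support at all means width 0" case is an assumption inferred from
the description rather than stated in it.

```cpp
// Hypothetical feature flags; the names are illustrative, not LLVM's.
struct TargetFeatures {
  bool hasLdrex;         // ldrex/strex available
  bool hasLdrexd;        // ldrexd/strexd available
  bool hasKernelCmpxchg; // platform provides kernel-assisted cmpxchg
};

enum class AtomicOp { Load, Store, RMW, CmpXchg };
enum class Lowering { Inline, SyncLibcall, AtomicLibcall };

// Widest atomic (in bits) that can be implemented lock-free.
constexpr unsigned maxLockFreeBits(const TargetFeatures &F) {
  if (F.hasLdrexd)
    return 64;
  if (F.hasLdrex)
    return 32;
  if (F.hasKernelCmpxchg)
    return 32; // 64-bit loads/stores are not atomic here
  return 0;    // assumption: nothing can be lock-free
}

// How a single atomic operation of the given width would be lowered.
constexpr Lowering lowerAtomic(AtomicOp Op, unsigned Bits,
                               const TargetFeatures &F) {
  // Oversized operations go to __atomic_* libcalls.
  if (Bits > maxLockFreeBits(F))
    return Lowering::AtomicLibcall;
  // With exclusive load/store instructions everything is inline.
  if (F.hasLdrex || F.hasLdrexd)
    return Lowering::Inline;
  // Kernel-assisted case: native loads/stores, __sync_* for the rest.
  if (Op == AtomicOp::Load || Op == AtomicOp::Store)
    return Lowering::Inline;
  return Lowering::SyncLibcall;
}
```

For example, under this model a 64-bit cmpxchg on a target with only
ldrex becomes an __atomic_* libcall, while a 32-bit rmw on a target
with only kernel-assisted cmpxchg becomes a __sync_* libcall.
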
Finally, refactor and simplify the remainder of the code, which no
longer needs to deal with oversized atomics.