The __atomic_whatever functions in compiler-rt were only looking at the size of their argument when deciding whether the implementation should be lock-free. This is incorrect:
- On x86, simple loads and stores are not atomic if the address is misaligned. The functions could use cmpxchg to implement loads and stores instead, but I believe this would be incompatible with GCC's ABI (and it is an ABI choice, because you obviously can't mix the two implementations).
- On other platforms (ARM, that I know of) misaligned atomic accesses will simply fault, in addition to the atomicity problem above.
So this patch falls back to locks for the misaligned case. It also adds the single-byte case to the generic functions: even though Clang won't make calls to them, they should probably work properly if anyone does.
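A minimal sketch of the idea, not the actual compiler-rt code: the lock-free path is taken only when the size is supported and the pointer is naturally aligned, and everything else goes through a lock. The names sketch_lock, sketch_is_lock_free and sketch_atomic_load are hypothetical, the single global spinlock is a simplification, and sizes above 4 bytes are left out to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one global spinlock guarding every locked access. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Lock-free is only safe when the size is supported AND the address is
   naturally aligned for that size. Single bytes are always aligned. */
static bool sketch_is_lock_free(size_t size, const void *ptr) {
  switch (size) {
  case 1:
    return true;
  case 2:
  case 4:
    return ((uintptr_t)ptr % size) == 0;
  default:
    return false; /* larger sizes are omitted from this sketch */
  }
}

/* Generic load: copies size bytes from src to dest. */
static void sketch_atomic_load(size_t size, void *src, void *dest) {
  assert(src != NULL && dest != NULL);
  if (sketch_is_lock_free(size, src)) {
    switch (size) {
    case 1: {
      uint8_t v = __atomic_load_n((uint8_t *)src, __ATOMIC_SEQ_CST);
      memcpy(dest, &v, sizeof v);
      return;
    }
    case 2: {
      uint16_t v = __atomic_load_n((uint16_t *)src, __ATOMIC_SEQ_CST);
      memcpy(dest, &v, sizeof v);
      return;
    }
    case 4: {
      uint32_t v = __atomic_load_n((uint32_t *)src, __ATOMIC_SEQ_CST);
      memcpy(dest, &v, sizeof v);
      return;
    }
    }
  }
  /* Misaligned or unsupported size: fall back to the lock. */
  while (atomic_flag_test_and_set_explicit(&sketch_lock, memory_order_acquire)) {
    /* spin until the lock is released */
  }
  memcpy(dest, src, size);
  atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
}
```

With this shape, a 4-byte load from an address such as buf + 1 takes the locked memcpy path rather than issuing a misaligned atomic instruction, while a 1-byte load is always lock-free.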
We also need to fix this FIXME for correctness: if we have a 16-byte atomic store implementation, we need to use it.