I think there might be something to optimize in atomic_store.
Currently, if everything goes well (and we have a different new value), we
always iterate 3 times.
For example, with a = 0, oldval = a, newval = 42, we get:
oldval = 0, newval = 42, curval = 0
oldval = 0, newval = 42, curval = 42
oldval = 42, newval = 42, curval = 42
and then it breaks.
Unless I am missing something, I don't see the point of the third iteration.
If the current value is the one we want, we should just break.
This means that 2 iterations (with a different newval) should be sufficient to
achieve what we want.
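
To make the trace reproducible, here is a small standalone model of the loop. It is only a sketch: the function name is made up, and the exit test `cmp == v` is my assumption, inferred from the trace rather than copied from the header. It uses the GCC/Clang `__sync_val_compare_and_swap` builtin and is single-threaded.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical model of the loop under discussion. The exit condition
// (cmp == v) is an assumption that happens to reproduce the trace above.
static int store_with_cmp_eq_v(volatile uint64_t *a, uint64_t v) {
  uint64_t cmp = *a;  // initial guess at the current value ("oldval")
  int iterations = 0;
  for (;;) {
    // Returns the value that was in *a before the CAS attempt ("curval").
    uint64_t cur = __sync_val_compare_and_swap(a, cmp, v);
    ++iterations;
    std::printf("oldval = %llu, newval = %llu, curval = %llu\n",
                (unsigned long long)cmp, (unsigned long long)v,
                (unsigned long long)cur);
    if (cmp == v)  // only true once cmp has caught up with the stored value
      break;
    cmp = cur;
  }
  return iterations;
}
```

Calling it on a variable holding 0 with v = 42 goes through the loop three times, even though the very first CAS already stored 42.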
The current code is not just slow, it looks broken to me. Is it even used anywhere? Maybe we should just drop it. I think we generally require C++11 for building compiler-rt.
The canonical way to do this is to break on cur == cmp, which should accomplish the operation with 1 CAS in the common case. The old check was probably an attempt to optimize for the case when the variable already contains the value we want to store (and we obtained an atomic snapshot of the current value with CAS). So perhaps this could be cur == cmp || cur == v?
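
A rough sketch of what that could look like, not a tested patch. The function name and the iteration counter are illustrative only, and it assumes the GCC/Clang `__sync_val_compare_and_swap` builtin:

```cpp
#include <cstdint>

// Hypothetical sketch of the suggested exit condition.
static int store_with_cur_check(volatile uint64_t *a, uint64_t v) {
  uint64_t cmp = *a;  // initial guess; may be stale or torn
  int iterations = 0;
  for (;;) {
    // cur is an atomic snapshot of *a taken by the CAS itself.
    uint64_t cur = __sync_val_compare_and_swap(a, cmp, v);
    ++iterations;
    // cur == cmp: the CAS succeeded and v is now stored.
    // cur == v:   the variable already held v, so there is nothing to do.
    if (cur == cmp || cur == v)
      break;
    cmp = cur;  // retry with the value we actually observed
  }
  return iterations;
}
```

With no contention this finishes in one CAS, both when the variable holds a different value and when it already holds v. Whether the cur == v shortcut is acceptable depends on whether skipping a store of an identical value is fine for all callers.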