I think there might be something to optimize in atomic_store.
Currently, if everything goes well (and the new value differs from the current
one), the loop always iterates 3 times.
For example, with a = 0, oldval = a, newval = 42, we get:
    oldval = 0, newval = 42, curval = 0
    oldval = 0, newval = 42, curval = 42
    oldval = 42, newval = 42, curval = 42
and then it breaks.
Unless I am missing something, I don't see the point of the third iteration.
If the current value is the one we want, we should just break.
This means that 2 iterations (with a different newval) should be sufficient to
achieve what we want.
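
To make the suggestion concrete, here is a rough sketch of the loop shape I have in mind. It is not the actual atomic_store code: the function name store_with_early_break, the iteration counter, and the use of C11 <stdatomic.h> are all my own assumptions for illustration, and the real implementation may use different primitives.

```c
#include <stdatomic.h>

/*
 * Hypothetical sketch, not the real atomic_store: store newval with a
 * compare-and-swap loop and return how many times the loop body ran.
 */
static int store_with_early_break(atomic_int *ptr, int newval)
{
    int iterations = 0;

    for (;;) {
        int curval = atomic_load(ptr);

        iterations++;

        /* The current value is already the one we want: stop here. */
        if (curval == newval)
            break;

        /*
         * Try to replace curval with newval. If another thread changed
         * *ptr in the meantime the exchange fails and we simply retry.
         */
        atomic_compare_exchange_strong(ptr, &curval, newval);
    }

    return iterations;
}
```

With a = 0 and newval = 42 and no contention, the first pass performs the exchange and the second pass sees curval == newval and breaks, so the function returns 2.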