Pause is good in spin-loops waiting for something.
Atomic CAS loops do not wait for anything,
each CAS failure means some other thread progressed.
So pause only causes unnecessary slowdown.
Paths
| Differential D97079
[OpenMP] libomp: eliminate pause from atomic CAS loops ClosedPublic Authored by AndreyChurbanov on Feb 19 2021, 12:21 PM.
Details Summary Pause is good in spin-loops waiting for something. So pause only causes unnecessary slowdown.
Diff Detail
Event TimelineComment Actions I think this is architecture specific. If a CAS failed spuriously, then immediately retry is good. If it failed because another core wrote to the cache line, then we have established that said cache line is somewhat contended, in which case pause() may let the other threads progress faster. This might be difficult to microbenchmark. Comment Actions
If one thread succeeded then ALL other threads competing on the cache line would fail. So they will not progress faster executing pause instruction. Only gain can be for succeeding thread, that means it executes nothing but atomic in a loop, and that is not a real code example... Actually the atomic functions are never called by clang, so it is kind of cleanup change in any case. Experiments calling an internal atomic function directly in a loop show either no difference on some processors (where pause instruction is fast enough), or significant speedup (where pause instruction is slow). One clear gain of this patch is reduced size of the runtime - I see 4096 bytes smaller size of the binary if the patch applied. If you still think the patch is not good, I am fine to abandon it. But to me it is pretty harmless, and as I mentioned, it reduces the size of the runtime. Comment Actions
I think we should keep this patch. These atomic functions are just trying to mimic what a hardware instruction would accomplish. In line with Andrey's reasoning, if a CAS fails here then there is nothing to wait for since the other thread that caused the fail is done with the atomic. i.e., the critical section is the atomic itself. Hence, the "spin wait" is already over and another retry should take place immediately. Comment Actions Someone measuring the performance on a few architectures soundly beats my speculation. No objection here, thanks! This revision is now accepted and ready to land.Mar 8 2021, 11:54 AM Closed by commit rGaaf16b80dd4c: [OpenMP] libomp: eliminate pause from atomic CAS loops (authored by AndreyChurbanov). · Explain WhyMar 9 2021, 7:30 AM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 329344 openmp/runtime/src/kmp_atomic.cpp
|