[OpenMP] libomp: eliminate pause from atomic CAS loops
For clang this change is NFC cleanup, because clang
never calls atomic functions from runtime library.
Basically, pause is good in spin-loops waiting for something.
Atomic CAS loops do not wait for anything,
each CAS failure means some other thread progressed.
Performance experiments show that the pause only causes unnecessary slowdown
on CPUs with slow pause instruction, no difference on CPUs with fast pause
instruction, removal of the pause gives lesser binary size which is good.
Differential Revision: https://reviews.llvm.org/D97079