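Added the __atomic_fetch_min, __atomic_fetch_max, __atomic_fetch_umin, and __atomic_fetch_umax intrinsics to Clang.
These intrinsics work exactly like the other __atomic_fetch_* intrinsics: they take a memory-ordering argument, which makes it possible to generate an *atomicrmw* instruction with the requested ordering.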
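The similar __sync_fetch_and_min* family of builtins always uses sequentially consistent ordering.
We need the new intrinsics for OpenCL 1.2, whose atomic operations have "relaxed" ordering, so the sequentially consistent __sync builtins are stronger than necessary.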
Thank you for adding this documentation. Please do clarify, though, what the memory-ordering semantics actually are when the atomic object does not need to be updated, and verify that target code generation actually obeys that ordering. For example, if the memory ordering makes this a release operation, __atomic_fetch_min must always store the result back to the atomic object, even if the new value was actually greater than the stored value; I believe that would not be required with a relaxed operation.