Add an llvm.cmpxchg op as a counterpart to LLVM IR's cmpxchg instruction.
Note that the weak, volatile, and syncscope attributes are not yet supported.
This will be useful for upcoming parallel versions of affine.for and generally
for reduction-like semantics (especially for reductions that can't make use
of atomicrmw, e.g. fmax).
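
For illustration, the kind of reduction meant here can be sketched as a compare-and-swap loop. The sketch below is a C++ analogy using std::atomic, not the syntax of the new op; the function name atomic_fmax is hypothetical, and the memory orderings are just one reasonable choice. It uses the strong form of compare-exchange, since the weak attribute is not yet supported by the op.

```cpp
#include <atomic>
#include <cmath>

// Hypothetical helper: atomically replaces `target` with
// fmax(target, value) and returns the value observed before the update.
float atomic_fmax(std::atomic<float>& target, float value) {
  float observed = target.load(std::memory_order_relaxed);
  // compare_exchange_strong stores the new maximum only if `target` still
  // holds `observed`. On failure it writes the current contents of `target`
  // into `observed`, so the next iteration recomputes the maximum.
  while (!target.compare_exchange_strong(observed,
                                         std::fmax(observed, value),
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
  }
  return observed;
}
```

Each parallel iteration would call such a helper on a shared accumulator; the retry loop is what stands in for an atomicrmw variant that does not exist for the operation in question.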
Please add some documentation to this function.