These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for them, and CUDA/HIP expose them as
atomicInc/atomicDec. Currently we use target intrinsics for them,
but the intrinsics do not carry the ordering and syncscope. Add the
operations to atomicrmw so we can carry both and benefit from the
regular legalization processes.
The expressions here look suspicious. I thought these operated as inc/dec followed by a modulo with the value. Does inc(p, 10) really go to zero if p has the initial value 15? If so, clamp might be a better name.
Likewise the == 0 check in dec, though I think the unsigned compare makes that expression work. Is it *ptr = ((*ptr - 1) % val)?
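
For reference, here is a minimal, non-atomic sketch of the stored value as the CUDA documentation describes it for atomicInc and atomicDec. The function names `inc_wrap_model` and `dec_wrap_model` are hypothetical and exist only for this illustration. The sketch assumes the expressions in the patch are meant to match the CUDA definitions, which the patch itself would need to confirm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model (not atomic, not the patch's code).
// CUDA documents atomicInc as storing ((old >= val) ? 0 : (old + 1)).
constexpr std::uint32_t inc_wrap_model(std::uint32_t old, std::uint32_t val) {
    return (old >= val) ? 0u : old + 1u;
}

// CUDA documents atomicDec as storing
// (((old == 0) || (old > val)) ? val : (old - 1)).
constexpr std::uint32_t dec_wrap_model(std::uint32_t old, std::uint32_t val) {
    return (old == 0u || old > val) ? val : old - 1u;
}

void check_examples() {
    // An out-of-range starting value resets to 0 on inc ...
    assert(inc_wrap_model(15u, 10u) == 0u);
    // ... whereas (old + 1) % (val + 1) would give 5.
    assert((15u + 1u) % (10u + 1u) == 5u);

    // In-range values behave like a counter over [0, val].
    assert(inc_wrap_model(9u, 10u) == 10u);
    assert(inc_wrap_model(10u, 10u) == 0u);

    // dec resets to val from 0 or from an out-of-range value.
    assert(dec_wrap_model(0u, 10u) == 10u);
    assert(dec_wrap_model(15u, 10u) == 10u);
    assert(dec_wrap_model(10u, 10u) == 9u);

    // A plain (old - 1) % val differs at old == 0: the subtraction wraps
    // to UINT32_MAX, and UINT32_MAX % 10 is 5, not 10.
    assert(UINT32_MAX % 10u == 5u);
}
```

Under this model, inc(p, 10) with an initial value of 15 does go to zero, and dec is not equivalent to `(*ptr - 1) % val`. The two only look like modular arithmetic while the stored value stays within [0, val].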