These are essentially add/sub 1 with an upper bound at which the value wraps.
AMDGPU has instructions for these operations, and CUDA/HIP expose them as
atomicInc/atomicDec. Currently we use target intrinsics for them,
but those do not carry the ordering and syncscope. Add the operations to
atomicrmw so we can carry both and benefit from the regular
legalization processes.
Where does old come from? Did you mean *ptr?
I think the meaning of val should also be documented. AFAICT, it's the maximum allowed value in *ptr, and inc/dec are expected to wrap the result so that it stays within the range [0, val].
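To make that reading concrete, here is a minimal C++ sketch of the wrapping behavior, modeled on how the CUDA documentation describes atomicInc/atomicDec. It assumes the proposed atomicrmw operations are meant to match those semantics; the helper names wrapping_inc and wrapping_dec are hypothetical and the compare-exchange loop is only a reference model, not how any back-end would lower the operation.

```cpp
#include <atomic>
#include <cstdint>

// Reference model only. Helper names are hypothetical, not part of the patch.
// Both functions return the value that was in memory before the update.

// Increment that wraps to 0 once the old value has reached (or exceeds) val.
inline std::uint32_t wrapping_inc(
    std::atomic<std::uint32_t>& ptr, std::uint32_t val,
    std::memory_order order = std::memory_order_seq_cst) {
  std::uint32_t old = ptr.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old >= val) ? 0u : old + 1u;
    // On failure, compare_exchange_weak reloads old with the current value.
  } while (!ptr.compare_exchange_weak(old, next, order,
                                      std::memory_order_relaxed));
  return old;
}

// Decrement that wraps to val when the old value is 0 or already above val.
inline std::uint32_t wrapping_dec(
    std::atomic<std::uint32_t>& ptr, std::uint32_t val,
    std::memory_order order = std::memory_order_seq_cst) {
  std::uint32_t old = ptr.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old == 0u || old > val) ? val : old - 1u;
  } while (!ptr.compare_exchange_weak(old, next, order,
                                      std::memory_order_relaxed));
  return old;
}
```

Under this model, repeated increments with val = 2 leave 0, 1, 2, 0, ... in memory, and a value that starts outside [0, val] is pulled back into the range by the first inc or dec.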
My naive expectation was that we'd want to increment/decrement by val, but this inc/dec within [0, val] behavior matches what NVPTX instructions do. I'm not sure if it's an established convention that I've been lucky to miss till now or if we're simply encoding what existing hardware does. If it's the latter, is it universal/useful enough to add to IR?
The answer is probably 'yes', as we already seem to have two back-ends that could benefit, but it would be great to hear from folks familiar with other back-ends.