This patch adds the support for fmax and fmin operations in atomicrmw
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/docs/LangRef.rst | ||
---|---|---|
10377–10378 | This is *not* what fmin/fmax do and ignores nan handling, and canonicalization We plausibly need to have 3 different pairs:
This absolutely needs to state which one it is if it's just one |
llvm/docs/LangRef.rst | ||
---|---|---|
10377–10378 | Thanks for the information. My initial thought is we have a "unified" representation at the IR level, and leave the backend decide what should be the behavior. I'm not sure if one backend is gonna support more than one, but if that's the case, we would definitely need different representation. Based on https://en.cppreference.com/w/c/numeric/math/fmax, it looks like libm's behavior is pretty similar to what we expect to see here. Speaking of that, AMDGPU already has some support for atomic_fmax and atomic_fmin via builtins. Which standard does it follow? |
llvm/docs/LangRef.rst | ||
---|---|---|
10377–10378 | Having backend interpretations of IR semantics is no good. We can have as many operations as needed and targets can request expansion as required. The AMDGPU answer is complicated. For denormal handling it varies based on subtarget and address space. I’m not sure if they handle signaling nans correctly. For the ALU operations, it depends on the IEEE mode bit which I suspect is ignored by atomics |
choose to let FMax and FMin match the behavior of llvm.maxnum.* and llvm.minnum.* respectively
llvm/docs/LangRef.rst | ||
---|---|---|
1582–1585 | Unrelated whitespace changes | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
44637 | Unrelated whitespace change | |
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp | ||
73–75 | This is just wrong. I think this is questionable for the fadd/fsub case since I guess it depends if the payload bits are preserved? | |
llvm/lib/Transforms/Utils/LowerAtomic.cpp | ||
78–79 | This is also wrong. Needs to insert minnum/maxnum intrinsics |
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp | ||
---|---|---|
73–75 | I fixed the cases for fmax and fmin. That's the only case off the top of my head. As for fadd/fsub, I think it will be fine because the operation will be changed to Xchg, and the result will be CF, which conforms with the standard that:
|
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp | ||
---|---|---|
73–75 | This is missing test coverage |
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp | ||
---|---|---|
66–67 | Like we mentioned above, fmax and fmin pair is expected to match the behavior of llvm.maxnum.* and llvm.minnum.*. Based on https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic:
So here, no matter what value the pointer operand would be, NaN or not, as long as we have +inf for maxnum, +inf should be returned.
Therefore, I don't see how it is wrong. Could you expatiate it? | |
llvm/test/Bitcode/compatibility.ll | ||
870 | Thanks for catching that. Will do. |
llvm/test/Bitcode/compatibility.ll | ||
---|---|---|
870 | I took a quick look at tests for DXIL. There is no test for atomicrmw stuff. The test coverage is fairly minimum. I think at this moment, we could skip that. |
llvm/docs/GlobalISel/GenericOpcode.rst | ||
---|---|---|
733 | I believe this patch broke the rendering of the docs. https://llvm.org/docs/GlobalISel/GenericOpcode.html#g-atomic-cmpxchg |
I believe this patch broke the rendering of the docs.
All the intrinsics must be placed in a single line, otherwise the sequence of ^^^^ below won't work.
https://llvm.org/docs/GlobalISel/GenericOpcode.html#g-atomic-cmpxchg