This is an archive of the discontinued LLVM Phabricator instance.

[LLVM] Add the support for fmax and fmin in atomicrmw instruction
ClosedPublic

Authored by tianshilei1992 on Jun 3 2022, 6:17 PM.

Details

Summary

This patch adds the support for fmax and fmin operations in atomicrmw
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Jun 3 2022, 6:17 PM
Herald added a project: Restricted Project. · View Herald Transcript
tianshilei1992 requested review of this revision.Jun 3 2022, 6:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 6:17 PM
arsenm added inline comments.Jun 3 2022, 6:32 PM
llvm/docs/LangRef.rst
10336–10337

This is *not* what fmin/fmax do and ignores nan handling, and canonicalization

We plausibly need to have 3 different pairs:

  • The libm behavior for fmin/fmax (which matches the misnamed llvm.minnum/maxnum intrinsics)
  • The IEEE-754 2008 minnum/maxnum behavior which has different signaling nan behavior
  • The IEEE-754 2019 minnum/maxnum behavior which inverts the nan behavior

This absolutely needs to state which one it is if it's just one

lkail added a subscriber: lkail.Jun 3 2022, 9:25 PM
tianshilei1992 added inline comments.Jun 4 2022, 5:12 PM
llvm/docs/LangRef.rst
10336–10337

Thanks for the information. My initial thought is we have a "unified" representation at the IR level, and leave the backend decide what should be the behavior. I'm not sure if one backend is gonna support more than one, but if that's the case, we would definitely need different representation. Based on https://en.cppreference.com/w/c/numeric/math/fmax, it looks like libm's behavior is pretty similar to what we expect to see here. Speaking of that, AMDGPU already has some support for atomic_fmax and atomic_fmin via builtins. Which standard does it follow?

arsenm added inline comments.Jun 4 2022, 6:26 PM
llvm/docs/LangRef.rst
10336–10337

Having backend interpretations of IR semantics is no good. We can have as many operations as needed and targets can request expansion as required.

The AMDGPU answer is complicated. For denormal handling it varies based on subtarget and address space. I’m not sure if they handle signaling nans correctly. For the ALU operations, it depends on the IEEE mode bit which I suspect is ignored by atomics

choose to let FMax and FMin match the behavior of llvm.maxnum.* and llvm.minnum.* respectively

tianshilei1992 marked 2 inline comments as done.

update doc as well

arsenm added inline comments.Jun 6 2022, 2:41 PM
llvm/docs/LangRef.rst
1580–1582

Unrelated whitespace changes

llvm/lib/Target/X86/X86ISelLowering.cpp
44390

Unrelated whitespace change

llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
67–69

This is just wrong.

I think this is questionable for the fadd/fsub case since I guess it depends if the payload bits are preserved?

llvm/lib/Transforms/Utils/LowerAtomic.cpp
78–79

This is also wrong. Needs to insert minnum/maxnum intrinsics

update comments

tianshilei1992 marked 3 inline comments as done.

fix typo

tianshilei1992 marked an inline comment as done.Jun 6 2022, 9:14 PM
tianshilei1992 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
67–69

I fixed the cases for fmax and fmin. That's the only case off the top of my head.

As for fadd/fsub, I think it will be fine because the operation will be changed to Xchg, and the result will be CF, which conforms with the standard that:

An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.

arsenm added inline comments.Jun 7 2022, 6:37 AM
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
67–69

This is missing test coverage

tianshilei1992 marked an inline comment as done.

Add test for InstCombineAtomicRMW

tianshilei1992 marked an inline comment as done.Jun 7 2022, 8:25 AM
arsenm added a comment.Jun 8 2022, 6:40 AM

This is an IR change and should get a mention in the release notes

mention in release note

ormris removed a subscriber: ormris.Jun 8 2022, 3:07 PM

rebase and fix test failure

new week, new ping :-)

arsenm added inline comments.Jun 27 2022, 1:36 PM
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
66–67

This isn't correct for signaling nans

llvm/test/Bitcode/compatibility.ll
858

Since you also touched dxil, is an additional test needed for dxil encoding?

tianshilei1992 added inline comments.Jun 29 2022, 7:47 AM
llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
66–67

Like we mentioned above, fmax and fmin pair is expected to match the behavior of llvm.maxnum.* and llvm.minnum.*. Based on https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic:

If either operand is a NaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN. The returned NaN is always quiet.

So here, no matter what value the pointer operand would be, NaN or not, as long as we have +inf for maxnum, +inf should be returned.
As for the signaling NaNs, the manual says:

Unlike the IEEE-754 2008 behavior, this does not distinguish between signaling and quiet NaN inputs.

Therefore, I don't see how it is wrong. Could you expatiate it?

llvm/test/Bitcode/compatibility.ll
858

Thanks for catching that. Will do.

tianshilei1992 marked an inline comment as done.Jun 29 2022, 7:56 AM
tianshilei1992 added inline comments.
llvm/test/Bitcode/compatibility.ll
858

I took a quick look at tests for DXIL. There is no test for atomicrmw stuff. The test coverage is fairly minimum. I think at this moment, we could skip that.

LGTM besides the test issue

llvm/lib/Transforms/InstCombine/InstCombineAtomicRMW.cpp
66–67

Right, ugh. These really should have been named fmin/fmax

llvm/test/Assembler/atomic.ll
50–55

Missing fmin test

llvm/test/Transforms/InstCombine/atomicrmw.ll
304 ↗(On Diff #436246)

Could use some negative tests

tianshilei1992 marked 4 inline comments as done.

fix comments

arsenm accepted this revision.Jul 6 2022, 6:20 AM

LGTM

This revision is now accepted and ready to land.Jul 6 2022, 6:20 AM
This revision was landed with ongoing or failed builds.Jul 6 2022, 7:58 AM
This revision was automatically updated to reflect the committed changes.

LGTM

Thanks for your review and all helpful comments!

fdeazeve added inline comments.
llvm/docs/GlobalISel/GenericOpcode.rst
733

I believe this patch broke the rendering of the docs.
All the intrinsics must be placed in a single line, otherwise the sequence of ^^^^ below won't work.

https://llvm.org/docs/GlobalISel/GenericOpcode.html#g-atomic-cmpxchg