Page MenuHomePhabricator

IR: Add fp operations to atomicrmw
Needs ReviewPublic

Authored by arsenm on Oct 31 2018, 6:36 PM.

Details

Reviewers
jfb
reames
Summary

Add just fadd/fsub for now.

Diff Detail

Event Timeline

arsenm created this revision.Oct 31 2018, 6:36 PM
jfb added a comment.Nov 1 2018, 10:35 AM

Is this to support wg21.link/p0020 ?

Can you do fsub while you're there. It should be exactly the same thing.

AFAIK we don't care about FP flags at all here, correct? Even strictfp is irrelevant.

In D53965#1284143, @jfb wrote:

Is this to support wg21.link/p0020 ?

Yes

Can you do fsub while you're there. It should be exactly the same thing.

AMDGPU has hardware for fadd, fmin and fmax so I was focusing on those first, but it's trivial to add fsub. fmin/fmax are problematic since it requires deciding NaN behaviors (so I fear we'll end up with 2 or 3 versions of each)

AFAIK we don't care about FP flags at all here, correct? Even strictfp is irrelevant.

We can't algebraically combine this, so the flags shouldn't matter.

jfb added a comment.Nov 1 2018, 11:12 AM

AFAIK we don't care about FP flags at all here, correct? Even strictfp is irrelevant.

We can't algebraically combine this, so the flags shouldn't matter.

To clarify my concern: were LLVM to support fenv properly, this instruction can spuriously set FP flags if, say, the fadd executes in a loop and has spurious failure. The final store could hold a result which raised no FP exception, but intermediate results could have done so. But LLVM doesn't support fenv properly yet, so that's not an issue (I think!).

In D53965#1284240, @jfb wrote:

AFAIK we don't care about FP flags at all here, correct? Even strictfp is irrelevant.

We can't algebraically combine this, so the flags shouldn't matter.

To clarify my concern: were LLVM to support fenv properly, this instruction can spuriously set FP flags if, say, the fadd executes in a loop and has spurious failure. The final store could hold a result which raised no FP exception, but intermediate results could have done so. But LLVM doesn't support fenv properly yet, so that's not an issue (I think!).

fenv support requires using the new constrained operation intrinsics, but the support there is incomplete (and still some confusion on how they should interact with the regular IR operations, which is used in the cmpxchg loop expansion)

arsenm updated this revision to Diff 173923.Nov 13 2018, 1:10 PM
arsenm edited the summary of this revision. (Show Details)

Add fsub

arsenm updated this revision to Diff 175927.Thu, Nov 29, 12:10 PM

Rebase langref change

I'd note that this change isn't a necessary prerequisite to implementing the C++ feature -- FP math is already available in C11, and clang already implements it by lowering to cmpxchg. I'd be very skeptical of this addition, if you hadn't mentioned that AMDGPU supported it in hardware. AFAIK, there's no other ISAs that support atomic fadd in hardware. However, that AMDGPU does have hardware support seems like a sufficient reason to add support for it to LLVM.

On LL/SC architectures, while you could theoretically execute a floating-point instruction inside the LL/SC loop, there's often a bunch of complexity regarding the potential for a trap to be raised (e.g. for software FPU emulation, or for edge cases like denormals), which may deterministically cancel the reservation, and create an infinite loop. So I'm pretty wary of any proposal to lower to an LL/SC loop containing FP instructions -- I think even on LL/SC architectures, this should generally be lowered to a cmpxchg loop.

On a general note -- this patch doesn't seem complete. I think this should be a complete implementation of the feature -- specifying an "atomicrmw fadd" should function on all architectures after this, even if not most optimally on AMDGPU (since it's not possible to lower into SelectionDAG until after the next patch).

That'll means pulling in the AtomicExpand changes, and also updating all architectures' shouldExpandAtomicRMWInIR to return CmpXchg for these ops.

docs/LangRef.rst
8443

Lost some words.

include/llvm/IR/Instructions.h
813

Unused?

lib/CodeGen/AtomicExpandPass.cpp
1559–1560

This isn't great. And I see in your next patch you fix it, but splitting that to another patch doesn't make sense.

arsenm marked 2 inline comments as done.Thu, Dec 6, 10:02 AM
arsenm added inline comments.
include/llvm/IR/Instructions.h
813

It's not used here, but included for completeness since all of the other instructions do this (e.g. ICmpInst::getUnsignedPredicate

lib/CodeGen/AtomicExpandPass.cpp
1559–1560

I'd disagree, since usually with IR changes there's a split to add the minimum support to the assembler and bitcode before making them functional as a separate commit. This is just needed to quiet the covered switch warning

arsenm marked an inline comment as done.Thu, Dec 6, 10:14 AM
arsenm added inline comments.
include/llvm/IR/Instructions.h
813

I found a use for it when updating all the targets anyway

arsenm updated this revision to Diff 177098.Thu, Dec 6, 7:28 PM

Move AtomicExpand change, make targets default to cmpxchg