Page MenuHomePhabricator

[FPEnv] Intrinsic for setting rounding mode
Needs ReviewPublic

Authored by sepavloff on Feb 17 2020, 10:16 AM.

Details

Summary

To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.

  • It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instruction and could be made by compiler. Sometimes standard C library even is not available, like in the case of GPU or AI cores that execute small kernels.
  • Compiler could generate more effective code if it know that particular call just sets rounding mode.

This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:

  • 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases.
  • Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level.

C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.

This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.

Diff Detail

Event Timeline

sepavloff created this revision.Feb 17 2020, 10:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2020, 10:16 AM
sepavloff updated this revision to Diff 260257.Apr 27 2020, 2:45 AM

Rebased patch

sepavloff updated this revision to Diff 260275.Apr 27 2020, 4:22 AM

Add missed change

kpn added a comment.Apr 27 2020, 10:52 AM

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

arsenm added a subscriber: arsenm.Apr 27 2020, 11:19 AM
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

kpn added a comment.Apr 27 2020, 11:23 AM
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Is the fpsetround() function available on AMDGPU? Put another way, is this intrinsic needed on AMDGPU?

In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Originally we had 1 instruction that can change all aspects of the FP environment (which has a heavy runtime cost). The newest subtargets also have 2 additional and faster instructions that can separately set the rounding mode, and denormal mode. Therefore the decision of how to lower this is different per-subtarget, and therefore an abstracted and legalizable intrinsic is useful

In D74729#2005867, @kpn wrote:
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Is the fpsetround() function available on AMDGPU? Put another way, is this intrinsic needed on AMDGPU?

We have no ISA libcalls of any kind, everything is through instructions (with different handling per-subtarget depending on what you're setting)

kpn added a comment.Apr 27 2020, 12:18 PM

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

Add to the IR verifier checks that the input is a constant that matches the documentation. The type and input values all need to be checked.

arsenm added inline comments.Apr 27 2020, 12:37 PM
llvm/docs/LangRef.rst
19156–19160

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode). For AMDGPU we have a number of additional bits in the FP environment. We also have the denormal mode, enabling FP exceptions, and a few more exotic target specific FP mode bits.

RKSimon added inline comments.Apr 28 2020, 5:43 AM
llvm/docs/LangRef.rst
19130

Remove "read or " ?

19156–19160

@arsenm Would these extra bits be exclusive modes or would you need this to support target specific mode combos?

sepavloff marked 2 inline comments as done.Apr 28 2020, 9:49 AM
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Add to the IR verifier checks that the input is a constant that matches the documentation. The type and input values all need to be checked.

The type of input value is already checked by generic code. As for values, there are two notes.

  1. Input values may be a variable, not constant.
  2. A target may support non-standard rounding modes.

However constant values are limited by 3 bits, of which one (Dynamic) cannot be used as argument. So adding check to IR verifier makes sense.

llvm/docs/LangRef.rst
19130

This is a section for group of intrinsics. Now there is only one intrinsic in it, which indeed only writes FP environment. It makes sense to implement intrinsics fo standard C functions, like fegetmode, fetestexcept and others.

Actually the intrinsic flt_rounds may be documented here. I will add documentation for it.

19156–19160

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode).

C library defined function fesetmode, which sets all control modes, not just rounding. It make sense to introduce intrinsic for it, which would serve these purposes.

kpn added a comment.Apr 28 2020, 11:36 AM
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Sure, you know that, and I know that, but someone who hasn't been following along the past couple of years may not know that. A sentence or two tying things together won't hurt. Something along the lines of "Altering the rounding mode requires special care. See 'Floating-Point Environment'.", with a link to that section of the documentation.

sepavloff updated this revision to Diff 261789.May 4 2020, 5:40 AM

Updated patch

  • Added verification code,
  • Added note to documentation.
sepavloff marked an inline comment as done.May 4 2020, 5:43 AM
In D74729#2008437, @kpn wrote:
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Sure, you know that, and I know that, but someone who hasn't been following along the past couple of years may not know that. A sentence or two tying things together won't hurt. Something along the lines of "Altering the rounding mode requires special care. See 'Floating-Point Environment'.", with a link to that section of the documentation.

Ah, got it! Added note to the documentation. Thank you!

llvm/docs/LangRef.rst
19130

Added flt_rounds in D79322.

arsenm added inline comments.May 6 2020, 3:29 PM
llvm/docs/LangRef.rst
19156–19160

We have the rounding mode controls as presented here, however they are broken down by FP type. We can separately set the rounding mode for f32 and f64/f16, so there are two different settings. We also have the denormal mode, for inputs and outputs, also broken down by type in the same way. The denormal handling and per-type handling I think deserve consideration here

We have 2 additional target specific FP bits nothing else would need to really think about, but it would be nice if you could set the exact mode you want in a single intrinsic call. I'm less interested in these though

19156–19160

Oh, we also have a bit to turn on/off fp exceptions which is probably generally interesting

arsenm added inline comments.May 6 2020, 3:33 PM
llvm/docs/LangRef.rst
19156–19160

And by a bit, I mean a mask of bits for different FP exception types

Rebased patch

craig.topper added inline comments.May 27 2020, 11:40 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6205

Don't you need to call getRoot not DAG.getRoot()?

llvm/lib/IR/Verifier.cpp
5126

Is the argument intended to always be a constant or we're just verifying it when we can? The latter seems unusual.

sepavloff updated this revision to Diff 266815.May 28 2020, 4:45 AM

Updated patch

  • Use getRoot() instead of DAG.getRoot().
sepavloff marked 2 inline comments as done.May 28 2020, 5:09 AM
sepavloff added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6205

Indeed. Thank you!

llvm/lib/IR/Verifier.cpp
5126

The argument may be a variable. If it is a constant, it must be a valid rounding mode. It is expected to be a value of type RoundingMode. Values from 0 to 4 denote IEEE rounding modes, they may be followed by target-specific rounding modes. The argument value must be less than RoundingMode::Dynamic, which now if 7.

I am hesitating if this code is useful enough, as even for constant argument its validity cannot be verified due to non-IEEE rounding modes. Probably we should remove this check.

sepavloff updated this revision to Diff 269541.Jun 9 2020, 7:45 AM

Rebased patch

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode). For AMDGPU we have a number of additional bits in the FP environment. We also have the denormal mode, enabling FP exceptions, and a few more exotic target specific FP mode bits.

Instrinsics get.fpmode and set.fpmode introduced in D82525 may be used for this case.

sepavloff updated this revision to Diff 276313.Jul 7 2020, 10:56 PM

Rebased patch

Rebased patch

sepavloff updated this revision to Diff 285114.Aug 12 2020, 9:30 AM

Rebased patch