Page MenuHomePhabricator

[FPEnv] Intrinsic for setting rounding mode
ClosedPublic

Authored by sepavloff on Feb 17 2020, 10:16 AM.

Details

Summary

To set non-default rounding mode user usually calls function 'fesetround'
from standard C library. This way has some disadvantages.

  • It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instruction and could be made by compiler. Sometimes standard C library even is not available, like in the case of GPU or AI cores that execute small kernels.
  • Compiler could generate more effective code if it know that particular call just sets rounding mode.

This change introduces new IR intrinsic, namely 'llvm.set.rounding', which
sets current rounding mode, similar to 'fesetround'. It however differs
from the latter, because it is a lower level facility:

  • 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases.
  • Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level.

C standard provides a target-independent way to specify rounding mode, it
is used in FLT_ROUNDS, however it does not define standard way to set
rounding mode using this encoding.

This change implements only IR intrinsic. Lowering it to machine code is
target-specific and will be implemented latter. Mapping of 'fesetround'
to 'llvm.set.rounding' is also not implemented here.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sepavloff updated this revision to Diff 260275.Apr 27 2020, 4:22 AM

Add missed change

kpn added a comment.Apr 27 2020, 10:52 AM

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

arsenm added a subscriber: arsenm.Apr 27 2020, 11:19 AM
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

kpn added a comment.Apr 27 2020, 11:23 AM
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Is the fpsetround() function available on AMDGPU? Put another way, is this intrinsic needed on AMDGPU?

In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Originally we had 1 instruction that can change all aspects of the FP environment (which has a heavy runtime cost). The newest subtargets also have 2 additional and faster instructions that can separately set the rounding mode, and denormal mode. Therefore the decision of how to lower this is different per-subtarget, and therefore an abstracted and legalizable intrinsic is useful

In D74729#2005867, @kpn wrote:
In D74729#2005777, @kpn wrote:

Are there GPUs or AI cores _targeted by llvm_ that support changing the rounding mode at run-time?

AMDGPU can

Is the fpsetround() function available on AMDGPU? Put another way, is this intrinsic needed on AMDGPU?

We have no ISA libcalls of any kind, everything is through instructions (with different handling per-subtarget depending on what you're setting)

kpn added a comment.Apr 27 2020, 12:18 PM

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

Add to the IR verifier checks that the input is a constant that matches the documentation. The type and input values all need to be checked.

arsenm added inline comments.Apr 27 2020, 12:37 PM
llvm/docs/LangRef.rst
18397–18401

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode). For AMDGPU we have a number of additional bits in the FP environment. We also have the denormal mode, enabling FP exceptions, and a few more exotic target specific FP mode bits.

RKSimon added inline comments.Apr 28 2020, 5:43 AM
llvm/docs/LangRef.rst
18371

Remove "read or " ?

18397–18401

@arsenm Would these extra bits be exclusive modes or would you need this to support target specific mode combos?

sepavloff marked 2 inline comments as done.Apr 28 2020, 9:49 AM
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Add to the IR verifier checks that the input is a constant that matches the documentation. The type and input values all need to be checked.

The type of input value is already checked by generic code. As for values, there are two notes.

  1. Input values may be a variable, not constant.
  2. A target may support non-standard rounding modes.

However constant values are limited by 3 bits, of which one (Dynamic) cannot be used as argument. So adding check to IR verifier makes sense.

llvm/docs/LangRef.rst
18371

This is a section for group of intrinsics. Now there is only one intrinsic in it, which indeed only writes FP environment. It makes sense to implement intrinsics fo standard C functions, like fegetmode, fetestexcept and others.

Actually the intrinsic flt_rounds may be documented here. I will add documentation for it.

18397–18401

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode).

C library defined function fesetmode, which sets all control modes, not just rounding. It make sense to introduce intrinsic for it, which would serve these purposes.

kpn added a comment.Apr 28 2020, 11:36 AM
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Sure, you know that, and I know that, but someone who hasn't been following along the past couple of years may not know that. A sentence or two tying things together won't hurt. Something along the lines of "Altering the rounding mode requires special care. See 'Floating-Point Environment'.", with a link to that section of the documentation.

sepavloff updated this revision to Diff 261789.May 4 2020, 5:40 AM

Updated patch

  • Added verification code,
  • Added note to documentation.
sepavloff marked an inline comment as done.May 4 2020, 5:43 AM
In D74729#2008437, @kpn wrote:
In D74729#2005958, @kpn wrote:

I think we need to be clear in the documentation that this is _not_ a substitute for the constrained FP intrinsics. Can you please add that to your new documentation? Include a link to the "Floating-Point Environment" section in the language ref.

It cannot be such substitute. IIUC constrained intrinsics were introduced to represent variants of corresponding C functions that operate in non-default FP environment to distinguish them from "ordinary" variants, that are pure functions. Functions like set_rounding always access FP environment, it always have side effect and must be properly ordered.

Sure, you know that, and I know that, but someone who hasn't been following along the past couple of years may not know that. A sentence or two tying things together won't hurt. Something along the lines of "Altering the rounding mode requires special care. See 'Floating-Point Environment'.", with a link to that section of the documentation.

Ah, got it! Added note to the documentation. Thank you!

llvm/docs/LangRef.rst
18371

Added flt_rounds in D79322.

arsenm added inline comments.May 6 2020, 3:29 PM
llvm/docs/LangRef.rst
18397–18401

We have the rounding mode controls as presented here, however they are broken down by FP type. We can separately set the rounding mode for f32 and f64/f16, so there are two different settings. We also have the denormal mode, for inputs and outputs, also broken down by type in the same way. The denormal handling and per-type handling I think deserve consideration here

We have 2 additional target specific FP bits nothing else would need to really think about, but it would be nice if you could set the exact mode you want in a single intrinsic call. I'm less interested in these though

18397–18401

Oh, we also have a bit to turn on/off fp exceptions which is probably generally interesting

arsenm added inline comments.May 6 2020, 3:33 PM
llvm/docs/LangRef.rst
18397–18401

And by a bit, I mean a mask of bits for different FP exception types

Rebased patch

craig.topper added inline comments.May 27 2020, 11:40 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6197

Don't you need to call getRoot not DAG.getRoot()?

llvm/lib/IR/Verifier.cpp
4973

Is the argument intended to always be a constant or we're just verifying it when we can? The latter seems unusual.

sepavloff updated this revision to Diff 266815.May 28 2020, 4:45 AM

Updated patch

  • Use getRoot() instead of DAG.getRoot().
sepavloff marked 2 inline comments as done.May 28 2020, 5:09 AM
sepavloff added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6197

Indeed. Thank you!

llvm/lib/IR/Verifier.cpp
4973

The argument may be a variable. If it is a constant, it must be a valid rounding mode. It is expected to be a value of type RoundingMode. Values from 0 to 4 denote IEEE rounding modes, they may be followed by target-specific rounding modes. The argument value must be less than RoundingMode::Dynamic, which now if 7.

I am hesitating if this code is useful enough, as even for constant argument its validity cannot be verified due to non-IEEE rounding modes. Probably we should remove this check.

sepavloff updated this revision to Diff 269541.Jun 9 2020, 7:45 AM

Rebased patch

I'm wondering if this should be more opaque, and broader for the entire FP environment (not just the rounding mode). For AMDGPU we have a number of additional bits in the FP environment. We also have the denormal mode, enabling FP exceptions, and a few more exotic target specific FP mode bits.

Instrinsics get.fpmode and set.fpmode introduced in D82525 may be used for this case.

sepavloff updated this revision to Diff 276313.Jul 7 2020, 10:56 PM

Rebased patch

Rebased patch

sepavloff updated this revision to Diff 285114.Aug 12 2020, 9:30 AM

Rebased patch

sepavloff updated this revision to Diff 294927.Sep 29 2020, 4:16 AM

Updated patch

sepavloff updated this revision to Diff 295606.Oct 1 2020, 9:52 AM

Removed clang-tidy warning

Any feedback is appreciated.

RKSimon added inline comments.Oct 5 2020, 11:37 AM
llvm/lib/IR/Verifier.cpp
4973

I'm OK with this being dropped - @craig.topper @kpn ?

craig.topper added inline comments.Oct 5 2020, 2:25 PM
llvm/lib/IR/Verifier.cpp
4973

I'm fine removing it

sepavloff updated this revision to Diff 296365.Oct 5 2020, 11:31 PM

Removed check from Verifier

RKSimon added inline comments.Oct 23 2020, 8:52 AM
llvm/include/llvm/CodeGen/ISDOpcodes.h
708

Sorry for the bikeshedding - but if SET_ROUNDING is supposed to match FLT_ROUNDS - shouldn't it have a more similar name?

sepavloff added inline comments.Oct 26 2020, 8:42 AM
llvm/include/llvm/CodeGen/ISDOpcodes.h
708

It is FLT_ROUNDS_ that has "wrong" name. It is named after the macro FLT_ROUNDS, which is defined by C99. To get better names FLT_ROUNDS_ must be renamed not SET_ROUNDING.

ok, I've no more questions @arsenm @kpn?

llvm/include/llvm/CodeGen/ISDOpcodes.h
708

OK - add a TODO comment by FLT_ROUNDS_ then?

kpn added a comment.Oct 26 2020, 11:30 AM

ok, I've no more questions @arsenm @kpn?

Nothing from me.

Added TODO.

sepavloff added inline comments.Oct 26 2020, 11:17 PM
llvm/include/llvm/CodeGen/ISDOpcodes.h
708

Done.

craig.topper added inline comments.Oct 27 2020, 7:45 PM
llvm/include/llvm/IR/IRBuilder.h
897

Do we need this? I don't think we provide IRBuilder for all intrinsics. Just common ones.

Do we need type legalization support for targets like RISCV where i32 isn't a legal type in SelectionDAG?

sepavloff updated this revision to Diff 301210.Oct 28 2020, 3:02 AM

Removed method IRBuilderBase::createSetRounding.

Do we need type legalization support for targets like RISCV where i32 isn't a legal type in SelectionDAG?

Lowering of the intrinsic is anyway custom. Default promotion to i64 is OK.

Do we need type legalization support for targets like RISCV where i32 isn't a legal type in SelectionDAG?

Lowering of the intrinsic is anyway custom. Default promotion to i64 is OK.

There is no default promotion code. Each opcode that needs to be promoted by type legalization must be handled in LegalizeIntegerTypes.cpp

sepavloff updated this revision to Diff 303062.Nov 5 2020, 1:57 AM

Add support in DAGTypeLegalizer::PromoteIntegerOperand

Do we need type legalization support for targets like RISCV where i32 isn't a legal type in SelectionDAG?

Lowering of the intrinsic is anyway custom. Default promotion to i64 is OK.

There is no default promotion code. Each opcode that needs to be promoted by type legalization must be handled in LegalizeIntegerTypes.cpp

Handling of SET_ROUNDING on RISCV is implemented in D91242.

LGTM - any other comments? @craig.topper is the RISCV followup at D91242 OK do you think? It'll probably be what other targets end up using as reference.

This revision was not accepted when it landed; it landed in state Needs Review.Sun, Jan 31, 8:29 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.