This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv] Intrinsics for access to FP control modes
ClosedPublic

Authored by sepavloff on Jun 25 2020, 1:30 AM.

Details

Summary

The change introduces intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all target dynamic floating-point control
modes, which include, for instance, rounding direction, precision,
treatment of denormals and so on. The intrinsics do the same
operations as the C library functions 'fegetmode' and 'fesetmode'. By
default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

  1. Local modification of the control modes. In this case the code

usually has a pattern (in pseudocode):

saved_modes = get_fpmode()
set_fpmode(<new_modes>)
...
<do operations with new modes>
...
set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default,
the code may be shorter:

set_fpmode(<new_modes>)
...
<do operations with new modes>
...
reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP controlling pragmas. In particular, the implementation of
#pragma STDC FENV_ROUND requires similar code if the target does not
support static rounding mode.

  1. Portable control of FP modes. Usually FP control modes are set by

write to some control register. Different targets have different
layout of this register, the way the register is accessed also may be
different. Using set of target-specific definitions for the control
register bits together with these intrinsic functions provides enough
portable way to handle control modes across wide range of hardware.

This change defines only llvm intrinsic function, which implement the
access required for the aforementioned use cases.

Diff Detail

Event Timeline

sepavloff created this revision.Jun 25 2020, 1:30 AM
Herald added a project: Restricted Project. · View Herald Transcript
sepavloff updated this revision to Diff 275055.Jul 2 2020, 4:06 AM

Fixed legalization of SET_FPMODE

sepavloff updated this revision to Diff 275373.Jul 3 2020, 5:25 AM

Missed change

sepavloff updated this revision to Diff 281827.Jul 30 2020, 1:37 AM

Rebased patch

sepavloff updated this revision to Diff 285081.Aug 12 2020, 7:30 AM

Rebased patch

sepavloff updated this revision to Diff 289188.Sep 1 2020, 8:49 AM

Rebased patch

sepavloff updated this revision to Diff 289360.Sep 2 2020, 12:15 AM

Get rid of clang-tidy warnings

sepavloff updated this revision to Diff 327751.Mar 3 2021, 5:04 AM

Rebased and simplified a bit.

sepavloff edited the summary of this revision. (Show Details)Mar 3 2021, 5:05 AM
sepavloff edited the summary of this revision. (Show Details)

From langref it isn't obvious if the following transform is valid or not

%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
  =>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y
craig.topper added inline comments.
llvm/test/CodeGen/Generic/fpenv.ll
36

Is this missing the instructions that copy %fpenv into the stack temporary?

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

sepavloff updated this revision to Diff 328037.Mar 3 2021, 11:28 PM
sepavloff edited the summary of this revision. (Show Details)

Extended documentation, fixed chain treatment.

From langref it isn't obvious if the following transform is valid or not

%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
  =>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y

Short mention about function ordering is added to the paragraph "Floating Point Environment Manipulation intrinsics".

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

Strictly speaking there is no such guarantee. However the obvious implementation of femode_t is the type used to store content of FP control register. Most of 16 targets supported by glibc use unsigned int as femode_t. Exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.

llvm/test/CodeGen/Generic/fpenv.ll
36

Indeed, due to incorrect chain argument supplied to the library function call, the store to stack disappeared.

Thank you for the catch!

From langref it isn't obvious if the following transform is valid or not

%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
  =>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y

Short mention about function ordering is added to the paragraph "Floating Point Environment Manipulation intrinsics".

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

Strictly speaking there is no such guarantee. However the obvious implementation of femode_t is the type used to store content of FP control register. Most of 16 targets supported by glibc use unsigned int as femode_t. Exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.

Isn't X86 using this struct which is 8 bytes?

typedef struct
  {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved;
    unsigned int __mxcsr;
  }
femode_t;

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

Strictly speaking there is no such guarantee. However the obvious implementation of femode_t is the type used to store content of FP control register. Most of 16 targets supported by glibc use unsigned int as femode_t. Exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.

Isn't X86 using this struct which is 8 bytes?

typedef struct
  {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved;
    unsigned int __mxcsr;
  }
femode_t;

Sure. I forget to mention x86.

There are cases when size of fenv_t differs in different libraries. ARM uses unsigned int in glibc but unsigned long in musl.

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

Strictly speaking there is no such guarantee. However the obvious implementation of femode_t is the type used to store content of FP control register. Most of 16 targets supported by glibc use unsigned int as femode_t. Exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.

Isn't X86 using this struct which is 8 bytes?

typedef struct
  {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved;
    unsigned int __mxcsr;
  }
femode_t;

Sure. I forget to mention x86.

There are cases when size of fenv_t differs in different libraries. ARM uses unsigned int in glibc but unsigned long in musl.

Is unsigned long 32-bits in this case?

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

There are cases when size of fenv_t differs in different libraries. ARM uses unsigned int in glibc but unsigned long in musl.

Is unsigned long 32-bits in this case?

Yes, ARM gcc 10.2(linux) generates 4 for sizeof(unsigned long).

qiucf added a subscriber: qiucf.Mar 15 2021, 1:27 AM
sepavloff updated this revision to Diff 331293.Mar 17 2021, 9:38 AM

Added helper functions

These are methods of IRBuilder: createGetFPMode, which get size of fp modes from
DataLayout, createSetFPMode and createResetFPMode.

sepavloff updated this revision to Diff 334095.Mar 30 2021, 3:15 AM

Rebased patch

Any feedback?

sepavloff edited the summary of this revision. (Show Details)Nov 25 2021, 8:50 AM
sepavloff updated this revision to Diff 389817.Nov 25 2021, 8:53 AM
sepavloff edited the summary of this revision. (Show Details)

Updated patch

  • Rebased.
  • Get rid of using DataLayout to determine the size of control modes. It limits the usage of the intrinsics to some extent, because an IR transformation that would create a call to llvm.get.fpmode or llvm.set.fpmode must somehow know the size for current target. But for the main use cases it should be enough, only TargetInfo needs to be extended so that clang could know the size.
  • The test that checks default lowering was rewritten using soft-float option.
arsenm requested changes to this revision.Jun 15 2023, 9:36 AM

Can this be abandoned with get_fpenv / set_fpenv already pushed?

This revision now requires changes to proceed.Jun 15 2023, 9:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:36 AM
sepavloff updated this revision to Diff 538506.Jul 9 2023, 10:20 PM

Rebased patch

Can this be abandoned with get_fpenv / set_fpenv already pushed?

These are different intrinsics, and they correspond to different C functions. Control modes are a part of FP environment, but the latter contains also status bits. If a target has single FP status and control register, both fegetenv and fegetmode are implemented as reading from that register in glibc, but in fegetmode bits not related to control modes are cleared. If a target have different status and control registers, or FP environment is large in size (as on X86), get_fpmode/set_fpmode can be have faster implementation than get_fpenv/set_fpenv.

I just realized how much I had confused this with llvm.get.fpenv. I think I implemented this one as llvm.get.fpenv in D152710. Can you clarify in the LangRef for get/set fpenv what the "current floating-point environment" means? I was interpreting this as the mode register, not including the dynamic exception mask

Could you also handle GlobalISel?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4430–4432

You don't need to bother with this typed pointer dance anymore

arsenm requested changes to this revision.Aug 17 2023, 3:45 PM
arsenm added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4406

This should be getStack, not getFixedStack

4420

This should be getStack, not getFixedStack

This revision now requires changes to proceed.Aug 17 2023, 3:45 PM
sepavloff added inline comments.Aug 18 2023, 12:00 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4420

Could you please explain a bit, why it should be getStack? It is not used anywhere in this file. Function PerformInsertVectorEltInMemory uses getFixedStack for temporary space, what is the difference between it and these cases?

Rebase, define custom SDNodes for the intrinsics and use opaque pointer

arsenm requested changes to this revision.Aug 18 2023, 5:48 AM
arsenm added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
4420

Nevermind this is correct. We have confusing overlapping terminology. MachineFrameInfo refers to fixed stack objects as frame indexes as known fixed offsets for incoming stack arguments but apparently MachineMemOperand uses getFixedStack for any FrameIndex reference and getStack for offsets from SP

4430–4432

Don't think the alloca address space is correct for this. I guess use the default? Also just use TLI.getPointerTy, you don't need to go through the IR type

This revision now requires changes to proceed.Aug 18 2023, 5:48 AM

Address reviewer's notes

arsenm accepted this revision.Aug 18 2023, 10:46 AM

LGTM

This revision is now accepted and ready to land.Aug 18 2023, 10:46 AM
This revision was landed with ongoing or failed builds.Aug 24 2023, 1:53 AM
This revision was automatically updated to reflect the committed changes.