[FPEnv] Intrinsics for access to FP control modes
Needs Review, Public

Authored by sepavloff on Jun 25 2020, 1:30 AM.

Details

Summary

This change introduces the intrinsics 'get_fpmode', 'set_fpmode' and
'reset_fpmode'. They manage all of the target's dynamic floating-point
control modes, such as rounding direction, precision, and treatment of
denormals. The intrinsics perform the same operations as the C library
functions 'fegetmode' and 'fesetmode', and by default they are lowered
to calls to these functions.

Two main use cases are supported by this implementation.

  1. Local modification of the control modes. In this case the code
usually follows this pattern (in pseudocode):

saved_modes = get_fpmode()
set_fpmode(<new_modes>)
...
<do operations with new modes>
...
set_fpmode(saved_modes)

When the current FP environment is known to be the default one,
the code can be shorter:

set_fpmode(<new_modes>)
...
<do operations with new modes>
...
reset_fpmode()

Such patterns appear not only in user code but also in implementations
of various FP-controlling pragmas. In particular, the implementation of
#pragma STDC FENV_ROUND requires similar code if the target does not
support static rounding modes.

  2. Portable control of FP modes. FP control modes are usually set by
writing to a control register. Targets differ in the layout of this
register and in the way it is accessed. Combining a set of
target-specific definitions for the control register bits with these
intrinsic functions provides a reasonably portable way to handle
control modes across a wide range of hardware.

This change defines only the LLVM intrinsic functions that implement the
access required for these use cases.
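
The save/modify/restore pattern from the first use case can be sketched in C++ with the standard <cfenv> functions. This is only an approximation under stated assumptions: std::fegetround and std::fesetround stand in for get_fpmode and set_fpmode, they cover the rounding direction alone rather than all control modes, and the names ScopedRounding and divide_rounded_up are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII helper (not part of the patch): saves the current
// rounding direction, installs a new one, and restores the saved value
// when the scope ends. It covers the rounding direction only, not the
// full set of control modes that get_fpmode/set_fpmode would handle.
class ScopedRounding {
public:
  explicit ScopedRounding(int new_mode) : saved_(std::fegetround()) {
    const int rc = std::fesetround(new_mode);  // set_fpmode(<new_modes>)
    assert(rc == 0 && "requested rounding direction is not supported");
    (void)rc;
  }
  ~ScopedRounding() { std::fesetround(saved_); }  // set_fpmode(saved_modes)

  ScopedRounding(const ScopedRounding &) = delete;
  ScopedRounding &operator=(const ScopedRounding &) = delete;

private:
  int saved_;
};

// Divides a by b with upward rounding. The volatile variables keep the
// compiler from folding the division or moving it out of the guarded scope.
double divide_rounded_up(double a, double b) {
  ScopedRounding guard(FE_UPWARD);
  volatile double x = a;
  volatile double y = b;
  volatile double q = x / y;
  return q;
}
```

Calling divide_rounded_up(1.0, 3.0) should leave the caller's rounding direction unchanged on return, which is the property the pseudocode above relies on.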

Event Timeline

sepavloff created this revision. Jun 25 2020, 1:30 AM
Herald added a project: Restricted Project.
sepavloff updated this revision to Diff 275055. Jul 2 2020, 4:06 AM

Fixed legalization of SET_FPMODE

sepavloff updated this revision to Diff 275373. Jul 3 2020, 5:25 AM

Missed change

sepavloff updated this revision to Diff 281827. Jul 30 2020, 1:37 AM

Rebased patch

sepavloff updated this revision to Diff 285081. Aug 12 2020, 7:30 AM

Rebased patch

sepavloff updated this revision to Diff 289188. Sep 1 2020, 8:49 AM

Rebased patch

sepavloff updated this revision to Diff 289360. Sep 2 2020, 12:15 AM

Get rid of clang-tidy warnings

sepavloff updated this revision to Diff 327751. Mar 3 2021, 5:04 AM

Rebased and simplified a bit.

sepavloff edited the summary of this revision. Mar 3 2021, 5:05 AM
sepavloff edited the summary of this revision.

From the LangRef it isn't obvious whether the following transform is valid:

%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
  =>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y
craig.topper added inline comments.
llvm/test/CodeGen/Generic/fpenv.ll, line 36

Is this missing the instructions that copy %fpenv into the stack temporary?

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

sepavloff updated this revision to Diff 328037. Mar 3 2021, 11:28 PM
sepavloff edited the summary of this revision.

Extended documentation, fixed chain treatment.

From the LangRef it isn't obvious whether the following transform is valid:

%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
  =>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y

A short note about function ordering has been added to the paragraph "Floating Point Environment Manipulation intrinsics".
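
To illustrate why the relative order of a strict FP operation and a mode change is observable, here is a small C++ sketch. It uses std::fesetround from <cfenv> as a stand-in for llvm.set.fpmode, which is an assumption made for illustration; the names Quotients and divide_around_mode_change are hypothetical.

```cpp
#include <cassert>
#include <cfenv>

// Results of the same division evaluated before and after a mode change.
struct Quotients {
  double before_change;
  double after_change;
};

// Performs a / b twice, once on each side of a rounding-direction change.
// The volatile variables pin each division to its side of the change.
Quotients divide_around_mode_change(double a, double b) {
  volatile double x = a;
  volatile double y = b;
  const int saved = std::fegetround();

  volatile double q1 = x / y;  // evaluated under the incoming mode
  const int rc = std::fesetround(FE_UPWARD);
  assert(rc == 0 && "upward rounding is not supported");
  (void)rc;
  volatile double q2 = x / y;  // evaluated under upward rounding

  std::fesetround(saved);  // restore the caller's rounding direction
  const double r1 = q1;
  const double r2 = q2;
  return {r1, r2};
}
```

When the incoming mode is round-to-nearest, divide_around_mode_change(1.0, 3.0) is expected to return two values that differ in the last bit, so swapping the operation and the mode change would alter the result.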

Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?

Strictly speaking, there is no such guarantee. However, the obvious implementation of femode_t is the type used to store the content of the FP control register. Most of the 16 targets supported by glibc use unsigned int as femode_t. The exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.

llvm/test/CodeGen/Generic/fpenv.ll, line 36

Indeed, because an incorrect chain argument was supplied to the library function call, the store to the stack disappeared.

Thank you for the catch!

Isn't X86 using this struct, which is 8 bytes?

typedef struct
  {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved;
    unsigned int __mxcsr;
  }
femode_t;

Sure. I forgot to mention x86.

There are cases where the size of fenv_t differs between libraries. ARM uses unsigned int in glibc but unsigned long in musl.

Is unsigned long 32 bits in this case?

Yes, ARM gcc 10.2 (Linux) yields 4 for sizeof(unsigned long).
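
The size questions in this thread can be checked on a given host with a few sizeof expressions. The sketch below is illustrative only: X86ModesMirror is a hypothetical copy of the x86 glibc struct quoted above, with made-up field names, and HostSizes and host_sizes are likewise hypothetical helpers. The standard std::fenv_t is used because femode_t is not available in every C library.

```cpp
#include <cassert>
#include <cfenv>
#include <cstddef>

// Hypothetical mirror of the x86 glibc femode_t layout quoted above; the
// field names are illustrative, only the member types are taken from it.
struct X86ModesMirror {
  unsigned short control_word;
  unsigned short reserved;
  unsigned int mxcsr;
};
static_assert(sizeof(X86ModesMirror) == 8,
              "assumes 16-bit short and 32-bit int");

// Sizes, in bytes, of the types discussed in the thread on this host.
struct HostSizes {
  std::size_t fenv_size;
  std::size_t uint_size;
  std::size_t ulong_size;
};

HostSizes host_sizes() {
  const HostSizes s{sizeof(std::fenv_t), sizeof(unsigned int),
                    sizeof(unsigned long)};
  // Holds on mainstream ABIs; the two are equal on 32-bit ARM Linux.
  assert(s.ulong_size >= s.uint_size);
  return s;
}
```

The values returned by host_sizes() depend on the target and the C library, which is exactly the portability concern raised in the question.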

qiucf added a subscriber: qiucf. Mar 15 2021, 1:27 AM
sepavloff updated this revision to Diff 331293. Mar 17 2021, 9:38 AM

Added helper functions

These are methods of IRBuilder: createGetFPMode, which gets the size of the FP modes from
DataLayout, createSetFPMode and createResetFPMode.

sepavloff updated this revision to Diff 334095. Mar 30 2021, 3:15 AM

Rebased patch

Any feedback?

sepavloff edited the summary of this revision. Nov 25 2021, 8:50 AM
sepavloff updated this revision to Diff 389817. Nov 25 2021, 8:53 AM
sepavloff edited the summary of this revision.

Updated patch

  • Rebased.
  • Stopped using DataLayout to determine the size of the control modes. This limits the usage of the intrinsics to some extent, because an IR transformation that creates a call to llvm.get.fpmode or llvm.set.fpmode must somehow know the size for the current target. For the main use cases this should be enough; only TargetInfo needs to be extended so that clang can know the size.
  • The test that checks default lowering was rewritten using the soft-float option.