This is an archive of the discontinued LLVM Phabricator instance.

Added intrinsics for access to FP environment
ClosedPublic

Authored by sepavloff on Dec 19 2019, 10:45 PM.

Details

Summary

The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read floating-point environment, set it or reset to
some default state. They do the same actions as C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.

The new intrinsics specify FP environment as a value of integer type, it
is convenient of most targets where the FP state is a content of some
register. Some targets however use long representations. On X86 the size
of FP environment is 256 bits, and even half of this size is not a legal
ibteger type. To facilitate legalization in such cases, two sets of DAG
nodes is used. Nodes GET_FPENV and SET_FPENV are used when FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider FP environment as a region in
memory, much like fesetenv and fegetenv do. They are used when
target has long representation for floationg-point state.

Diff Detail

Event Timeline

sepavloff created this revision.Dec 19 2019, 10:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2019, 10:45 PM
kpn added a comment.Dec 20 2019, 12:09 PM

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

It is expected that targets will implement custom lowering to proper machine instructions for better performance.

Do you have any performance numbers?

In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

It is expected that targets will implement custom lowering to proper machine instructions for better performance.

Do you have any performance numbers?

I don't have them. But there are some considerations why optimization may be desirable:

  • For some targets FP environment is a content of 1 or 2 registers. It this case custom lowering can store the environment in registers and avoid expenses for function calls and memory operations.
  • Some targets encode FP environment in instructions. In this case these intrinsics shoul not produce any code.
  • Probably we need to reset FP environment before calls to external functions in strictfp function and restore it upon return. In this case number of calls to such intrinsics may be large enough.
kpn added a comment.Dec 23 2019, 6:04 AM
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

I don’t see how inlining is any different than f2 calling f1 without it being inlined. f1 would execute with the f2 environment in that case too.

arsenm added a subscriber: arsenm.Dec 23 2019, 9:01 AM
arsenm added inline comments.
llvm/docs/LangRef.rst
25370

Why a pointer? Why not an i64 argument? The lowering of this through memory seems problematic for any target that doesn't implement this as the fe libcalls

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

I don’t see how inlining is any different than f2 calling f1 without it being inlined. f1 would execute with the f2 environment in that case too.

Exactly. It is the responsibility of the programmer to ensure f1 is being called with the correct environment set. C11 7.6.1.2 explicitly says:

If part of a program [...] runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.

The only issue for the compiler w.r.t. inlining is the case where the programmer writes correct code in f2, i.e. resets the environment to the default state before calling f1, but then f1 is inlined into f2. In this scenario, the compiler must ensure to preserve correctness by not scheduling any part of the inlined f1 to before the instruction in f2 that resets the environment. This can be achieved e.g. by the inliner replacing all regular FP instructions with constrained intrinsics (which may specify the default mode).

Updated patch

  • Rebased,
  • Used more consistent naming.
sepavloff edited the summary of this revision. (Show Details)Jun 12 2020, 10:43 AM
jdoerfert added inline comments.Jun 13 2020, 9:15 AM
llvm/include/llvm/IR/Intrinsics.td
1074

[Drive by][Unrelated] nosync, nofree missing. Should be readonly or writeonly I suspect. Arguments can be nocapture.

arsenm requested changes to this revision.Jun 13 2020, 9:57 AM
arsenm added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1070–1072

Using a pointer for this is problematic, and one with a hardcoded 0 address space doubly so

This revision now requires changes to proceed.Jun 13 2020, 9:57 AM
sepavloff updated this revision to Diff 270683.Jun 15 2020, 2:01 AM

Updated patch

  • fixed some formatting issues,
  • added missed attrubutes to the new intrinsics,
  • pointer arguments support address spaces.
sepavloff marked 3 inline comments as done.Jun 15 2020, 2:04 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1070–1072

Changed type to llvm_anyptr_ty, which must support arbitrary address spaces.

1074

Fixed, thank you.

arsenm added inline comments.Jun 15 2020, 5:08 AM
llvm/include/llvm/IR/Intrinsics.td
1070–1072

Accepting any address space is only a half-fix. Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend? This wouldn't require lowering to introduce stack usage for example for something that's usually read directly out of a register

sepavloff marked 2 inline comments as done.Jun 15 2020, 9:52 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1070–1072

Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend?

In this case X86 represents FP environment as i256. It is not clear how to legalize this scalar type. Passing FPEnv through memory allows to avoid issues in legalization. For targets that get/set environment by simple register moves the intermediate stack slot is eliminated (see D81843). Well, almost eliminated, only write to stack slot remains, but it is DCE deficiency.

sepavloff updated this revision to Diff 272363.Jun 22 2020, 2:52 AM

Removed dependency on DataLayout

For the task of the current patch availability of FPEnv size in
DataLayout is not strictly necessary.

arsenm added inline comments.Jun 23 2020, 8:50 AM
llvm/include/llvm/IR/Intrinsics.td
1070–1072

I'm not sure what you mean. You could legalize i256 by storing to the stack and reloading parts for all operations? This probably already works.

sepavloff marked an inline comment as done.Jun 26 2020, 4:09 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1070–1072

I tried to implement a solution in which get.fpenv returns integer value. The solution seems to work, however it has its own drawbacks:

  • In the case of x86 get.fpenv returns i256. This is illegal type, so we cannot set custom lowering through setOperationAction. TLI.getOperationAction always returns Expand as the type is extended. To cope with this difficulty, DAGTypeLegalizer::ExpandIntegerResult has to call TLI.ReplaceNodeResults directly.
  • It is necessary to use special node class to represent GET_FPENV, otherwise TLI.ReplaceNodeResults has no access to the chain token argument.
  • The code created for the source:
	define void @func_01(i256* %fpenv) {
	entry:
	  %fpe = call i256 @llvm.get.fpenv.i256()
	  store i256 %fpe, i256* %fpenv
	  ret void
	}

uses stack variable, although it should use pointer argument directly. Optimization that would eliminate it is now absent.

This way is possible, but it requires more efforts to implement. There must be serious reasons why we prefer this way over using pointer argument.

Could you please explain the concern of using pointer in the intrinsic? What is wrong with the implementation used in D81843? Why it cannot be used by targets where FP environment is simply a content of a register?

arsenm added inline comments.Jul 17 2020, 12:55 PM
llvm/include/llvm/IR/Intrinsics.td
1070–1072

It could be, but it forces the lowering to introduce stack usage. Now an alloca has to stick around for the scratch pad until codegen, and SROA can't eliminate it. For AMDGPU for example, we change ABI lowering and other optimization strategies based on whether there is stack usage in the incoming IR. We don't really have other intrinsics that force you into this situation

I also noticed we have llvm.flt.rounds already, so I'm confused why this is different

sepavloff marked an inline comment as done.Jul 20 2020, 5:00 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1070–1072

Thank you for the explanation. I'll try to elaborate the other way.

I also noticed we have llvm.flt.rounds already, so I'm confused why this is different

llvm.flt.rounds only reads the current rounding mode. These intrinsics save/restore the entire FP environment.

jdoerfert resigned from this revision.Jul 20 2020, 6:38 AM

Intrinsics now does not use pointers

Is this still relevant?

Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 5:28 PM
sepavloff updated this revision to Diff 462193.Sep 22 2022, 9:05 AM

Updated the patch

Matt added a subscriber: Matt.Sep 27 2022, 11:41 AM
sepavloff updated this revision to Diff 521004.May 10 2023, 8:34 AM

Updated patch

  • Use special DAG nodes if FP environment is passed in memory,
  • Various cleanups.
sepavloff edited the summary of this revision. (Show Details)May 10 2023, 8:36 AM
sepavloff updated this revision to Diff 521579.May 12 2023, 1:22 AM
sepavloff edited the summary of this revision. (Show Details)

Updated patch after test started using autogenerated assertions in another commit.

sepavloff updated this revision to Diff 522994.May 17 2023, 4:15 AM

Add memoperand to GET_FPENV_MEM and SET_FPENV_MEM

arsenm added inline comments.May 19 2023, 11:09 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6551–6553

Can this just go into an Expand action like normal? We randomly expand certain things in the DAG builder but the inconsistency is annoying

arsenm added inline comments.May 19 2023, 11:11 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
12327

If not getting it from the actual declaration, should probably use the default program address space (don't really care if you fix it here, 90% of the uses of this are wrong as it is)

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6591

This should use the alignment for EnvVT, not the datalayout's stack alignment

sepavloff updated this revision to Diff 525563.May 25 2023, 5:49 AM

Updated patch

  • Use type alignment rather than stack one,
  • Use getPointerMemTy instead of getPointerTy.
sepavloff added inline comments.May 25 2023, 5:57 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
12327

IIRC default program address space is for function code. I put getPointerMemTy instead of getPointerTy, although now it behave in the same way. I don't know to put correct address space here, as declaration of the function is unavailable.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6551–6553

This code does the same job as other cases in this function, - it creates relevant DAG nodes. Their expansion occurs in SelectionDAG::Legalize, as for other nodes. The only difference is using two kings of nodes, register and memory-based ones.

Using only one kind of node could simplify code in this function but complicates code in other parts. On some targets FP environment is long and corresponding integer type is illegal (i256 on x86). In such case DAG nodes have to be expanded early, when DAG legalizes types. If FP environment can be represented by a legal type, the expansion occurs late, in SelectionDAG::Legalize. There were two points when the expansion occurred. This approach was used in previous versions of this patch, it complicated the implementation and made it fragile.

Using separate nodes for operations that involves memory allows to treat all FP environment operations in uniform manner, at the cost of small complication of this function.

6591

Fixed.

arsenm accepted this revision.Jun 2 2023, 12:57 PM

LGTM with the getPointerMemTy thing reverted

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
12327

pointer mem ty makes less sense

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6551–6553

This hack doesn't require repeating in globalisel

6561

CreateStackTemporary should be fixed to directly take Align input

This revision is now accepted and ready to land.Jun 2 2023, 12:57 PM
This revision was landed with ongoing or failed builds.Jun 4 2023, 11:11 PM
This revision was automatically updated to reflect the committed changes.