Page MenuHomePhabricator

Added intrinsics for access to FP environment
Needs ReviewPublic

Authored by sepavloff on Dec 19 2019, 10:45 PM.

Details

Summary

The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They do the same operations as C library functions 'fegetenv' and 'fesetenv'.
By default these intrinsics are lowered to calls to these library functions.

Diff Detail

Event Timeline

sepavloff created this revision.Dec 19 2019, 10:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2019, 10:45 PM
kpn added a comment.Dec 20 2019, 12:09 PM

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

It is expected that targets will implement custom lowering to proper machine instructions for better performance.

Do you have any performance numbers?

In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

It is expected that targets will implement custom lowering to proper machine instructions for better performance.

Do you have any performance numbers?

I don't have them. But there are some considerations why optimization may be desirable:

  • For some targets FP environment is a content of 1 or 2 registers. It this case custom lowering can store the environment in registers and avoid expenses for function calls and memory operations.
  • Some targets encode FP environment in instructions. In this case these intrinsics shoul not produce any code.
  • Probably we need to reset FP environment before calls to external functions in strictfp function and restore it upon return. In this case number of calls to such intrinsics may be large enough.
kpn added a comment.Dec 23 2019, 6:04 AM
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

I don’t see how inlining is any different than f2 calling f1 without it being inlined. f1 would execute with the f2 environment in that case too.

arsenm added a subscriber: arsenm.Dec 23 2019, 9:01 AM
arsenm added inline comments.
llvm/docs/LangRef.rst
19167

Why a pointer? Why not an i64 argument? The lowering of this through memory seems problematic for any target that doesn't implement this as the fe libcalls

In D71742#1794901, @kpn wrote:
In D71742#1793243, @kpn wrote:

I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.

This is about inlining. In the code like this:

double f1(double x, double y) {
  return x + y;
}
double f2(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  ...
  return f1(x, y);
}

compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.

Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.

When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.

I don’t see how inlining is any different than f2 calling f1 without it being inlined. f1 would execute with the f2 environment in that case too.

Exactly. It is the responsibility of the programmer to ensure f1 is being called with the correct environment set. C11 7.6.1.2 explicitly says:

If part of a program [...] runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.

The only issue for the compiler w.r.t. inlining is the case where the programmer writes correct code in f2, i.e. resets the environment to the default state before calling f1, but then f1 is inlined into f2. In this scenario, the compiler must ensure to preserve correctness by not scheduling any part of the inlined f1 to before the instruction in f2 that resets the environment. This can be achieved e.g. by the inliner replacing all regular FP instructions with constrained intrinsics (which may specify the default mode).

Updated patch

  • Rebased,
  • Used more consistent naming.
sepavloff edited the summary of this revision. (Show Details)Jun 12 2020, 10:43 AM
jdoerfert added inline comments.Jun 13 2020, 9:15 AM
llvm/include/llvm/IR/Intrinsics.td
671

[Drive by][Unrelated] nosync, nofree missing. Should be readonly or writeonly I suspect. Arguments can be nocapture.

arsenm requested changes to this revision.Jun 13 2020, 9:57 AM
arsenm added inline comments.
llvm/include/llvm/IR/Intrinsics.td
668–670

Using a pointer for this is problematic, and one with a hardcoded 0 address space doubly so

This revision now requires changes to proceed.Jun 13 2020, 9:57 AM
sepavloff updated this revision to Diff 270683.Jun 15 2020, 2:01 AM

Updated patch

  • fixed some formatting issues,
  • added missed attrubutes to the new intrinsics,
  • pointer arguments support address spaces.
sepavloff marked 3 inline comments as done.Jun 15 2020, 2:04 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
668–670

Changed type to llvm_anyptr_ty, which must support arbitrary address spaces.

671

Fixed, thank you.

arsenm added inline comments.Jun 15 2020, 5:08 AM
llvm/include/llvm/IR/Intrinsics.td
668–670

Accepting any address space is only a half-fix. Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend? This wouldn't require lowering to introduce stack usage for example for something that's usually read directly out of a register

sepavloff marked 2 inline comments as done.Jun 15 2020, 9:52 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
668–670

Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend?

In this case X86 represents FP environment as i256. It is not clear how to legalize this scalar type. Passing FPEnv through memory allows to avoid issues in legalization. For targets that get/set environment by simple register moves the intermediate stack slot is eliminated (see D81843). Well, almost eliminated, only write to stack slot remains, but it is DCE deficiency.

sepavloff updated this revision to Diff 272363.Jun 22 2020, 2:52 AM

Removed dependency on DataLayout

For the task of the current patch availability of FPEnv size in
DataLayout is not strictly necessary.

arsenm added inline comments.Jun 23 2020, 8:50 AM
llvm/include/llvm/IR/Intrinsics.td
668–670

I'm not sure what you mean. You could legalize i256 by storing to the stack and reloading parts for all operations? This probably already works.

sepavloff marked an inline comment as done.Jun 26 2020, 4:09 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
668–670

I tried to implement a solution in which get.fpenv returns integer value. The solution seems to work, however it has its own drawbacks:

  • In the case of x86 get.fpenv returns i256. This is illegal type, so we cannot set custom lowering through setOperationAction. TLI.getOperationAction always returns Expand as the type is extended. To cope with this difficulty, DAGTypeLegalizer::ExpandIntegerResult has to call TLI.ReplaceNodeResults directly.
  • It is necessary to use special node class to represent GET_FPENV, otherwise TLI.ReplaceNodeResults has no access to the chain token argument.
  • The code created for the source:
	define void @func_01(i256* %fpenv) {
	entry:
	  %fpe = call i256 @llvm.get.fpenv.i256()
	  store i256 %fpe, i256* %fpenv
	  ret void
	}

uses stack variable, although it should use pointer argument directly. Optimization that would eliminate it is now absent.

This way is possible, but it requires more efforts to implement. There must be serious reasons why we prefer this way over using pointer argument.

Could you please explain the concern of using pointer in the intrinsic? What is wrong with the implementation used in D81843? Why it cannot be used by targets where FP environment is simply a content of a register?

arsenm added inline comments.Jul 17 2020, 12:55 PM
llvm/include/llvm/IR/Intrinsics.td
668–670

It could be, but it forces the lowering to introduce stack usage. Now an alloca has to stick around for the scratch pad until codegen, and SROA can't eliminate it. For AMDGPU for example, we change ABI lowering and other optimization strategies based on whether there is stack usage in the incoming IR. We don't really have other intrinsics that force you into this situation

I also noticed we have llvm.flt.rounds already, so I'm confused why this is different

sepavloff marked an inline comment as done.Jul 20 2020, 5:00 AM
sepavloff added inline comments.
llvm/include/llvm/IR/Intrinsics.td
668–670

Thank you for the explanation. I'll try to elaborate the other way.

I also noticed we have llvm.flt.rounds already, so I'm confused why this is different

llvm.flt.rounds only reads the current rounding mode. These intrinsics save/restore the entire FP environment.

jdoerfert resigned from this revision.Jul 20 2020, 6:38 AM

Intrinsics now does not use pointers