The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They do the same operations as C library functions 'fegetenv' and 'fesetenv'.
By default these intrinsics are lowered to calls to these library functions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I don't see the need. Changing the FP environment in a mixed environment program is the responsibility of the programmer, and standard calls already exist for this.
It is expected that targets will implement custom lowering to proper machine instructions for better performance.
Do you have any performance numbers?
This is about inlining. In the code like this:
double f1(double x, double y) { return x + y; } double f2(double x, double y) { #pragma STDC FENV_ACCESS ON ... return f1(x, y); }
compiler might inline call to f1 in f2. However the inlined function f1 expects default FP environment but is called in some other one.
I don't have them. But there are some considerations why optimization may be desirable:
- For some targets FP environment is a content of 1 or 2 registers. It this case custom lowering can store the environment in registers and avoid expenses for function calls and memory operations.
- Some targets encode FP environment in instructions. In this case these intrinsics shoul not produce any code.
- Probably we need to reset FP environment before calls to external functions in strictfp function and restore it upon return. In this case number of calls to such intrinsics may be large enough.
Nothing here changes my statement. The compiler does _not_ change the FP environment because of the #pragma. So f1() here would behave the same whether it was inlined or not.
When f1 is defined, no pragma is in act, so its body is executed in default FP environment. f2 contains #pragma, so FP environment in its body may differ from the default. When f1 is inlined into f2, the body of f1 becomes a part of the body of f2. Basic blocks of f1 would be executed in the environment, set in f2. To keep semantics of f1 we must execute its BBs in the default environment.
I don’t see how inlining is any different than f2 calling f1 without it being inlined. f1 would execute with the f2 environment in that case too.
llvm/docs/LangRef.rst | ||
---|---|---|
19167 | Why a pointer? Why not an i64 argument? The lowering of this through memory seems problematic for any target that doesn't implement this as the fe libcalls |
Exactly. It is the responsibility of the programmer to ensure f1 is being called with the correct environment set. C11 7.6.1.2 explicitly says:
If part of a program [...] runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.
The only issue for the compiler w.r.t. inlining is the case where the programmer writes correct code in f2, i.e. resets the environment to the default state before calling f1, but then f1 is inlined into f2. In this scenario, the compiler must ensure to preserve correctness by not scheduling any part of the inlined f1 to before the instruction in f2 that resets the environment. This can be achieved e.g. by the inliner replacing all regular FP instructions with constrained intrinsics (which may specify the default mode).
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
671 | [Drive by][Unrelated] nosync, nofree missing. Should be readonly or writeonly I suspect. Arguments can be nocapture. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | Using a pointer for this is problematic, and one with a hardcoded 0 address space doubly so |
Updated patch
- fixed some formatting issues,
- added missed attrubutes to the new intrinsics,
- pointer arguments support address spaces.
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | Accepting any address space is only a half-fix. Why not make this return llvm_anyint_ty, and define it as zext or truncated to the expected target size in the backend? This wouldn't require lowering to introduce stack usage for example for something that's usually read directly out of a register |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 |
In this case X86 represents FP environment as i256. It is not clear how to legalize this scalar type. Passing FPEnv through memory allows to avoid issues in legalization. For targets that get/set environment by simple register moves the intermediate stack slot is eliminated (see D81843). Well, almost eliminated, only write to stack slot remains, but it is DCE deficiency. |
Removed dependency on DataLayout
For the task of the current patch availability of FPEnv size in
DataLayout is not strictly necessary.
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | I'm not sure what you mean. You could legalize i256 by storing to the stack and reloading parts for all operations? This probably already works. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | I tried to implement a solution in which get.fpenv returns integer value. The solution seems to work, however it has its own drawbacks:
define void @func_01(i256* %fpenv) { entry: %fpe = call i256 @llvm.get.fpenv.i256() store i256 %fpe, i256* %fpenv ret void } uses stack variable, although it should use pointer argument directly. Optimization that would eliminate it is now absent. This way is possible, but it requires more efforts to implement. There must be serious reasons why we prefer this way over using pointer argument. Could you please explain the concern of using pointer in the intrinsic? What is wrong with the implementation used in D81843? Why it cannot be used by targets where FP environment is simply a content of a register? |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | It could be, but it forces the lowering to introduce stack usage. Now an alloca has to stick around for the scratch pad until codegen, and SROA can't eliminate it. For AMDGPU for example, we change ABI lowering and other optimization strategies based on whether there is stack usage in the incoming IR. We don't really have other intrinsics that force you into this situation I also noticed we have llvm.flt.rounds already, so I'm confused why this is different |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
668–670 | Thank you for the explanation. I'll try to elaborate the other way.
llvm.flt.rounds only reads the current rounding mode. These intrinsics save/restore the entire FP environment. |